Noticed this trend in August 2021. Finally wrote some thoughts down in October 2022.
Alphabet has this dataset called Ngram that estimates the incidence of words across all of the texts available in Google Books.
Something I noticed last year is that the use of the word “I” has increased rapidly over the last 40 years. The share of all words that are “I” has nearly doubled since 1900, and is up almost 3x since 1980.
What is going on here? Is this evidence that we are becoming more individualistic, or is there something else going on?
Here are some ideas as to what’s happening:
- The data are bad
- I’m using the data wrong
- Different kinds of books are being published
- Books are becoming more individualistic on average (the interesting one!)
It’s probably a combination of a couple of these. I wanted to explore each a bit more.
Are the data bad?
Case for no: There are peer-reviewed journal articles that used Google Ngram as a dataset, such as:
- The changing psychology of culture in German-speaking countries: A Google Ngram study
- Fashionable Functions: A Google Ngram View of Trends in Functional Differentiation (1800-2000)
- Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms
- And a paper co-authored with the team that built the tool: Quantitative Analysis of Culture Using Millions of Digitized Books
(But I found some of these via Google Scholar so maybe that’s a conflict of interest)
Case for yes: Per the Wikipedia page, there might be issues with OCR, changes in the types of books/texts sampled, changes in the size of the sample used for each year, etc.
Am I using the data wrong?
One quick check: querying I_PRON, which only counts “I” when it is tagged as a pronoun, gives the same trend. More about the different tag options can be found here.
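For anyone who wants to rerun that check, here is a minimal sketch of building the comparison link. It assumes the Ngram Viewer graph page lives at the URL below and accepts `content`, `year_start`, `year_end`, and `smoothing` query parameters; the helper name `build_ngram_url` and the 2019 end year are my own placeholders, not anything from the tool's documentation.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names for the Ngram Viewer graph page.
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(terms, year_start, year_end, smoothing=3):
    """Build a viewer link that plots several terms over one year range.

    Hypothetical helper: the parameter names are an assumption about how
    the viewer reads its query string.
    """
    params = {
        "content": ",".join(terms),  # comma-separated terms, e.g. "I,I_PRON"
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    }
    return f"{NGRAM_GRAPH_URL}?{urlencode(params)}"


# Plot the raw word next to its pronoun-tagged form.
print(build_ngram_url(["I", "I_PRON"], 1900, 2019))
```

If the two lines sit on top of each other in the chart, the trend is not being driven by some other use of the capital letter.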
Is it different kinds of books?
Comparing incidence of the word “I” to some other words can help sort that out. There might be an increasing share of first-person narratives being published.
Looking at some other words that might be more common in, say, first-person YA fiction than in Steinbeck, it does look like those increase as well:
Since 1980, “I” is up 3x, “you” is up ~3.5x, “he” is up over 2x, “she” is up 5x (!!), and “they” is up 2x. So that makes a pretty strong case for different types of books being written.
Books seem to be more individualistic
This is the query that made me want to dig in further:
Use of the word “we” is up ~50% since 1980, compared to, again, 3x (a 200% increase) for “I”.
That’s a much bigger gap than the one between “I” and some of those other words. So even if changes in the types of books being published explain part of the trend, it still feels like there’s something to be said for the books being written becoming more individualistic.
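To put the multiples and the percentages on the same footing, here is a small sketch of the arithmetic. The numbers are just the rough multiples quoted in this post (with “over 2x” rounded down to 2.0), not values pulled from the dataset, and the function names are my own.

```python
# Approximate growth multiples since 1980, as eyeballed in this post.
# "he" is quoted as "over 2x", so 2.0 is a rounded-down placeholder.
growth_since_1980 = {
    "I": 3.0,
    "you": 3.5,
    "he": 2.0,
    "she": 5.0,
    "they": 2.0,
    "we": 1.5,
}


def percent_increase(multiple):
    """Convert a growth multiple (3.0 means 3x) into a percent increase."""
    return (multiple - 1.0) * 100.0


def relative_growth(word, baseline, multiples):
    """How much `word` grew relative to `baseline` over the same period."""
    return multiples[word] / multiples[baseline]


for word, multiple in growth_since_1980.items():
    vs_we = relative_growth(word, "we", growth_since_1980)
    print(f"{word:>4}: +{percent_increase(multiple):.0f}%, {vs_we:.2f}x relative to 'we'")
```

On these rough numbers, “I” has grown about twice as fast as “we”, which is another way of saying that the ratio of “I” to “we” has roughly doubled since 1980.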
So what?
Not sure. I just noticed and wanted to explore a bit. I’d be curious to read a book or something. Like “Bowling Alone” but even broader?