Bias in book lists: A data-driven look back on 2016
Author and entrepreneur Ryan Holiday is a voracious reader, and shares his book recommendations in a thoughtful and entertaining monthly newsletter. Last month, he published his roundup of The (Very) Best Books I Read In 2016. All 18 books on the list were written by men.
Some followers took note, and shared their concern – some in less than civil terms. “How is this even possible? So disappointed. It’s fucking 2016,” one reader wrote on Facebook.
Holiday returned the flak. You can read his denunciation of the “online diversity police” in the Observer. In Holiday’s view, excessive focus on gender and race misses the point. When it comes to book authors, we should seek out diversity of life experience and perspective. The “social justice warrior,” he writes, “confuses diversity of authorship with diversity of thought and message.”
But Holiday presents a false choice. Many women and people of color wrote great books in 2016. Holiday’s list of 18 contemporary books could easily have reflected a great diversity of thought, message, race, ethnicity, and gender.
So, does a list-maker have a responsibility to correct historical injustices by promoting books by women and people of color? Perhaps. But at a minimum, it’s imperative to reflect on how bias might affect our choices. We should expect diverse book lists, not because of some misplaced tribal piety to the diversity gods, but because – given the large number of high-quality books being published by women and people of color – it’s the numerically expected outcome.
So what is a baseline against which list-makers of the future could compare themselves? And in what areas can we do better?
A List of Lists
I set out to answer these questions by bookmarking every “best of 2016” list I could find – including book awards, major newspapers, magazines, and famous individuals (including Ryan Holiday, of course). For each book mentioned on these lists, I noted the author’s sex, racial/ethnic identity, and age. In ambiguous cases, I took the liberty of making an educated guess. (Please give me a shout if you find any errors.) In selecting book lists, I tried to include publications spanning political viewpoints; I included the left-leaning New York Times and Vox, the staunchly conservative National Review, and plenty in between.
I used the 2010 U.S. Census race and ethnicity definitions to keep my labeling as consistent as possible. Under the census definitions, race and “Hispanic Origin” are considered separate attributes, making it possible to identify as both Hispanic/Latino and black, for instance. The census methodology doesn’t distinguish Middle Eastern or North African white people from European white people. Nor does it distinguish Indians from other Asians. These methodological minutiae point to a larger question: which categories are meaningful enough to set us apart, anyway? Clearly, the U.S. Census definitions don’t capture all variations of human racial and ethnic identity. But it’s a place to start.
Of the books that made it onto these lists:
- 43% were by women, compared to 51% of the total U.S. population. In the non-fiction category, only 35% of authors were women.
- 22% were by people of color (either Hispanic/Latino or a race other than white), compared to 36% of the total U.S. population.
- 9% were by women of color, compared to roughly 19% of the total U.S. population.
- 2% were by Hispanics/Latinos, compared to 17% of the total U.S. population.
- 46% were by white men, compared to 31% of the total U.S. population.
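For readers who want to reproduce this kind of tally, here is a minimal sketch of how the shares above could be computed. It follows the definition used in this analysis (a person of color is anyone who is Hispanic/Latino or of a race other than white). The `Author` record, its field names, and the toy rows are assumptions made for illustration; they are not the actual dataset or its schema.

```python
from dataclasses import dataclass


@dataclass
class Author:
    sex: str        # "F" or "M" (hypothetical encoding)
    race: str       # census race category, e.g. "White", "Black", "Asian"
    hispanic: bool  # census "Hispanic Origin", recorded separately from race


def is_person_of_color(a: Author) -> bool:
    # Definition used in the article: Hispanic/Latino, or a race other than white.
    return a.hispanic or a.race != "White"


def share(authors, predicate) -> float:
    # Fraction of authors satisfying the predicate (0.0 for an empty list).
    if not authors:
        return 0.0
    return sum(1 for a in authors if predicate(a)) / len(authors)


# Toy records for illustration only; NOT the article's data.
toy = [
    Author("F", "White", False),
    Author("M", "White", False),
    Author("F", "Black", False),
    Author("M", "White", True),
]

women = share(toy, lambda a: a.sex == "F")
poc = share(toy, is_person_of_color)
woc = share(toy, lambda a: a.sex == "F" and is_person_of_color(a))
white_men = share(toy, lambda a: a.sex == "M" and not is_person_of_color(a))
```

Keeping race and Hispanic origin as separate fields mirrors the census approach, so an author who is both white and Hispanic/Latino is still counted as a person of color.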
The average age of authors of color on these lists was 45, compared to 50 for non-Hispanic white authors. One plausible explanation is that younger, post-Civil Rights era authors have had better access to education, and a publishing industry more open to accepting their contributions.
Interestingly, the three books mentioned on the most lists were all written by people of color. Colson Whitehead’s The Underground Railroad appeared on 13 lists. Paul Kalanithi’s When Breath Becomes Air and Zadie Smith’s Swing Time each appeared on eight lists. While these authors obviously earned their spots on the list, it’s also unlikely that the repetition of the same books occurred by chance. One might wonder why there isn’t a greater diversity of books listed by authors of color. One possibility is that some list-makers attempted to achieve racial diversity on their lists without reaching beyond an acknowledged set of literary standouts – call it tokenism, if you will.
Here is how the individual lists compared, broken down by race and sex:
Individual book lists are typically so short that an imbalance on any one of them is rarely statistically significant. That is, unless a list is blatantly lopsided. Such is the case with Ryan Holiday’s list. Considering that women wrote 44% of the best books in our analysis, the probability of assembling an all-male list of 18 books (as Holiday did) by chance would be less than one in 250,000.
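The arithmetic behind a claim like this can be sketched in a few lines. The first function assumes each pick is an independent draw with a fixed chance of a male author; the second assumes picks are drawn without replacement from a finite pool of listed books. Both are assumptions for illustration, not necessarily the calculation used here, and the function names are hypothetical. The result is sensitive to the model and the baseline: under independent draws, a 44% female share gives odds of roughly one in 34,000, while a 50% share gives exactly one in 2^18 = 262,144, and a finite pool pushes the odds lower still. The output therefore need not match the figure quoted above.

```python
from math import comb


def prob_all_male(female_share: float, n_books: int) -> float:
    """Chance that n_books independent picks are all by men."""
    if not 0.0 <= female_share <= 1.0:
        raise ValueError("female_share must be between 0 and 1")
    return (1.0 - female_share) ** n_books


def prob_all_male_finite_pool(n_male: int, n_total: int, n_books: int) -> float:
    """Chance that n_books picks, drawn without replacement from a pool of
    n_total books of which n_male are by men, are all by men."""
    if n_books > n_total:
        raise ValueError("cannot pick more books than the pool contains")
    return comb(n_male, n_books) / comb(n_total, n_books)


p = prob_all_male(0.44, 18)  # 44% share and 18 books, as quoted in the text
odds = 1.0 / p               # read as "one in <odds>"
```

The pool sizes passed to `prob_all_male_finite_pool` would have to come from the dataset; none are assumed here.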
But to give Holiday the benefit of the doubt, let’s consider two more factors. First, Holiday’s list is heavy on non-fiction (a category dominated by male authors in the dataset). Accounting for that mix brings the probability of an all-male list closer to one in 15,000.
Second, four of the books on Holiday’s list were originally published last century or earlier, back when book writing was even more dominated by men. To be more than fair to Holiday, we can drop those older books from the analysis, which brings our probability closer to one in 1,000 – still clearly statistically significant. In his Observer article, Holiday decries the “ridiculousness of using very small samples sizes to make very large assumptions.” But there is nothing ridiculous, mathematically speaking, about drawing conclusions here: Holiday’s list strongly indicates a bias against female authors.
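One way to picture the two adjustments is to give each category its own baseline and multiply the per-book chances together, after leaving out the older titles. The sketch below assumes independent picks. The 14-book total (18 minus the four older titles) and the 35% non-fiction share come from the text; the fiction/non-fiction split and the 50% fiction share are placeholders, since neither is stated here. Treat the output as illustrative rather than as a reproduction of the figures above.

```python
from math import prod


def prob_all_male_by_category(counts: dict, female_share: dict) -> float:
    """Chance that every pick is by a man, given the number of books per
    category and the share of female authors in each category."""
    return prod((1.0 - female_share[cat]) ** n for cat, n in counts.items())


# Hypothetical split of the 14 remaining books; the real split is not given.
counts = {"nonfiction": 11, "fiction": 3}

# 0.35 is the non-fiction share quoted earlier; 0.50 for fiction is a placeholder.
female_share = {"nonfiction": 0.35, "fiction": 0.50}

p_adjusted = prob_all_male_by_category(counts, female_share)
odds_adjusted = 1.0 / p_adjusted  # read as "one in <odds_adjusted>"
```

With a single category this reduces to the simple model of raising one male share to the power of the list length.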
My aim is not to suggest Holiday’s intentions were anything other than to create a list of great books, but to point out that unconscious bias isn’t a figment of the imagination of so-called social justice warriors. It is a mathematical fact, and one that we need to address.
Note: I have open-sourced the data used in this analysis. I hope others dig in and draw their own conclusions.