The education writer at Forbes, Natalie Wexler, argued last week that reading fiction isn’t the “only” thing needed to boost academic achievement. It’s a curious position to take, not because it is wrong – I agree with her completely – but because no one in the reading field I’m aware of has ever said that fiction alone was all you needed to succeed at school.
Wexler seems to misunderstand the real case for reading fiction and its connection to academic achievement.
Krashen (2010) called the rationale for fiction reading the “Bridge Hypothesis.” The Bridge Hypothesis states that light, popular fiction provides a gradual path or “bridge” to academic texts. It does this by helping young readers build both the general and academic vocabulary they need to understand more difficult texts, as well as broadening their knowledge of the world.
Light reading, then, is not sufficient, but it is a very useful intermediate step. (Of course, fiction reading is a worthy pursuit regardless of its other effects.)
It is true that nonfiction books have a higher percentage of academic vocabulary than fiction. But fiction still has its share.
Gardner’s 2004 study (discussed below) found that in fiction written for children, 0.7% of the words were academic terms that regularly appear in school texts. Over the course of 1,000,000 words of reading – doable in about a year at 30 minutes a day – even that small percentage adds up, as we’ll see.
Rolls and Rogers (2017) present more evidence of the presence of academic vocabulary in fiction. They found that reading a million words of science fiction would expose you to 294 of the 318 high frequency science terms commonly found in the “hard” sciences (physics, chemistry, etc.). More importantly, more than 40% of those words appear 10 or more times, giving the reader a decent chance of acquiring them.
Gardner (2004): Wrong Questions, Wrong Data, Wrong Conclusions
Part of Wexler’s case against fiction relied on the above-mentioned study by Gardner (2004). But Gardner’s study asked the wrong questions, presented the wrong data, and came to the wrong conclusion.
Gardner compiled text samples from two “registers” or kinds of texts. The first was expository, taken from books used for instruction in a middle school classroom. The second sample consisted of children’s and young adult fiction.
He then assembled lists of “general high frequency” and “academic high frequency” word types.* These lists contained 8,510 word types for general vocabulary and 3,672 word types for academic vocabulary.
Next, he looked to see how many of these general and academic word types appeared in each register exclusively, and how many of them were shared. Gardner concluded that there were too few shared academic words between the fiction and expository registers to make reading fiction a good route to academic success.
But Gardner’s data don’t match Gardner’s conclusion. He reports (Table 7, p. 91) that more than half (56.4%) of the 1,800+ academic word types that appear at least once in his corpus are shared between registers, and that a whopping 79.8% of general vocabulary is shared. Both of these are impressive “overlaps” between the narrative and expository texts. They give fiction readers a significant leg up in understanding what they read in school.
If we add these shared academic word types (1,066) to those that appear only in the fiction texts in Gardner’s corpus (260), we get a total of 1,326 academic word types. That’s exposure to more than a third of Gardner’s entire list of academic words! Not bad for reading a lot of enjoyable fiction.
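The tally is easy to verify. The counts below are Gardner’s (Table 7); the script itself is just a back-of-the-envelope check:

```python
# Academic word types a fiction reader would meet at least once,
# using Gardner's (2004, Table 7) counts as cited above.
shared = 1066        # academic word types appearing in BOTH registers
fiction_only = 260   # academic word types appearing only in the fiction corpus
total_list = 3672    # academic high-frequency word types on Gardner's list

exposure = shared + fiction_only
print(exposure)                         # 1326
print(f"{exposure / total_list:.0%}")   # 36% -- "more than a third"
```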
Gardner’s glass-is-half-empty view seems in part attributable to his rather strange assumptions about the way vocabulary acquisition should work. As Krashen (2010) notes:
[The lack of overlap] is only a problem if we insist that light reading provide readers with everything they need in order to understand every word of academic texts that they will encounter that year! In other words, Gardner assumes that for narrative reading to be useful, it needs to provide readers with full acquisition of every word in the academic texts they are about to read. (p. 31)
The more sensible position is that fiction reading will increase the amount of students’ academic vocabulary over time, thereby making school texts more comprehensible. Gardner’s data are fully consistent with that position.
While it is interesting to know the number of word families the two registers share, the more important question is: How many academic words are repeated sufficiently (10 or more times) in light fiction to allow readers a good chance of acquiring them? Only then can we know the potential impact reading fiction can have on comprehending school texts.
Gardner doesn’t even ask this question. Instead, he tells us that of the 1,066 word types that are shared between fiction and nonfiction, only 7% (78) are repeated 10 times or more in both registers.
This is the wrong question. We’re not concerned about how often academic word types repeat in the expository texts, but how many are acquirable from fiction. Remember the whole point of this exercise is to see how useful fiction is when reading expository texts. A word type acquired from 10+ repetitions in the fiction corpus will help the reader understand that term in school texts regardless of whether it appears there 10 times, 100 times, or just once.
It’s also the wrong data. The fiction text sample used by Gardner had a million words (tokens), more than twice as large as his expository sample of 400,000 words. Obviously the difference in corpus size means that word type repetitions in the expository text were much less likely, which in turn restricted the total number of word types that could be repeated 10 times in both registers. (Did I mention that this article appeared in one of the top peer-reviewed journals in the field (Applied Linguistics), and has been cited nearly 200 times?)
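The corpus-size problem is easy to see with a hypothetical word. Suppose a word occurs at the same underlying rate in both registers (the 15-per-million rate below is an assumption for illustration, not a figure from Gardner):

```python
# Why "10+ repetitions in BOTH registers" penalizes the smaller corpus.
# A word with an identical underlying frequency clears the threshold in the
# 1,000,000-token fiction sample but not in the 400,000-token expository one.
RATE_PER_MILLION = 15  # hypothetical occurrences per million tokens

fiction_tokens = 1_000_000
expository_tokens = 400_000

expected_fiction = RATE_PER_MILLION * fiction_tokens / 1_000_000      # 15.0
expected_expository = RATE_PER_MILLION * expository_tokens / 1_000_000  # 6.0

print(expected_fiction >= 10)     # True  -- counted as "repeated"
print(expected_expository >= 10)  # False -- drops out of Gardner's 7% figure
```

So a word can be perfectly acquirable from fiction and still fail Gardner’s both-registers test purely because his expository sample was smaller.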
Other Problems with Gardner’s Study
Gardner’s study has other serious flaws, most of which have already been discussed by Krashen (2010).
For starters, his corpus doesn’t represent the pleasure reading kids actually do in middle school. Most of the books for his fiction corpus were chosen not because they are books kids read for pleasure, but because they fit into one of the three topics (mummies, mysteries, and “Westward Expansion”) that appear in the thematic expository texts chosen by the researcher. They may or may not have been books kids would pick up on their own.
As Ujiie and Krashen (2006) found, kids prefer to read series books like Harry Potter and A Series of Unfortunate Events, a result confirmed by a recent large-scale, representative survey of U.S. children (Scholastic, 2015).
Series books by the same author tend to have more repetition of word types than would a wide selection of fiction from multiple authors. Gardner’s fiction corpus had 28 books written by 19 different authors on three different themes. This fails to reflect the “narrow reading” (Krashen, 2004) that most readers engage in, especially in this age group.
A later study by Gardner (2008) found that there were in fact far more word types judged to be “acquirable” (repeating six or more times) in fiction by the same author than by multiple authors (Table 2, p. 102).
Fixing Gardner’s Flaws
I recently published a study (McQuillan, 2019) that corrects some of the major flaws in Gardner’s analysis. I compiled a million-word corpus of series books popular with young readers, including The Baby-Sitters Club, Goosebumps, and A Series of Unfortunate Events.
For my analysis, I employed a more widely-used “academic word list” (AWL) of 570 word families created by Coxhead (2000). The AWL includes vocabulary that appears across a large number of academic disciplines, sometimes called “sub-technical” vocabulary. The list has also been used in many recent attempts to teach academic vocabulary directly (spoiler alert: explicit instruction in these words doesn’t work.)
I found that 85% of the AWL – 485 of the 570 word families – appeared at least once in the million-word fiction collection. Of these academic words, 213 appeared 12 times or more, making them very acquirable.
In other words, we can predict that in the span of about a year, with 30 minutes per day of reading, children are likely to acquire 37% of the core academic vocabulary needed for success in school – all by simply reading a million words of popular series books.
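The percentages above follow directly from the counts (the numbers are from McQuillan, 2019, as cited; the script is only a check):

```python
# Coverage of Coxhead's (2000) Academic Word List in the
# million-word series-fiction corpus (McQuillan, 2019).
AWL_FAMILIES = 570   # word families on the Academic Word List
APPEARED = 485       # families appearing at least once in the corpus
ACQUIRABLE = 213     # families repeated 12 or more times

print(f"{APPEARED / AWL_FAMILIES:.0%}")     # 85% of the AWL encountered
print(f"{ACQUIRABLE / AWL_FAMILIES:.0%}")   # 37% likely acquired
```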
Note that academic word families are “only” 1.02% of the total number of tokens in my corpus, yet still can have a powerful impact on academic vocabulary development.
None of this means that we should not also want kids to read nonfiction texts for pleasure. I for one was a heavy nonfiction reader in middle school. Like Krashen (2012), I think we should encourage the free reading of nonfiction as well, especially when it centers on the children’s own interests. This is almost certainly how most readers came to acquire a good chunk of their own academic and technical vocabulary, as Krashen (2012) points out.
In addition to its own joys, reading fiction serves as a useful bridge to academic language. Gardner’s (imperfect) study, Rolls and Rogers (2017), and my own data all indicate that it can do just that.
*The terminology used in these corpus analyses can get confusing. A token refers to each individual word in a text, what you get when you run Word Count in Microsoft Word. The sentence “A rose is a rose is a rose” contains eight tokens. But that same sentence has only three word types (a, is, rose): rose and a each occur three times, and is occurs twice. A word family refers to a base word together with its related forms (e.g. medical, medicine, medicinal, medicate). There are typically more tokens than types, and more types than families, in a text.
Gardner uses word types for his analysis, while most other corpus studies (including my own) have used the larger unit of analysis, word families.
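The token/type distinction in the footnote can be computed mechanically (grouping types into families, by contrast, requires a morphological reference list, so it isn’t shown here):

```python
# Counting tokens and word types for the footnote's example sentence.
from collections import Counter

sentence = "A rose is a rose is a rose"
tokens = sentence.lower().split()   # case-folded so "A" and "a" are one type
type_counts = Counter(tokens)

print(len(tokens))       # 8 tokens
print(len(type_counts))  # 3 word types
print(type_counts)       # Counter({'a': 3, 'rose': 3, 'is': 2})
```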