Converging Evidence on Vocabulary Acquisition: Another Look at Reynolds (2016)

In a previous post, I reviewed a recent study by Reynolds (2016) on the impact of frequency and “cognativeness” in vocabulary acquisition. It is worth discussing his data a bit further to compare his findings with Paul Nation’s (2014) estimates of the time efficiency of vocabulary acquisition through reading.

Nation analyzed a series of adult-level English novels and concluded that an ESL reader can acquire, on average, around seven words per hour under two conditions: (a) the reader reads texts in which 2% of the words are unknown (that is, the reader knows the meaning of 98% of the words on the page), and (b) words are acquired after 12 encounters in a text. The math is worked out in McQuillan (2016).

Reynolds had a group of ESL readers read a short novel, The BFG by Roald Dahl, and then tested them on a set of made-up words (“giant words”) that Dahl invented for the book. Reynolds’ subjects were clearly very advanced L2 readers: he reported that they knew on average around 9,000 word families in English, which Nation has shown is what one needs to handle adult-level texts written for native speakers of English.

How did the performance of Reynolds’ subjects compare to the estimates made by Nation in terms of the number of words acquired per hour?

We’ll begin by assuming that Dahl’s “giant words” were the only unknown words in the text for Reynolds’ ESL readers. There was a total of 299 giant word types:

  • Of these 299 types, there were 43 types that occurred three times or more, which, calculating from Reynolds’ frequency counts, give us 367 tokens.
  • Reynolds also selected six twice-occurring word types, which add 12 more tokens, for a running total of 379 tokens.
  • That leaves 250 word types that appeared either twice or only once.

My analysis (McQuillan, 2016) of a longer novel, Twilight, found that about 60% of the word families that are unlikely to be known by the reader who knows 98% of the words in the text occur only once. If that pattern were true of the giant word types, it would work out to be 179 word types of the 299 total.

However, to provide the most conservative estimate possible, I’ll assume that all 250 of these word types are single-occurring, deliberately underestimating the total number of words acquired (and the total number of tokens that were giant words).
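
The type and token bookkeeping above can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted in this post; the variable names are mine, not Reynolds’.

```python
# Figures quoted in the post (variable names are illustrative).
total_types = 299            # all "giant word" types in the novel
frequent_types = 43          # types occurring three or more times
frequent_tokens = 367        # tokens contributed by those 43 types
sampled_twice_types = 6      # twice-occurring types Reynolds tested
sampled_twice_tokens = sampled_twice_types * 2

# Types left over once the tested words are set aside.
remaining_types = total_types - frequent_types - sampled_twice_types

# Conservative assumption from the post: every remaining type occurs once.
remaining_tokens = remaining_types * 1
total_giant_tokens = frequent_tokens + sampled_twice_tokens + remaining_tokens

# For comparison: the 60% single-occurrence pattern found in Twilight.
twilight_pattern_types = round(0.60 * total_types)

print(remaining_types, total_giant_tokens, twilight_pattern_types)
```

Treating all 250 remaining types as single-occurring keeps the token count, and therefore the estimated number of words acquired, at its lower bound.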

The ESL readers acquired 10 of the 49 words Reynolds tested them on, but he only did complete assessments on the words that occurred three or more times, plus the sample of six twice-occurring words. Nevertheless, we can get a (again rough) estimate of the pickup rate for single-occurring words by using a formula proposed by Nagy, Herman, and Anderson (1985) that relates the probability of acquisition after several exposures to the probability after a single exposure:

$P_n = 1 - (1 - P_1)^n$

where “$P_n$ is the probability of learning from context on the basis n exposures, $P_1$ is the probability of learning a word on the basis of one exposure” (p. 248). Nagy et al. suggested that using the probabilities from lower-frequency occurrences to calculate the single-occurrence probability may be more accurate than using probabilities calculated from more frequently occurring words, since using the latter may not “satisfactorily compensate for diminishing returns from later exposures” (p. 248).
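
To go from an observed pickup rate after n exposures back to a single-exposure probability, the formula has to be solved for the single-exposure term. A short derivation, assuming (as the formula itself does) that each exposure is an independent chance of learning the word with the same probability:

```latex
\begin{align*}
P_n &= 1 - (1 - P_1)^n \\
(1 - P_1)^n &= 1 - P_n \\
P_1 &= 1 - (1 - P_n)^{1/n}
\end{align*}
```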

I computed the single-occurrence probability using Reynolds’ data for words that occurred 2, 3, and 4 times:

  • Words occurring 2 times: .083
  • Words occurring 3 times: .087
  • Words occurring 4 times: .052

The average estimated acquisition probability for single-occurring words is thus around .07 (averaging these three probabilities).
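
As a rough check on these figures, the sketch below inverts the Nagy et al. formula and averages the three estimates. The function name is hypothetical, and the only observed pickup rate used is the 16% reported for twice-occurring words; the rates for words occurring three and four times are not given in this post, so their single-occurrence estimates are taken as reported.

```python
from statistics import mean


def single_exposure_probability(p_n: float, n: int) -> float:
    """Solve P_n = 1 - (1 - P_1)**n for P_1 (hypothetical helper)."""
    if not 0.0 <= p_n <= 1.0:
        raise ValueError("p_n must be a probability between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p_n) ** (1.0 / n)


# 16% of the tested twice-occurring words were acquired (n = 2).
p1_from_twice = single_exposure_probability(0.16, 2)

# Single-occurrence estimates as reported in the post, keyed by n.
reported_p1 = {2: 0.083, 3: 0.087, 4: 0.052}
average_p1 = mean(reported_p1.values())

print(f"P1 from twice-occurring words: {p1_from_twice:.3f}")
print(f"Average of the three estimates: {average_p1:.3f}")
```

The unrounded average is slightly above .07, so rounding down to .07 keeps the estimate on the conservative side.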

This falls in between Pellicer-Sánchez and Schmitt’s (2010) results for word recall for single-occurring words with their adult L2 readers in a similar study design (.05) and Nagy, Herman, and Anderson’s (1985) estimate for the pickup rate for words encountered by young first-language readers (.11).

We multiply 250 by .07 to get an additional 17.5 words acquired. Added to the 10 tested words that were acquired, that gives a total of 27.5 giant words. Again, since we do not know the actual number of words that occurred twice, and Reynolds tested only a sample of them, 27.5 is almost certainly an undercount. If there were, say, 50 twice-occurring word types, the reported 16% pickup rate for twice-occurring words would have resulted in eight words acquired, rather than the figure we’re using here, which is less than one word (6 words * .16 = 0.96 words acquired).
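
The running total can be reproduced as follows. This is a minimal sketch using only the numbers given in this post; the 50 twice-occurring types are the same hypothetical scenario mentioned above, not a figure from Reynolds.

```python
# Numbers quoted in the post (variable names are illustrative).
tested_words_acquired = 10   # acquired out of the 49 tested words
untested_types = 250         # treated as single-occurring
p_single = 0.07              # estimated single-occurrence pickup rate

estimated_untested_acquired = untested_types * p_single
total_acquired = tested_words_acquired + estimated_untested_acquired

# Twice-occurring words: tested sample vs. a hypothetical larger set.
twice_pickup_rate = 0.16
from_sampled_twice = 6 * twice_pickup_rate        # the six tested types
from_hypothetical_twice = 50 * twice_pickup_rate  # "say, 50" types

print(f"Estimated untested words acquired: {estimated_untested_acquired:.1f}")
print(f"Total words acquired: {total_acquired:.1f}")
print(f"Twice-occurring: {from_sampled_twice:.2f} vs {from_hypothetical_twice:.1f}")
```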

Reynolds reported that students were given five hours of class time to read the text, but did not provide more exact data on how long they actually spent reading. Given that these were very advanced ESL readers, a reading rate of 200 wpm or more seems likely. At that rate, his subjects would have finished the 37,000-word novel in 185 minutes (37,000/200), or just over three hours.

We divide the total time they spent reading by the number of words they acquired (27.5), and we find that readers acquired on average a new word every 6.7 minutes. We can convert this to words per hour (60 minutes/6.7), yielding 8.9 words per hour – higher than Nation’s estimate.

If we use a much more conservative reading rate of 150 wpm, that would give us a total reading time of about 250 minutes (37,000/150, rounded up), or just over four hours. We divide that by the 27.5 words acquired, and we get 9.1 minutes per word, or 6.6 words per hour, which is very near Nation’s estimate.
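
Both reading-rate scenarios can be computed in one place. The sketch below assumes the 37,000-word text length and the 27.5 words acquired used in this post; the function name is my own. Because it does not round intermediate values, its results differ slightly from the hand calculations.

```python
def words_per_hour(text_length_words: float,
                   reading_rate_wpm: float,
                   words_acquired: float) -> float:
    """Words acquired per hour of reading (hypothetical helper)."""
    reading_minutes = text_length_words / reading_rate_wpm
    return words_acquired / (reading_minutes / 60.0)


TEXT_LENGTH = 37_000    # words in the novel, as used in the post
WORDS_ACQUIRED = 27.5   # estimated giant words acquired

for rate in (200, 150):
    minutes = TEXT_LENGTH / rate
    per_hour = words_per_hour(TEXT_LENGTH, rate, WORDS_ACQUIRED)
    print(f"{rate} wpm: {minutes:.0f} minutes of reading, "
          f"{per_hour:.1f} words per hour")
```

Without intermediate rounding, the two scenarios come out at roughly 8.9 and 6.7 words per hour, which brackets Nation’s estimate of about seven.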

Reynolds’ data from actual students acquiring words in a realistic setting thus closely parallel Nation’s calculations, which were based solely on a corpus analysis. They provide confirming evidence that incidental acquisition in a real-world setting can typically contribute somewhere between six and nine words acquired per hour of reading.
