This Week In Language Education (May 12, 2017)

Can Talking to Your iPad Improve Your German?

Schenker and Kraemer (2017) (Open Access) compared the speaking proficiency, as measured by the Simulated Oral Proficiency Interview (SOPI), of two groups of college students studying second-semester German (N=52). Students in one group were given iPads for 13 weeks and assigned three short speaking assignments each week (average length: 75 seconds), which they had to record on their iPads. The assignments covered topics such as “Describe your room at college” and “Tell us about your favorite restaurant.”

At the end of the semester, the iPad group had a higher SOPI score (Cohen’s d = 0.68) than the non-iPad group, although neither the fluency nor accuracy subscales of the exam showed any differences between groups.
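For readers unfamiliar with Cohen’s d: it is the difference between two group means divided by their pooled standard deviation. Here is a minimal Python sketch of that calculation. The scores are made up purely for illustration (they are not data from Schenker and Kraemer), and the function name `cohens_d` is my own.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical scores for illustration only, NOT from the study
ipad_scores = [2, 4, 6]
control_scores = [1, 3, 5]
print(cohens_d(ipad_scores, control_scores))  # 0.5
```

By the usual rule of thumb (0.2 small, 0.5 medium, 0.8 large), the reported d = 0.68 falls between a medium and a large effect.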

There are a few problems with this comparison. First, the iPad students also had to watch at least four other student-created videos each week, meaning they got somewhat more language input than the non-iPad group. Second, the iPad group almost certainly got more German input than the controls by virtue of having to research and prepare their 36 speaking assignments. The control group’s online assignments consisted of textbook “vocabulary and grammar” activities (p. 4).

More input outside of the classroom is just as likely an explanation for the group differences as any added output, which in any case was minimal: students spoke to their iPads an average of only 41 minutes during the entire semester.
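The speaking-time figures can be cross-checked with a little arithmetic. This sketch uses only numbers quoted in this post (36 assignments, an average of 75 seconds each, 13 weeks, 41 minutes reported); the variable and function names are mine.

```python
ASSIGNMENTS = 36        # speaking assignments over the semester
AVG_SECONDS = 75        # average length of one recording
WEEKS = 13              # length of the iPad treatment
REPORTED_MINUTES = 41   # average total speaking time reported


def total_minutes(assignments, avg_seconds):
    """Total speaking time in minutes if every assignment runs the average length."""
    return assignments * avg_seconds / 60


expected = total_minutes(ASSIGNMENTS, AVG_SECONDS)
per_week = REPORTED_MINUTES / WEEKS

print(expected)            # 45.0
print(round(per_week, 1))  # 3.2
```

So the 41 minutes is in line with what the assignment schedule implies, and it works out to only about three minutes of recorded speaking per week.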

Schenker, T., & Kraemer, A. (2017). Maximizing L2 Speaking Practice through iPads. Languages, 2(2), 6.

An Early Start in English as a Foreign Language Doesn’t Help

Jaekel and his colleagues (in press, paywall) followed two large cohorts of English as a Foreign Language (EFL) students in German schools for a period of seven years. One “Early Starter” group (N=3,340) got EFL instruction beginning at age 6-7 (Year 1), while the “Late Starter” group (N=2,632) didn’t begin receiving instruction until two years later (Year 3). Both groups were tested in English at Year 5 and again at Year 7.

The Early Starter group got a total of 549 hours of instruction by the time they reached Year 7, compared to only 444 hours for the Late Starters. In Year 5, the Early Starters were clearly better in English than the Late Starters, but by Year 7, the situation had reversed: Late Starters did significantly better than Early Starters, although the effect was smaller for listening (d = .17) than for reading (d = .35) (Table 2, p. 16). The researchers concluded that starting English early was actually worse for students.
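To put the instruction gap in perspective, here is a quick calculation using the two hour totals quoted above. The variable names are mine, and this is only back-of-the-envelope arithmetic, not an analysis from the paper.

```python
EARLY_HOURS = 549   # total EFL instruction by Year 7, Early Starters
LATE_HOURS = 444    # total EFL instruction by Year 7, Late Starters

extra_hours = EARLY_HOURS - LATE_HOURS
extra_percent = extra_hours / LATE_HOURS * 100

print(extra_hours)              # 105
print(round(extra_percent, 1))  # 23.6
```

In other words, the Early Starters had roughly a quarter more instruction time and still scored lower by Year 7.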

This is not a new finding (see Krashen, Long, and Scarcella, 1979). Older children (and adults) acquire languages faster than younger children, contrary to all of the misinformation found in the popular media.

The researchers also looked at the effect of sex, parental income, nonverbal IQ, and the number of books in the home on the students’ English scores. Interestingly, controlling for other factors, books in the home was a positive predictor of English scores.

This makes sense: children who read more in their L1 have more background knowledge to help make the L2 instruction more comprehensible, and more developed L1 literacy skills to transfer to the new language. In other words, they get the benefits of bilingual education.

Jaekel, N., Schurig, M., Florian, M., & Ritter, M. (in press). From Early Starters to Late Finishers? A Longitudinal Study of Early Foreign Language Learning in School. Language Learning.

Another Massive Vocabulary Study Finds No Gains, Massive or Otherwise

Jayanthi and colleagues (in press; paywall) conducted a study, one in what seems like an endless series of massive, federally funded studies of this sort, to determine the efficacy of vocabulary instruction. The “pilot” study (Gersten et al., 2010) (PDF) was a randomized controlled trial (RCT) involving 81 teachers and 468 first graders, bigger than most full-scale studies of any sort in education. That project included 20 hours of teacher training on the intervention over the course of the school year. The intervention consisted of “rich” vocabulary instruction and comprehension strategies. Students were tested at the end of both first and second grade.

No statistically significant differences were found on oral vocabulary, reading vocabulary, or “passage” comprehension (actually measured by a cloze test, and thus mostly a word recognition test). No differences were found on an oral reading test, either.

Undaunted, the research group doubled down and launched the present study, another RCT focussing this time just on vocabulary instruction. This project involved 212 teachers across 61 schools, with a student sample of 1,680 first-graders. The teachers received 12.5 hours of training over the school year. Similar student assessments were given as in the pilot study, minus the Woodcock-Johnson “passage” comprehension test.

The results: “[T]here were no significant impacts (sic) on any of the individual measures. Effect sizes were all close to zero” (pp. 21-22, in press manuscript version). In fact, on two of the vocabulary tests, the effect sizes were slightly negative, though not significantly so.

To be fair, vocabulary studies of this sort typically measure results on “curriculum-based” tests; that is, kids are tested on the words taught, not on a general vocabulary test. Still, the results are not encouraging, and they are consistent with the mostly dismal results this sort of instruction has on vocabulary acquisition and reading comprehension. (Note: The poor showing of vocabulary instruction hasn’t stopped the research team from trying it once again. Third time’s a charm?)

Jayanthi, M., et al. (in press). The Impact of Teacher Study Groups in Vocabulary on Teaching Practice, Teacher Knowledge, and Student Vocabulary Knowledge: A Large-Scale Replication Study. Journal of Research on Educational Effectiveness.

Reading Tests That Don’t Actually Test Reading

Hua and Keenan (2017, paywall), following up on earlier work by Keenan and her colleagues, examined five popular reading comprehension tests to see how much of the variance in scores could be explained by a reader’s listening comprehension, and how much by the reader’s ability to read individual words in isolation (word recognition).

Hua and Keenan tested a large group of students (N=834) ages 8 to 18. They used everyone’s current favorite for statistical analysis, quantile regression, to see how readers of different abilities fared. I report here the results for the “average” student:

  1. Woodcock-Johnson Passage Comprehension (WJPC):
    Word Recognition: 56%
    Listening Comprehension: 33%
  2. PIAT:
    Word Recognition: 61%
    Listening Comprehension: 25%
  3. Gray Oral Reading Test:
    Word Recognition: 23%
    Listening Comprehension: 41%
  4. Qualitative Reading Inventory-Questions:
    Word Recognition: 17%
    Listening Comprehension: 52%
  5. Qualitative Reading Inventory-Retells:
    Word Recognition: 17%
    Listening Comprehension: 52%

There’s a clear difference between tests #1 and #2 on the one hand and tests #3-5 on the other: the PIAT and Woodcock-Johnson Passage Comprehension (WJPC) are mostly word recognition tests rather than measures of comprehension.
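The split is easy to see if you line the numbers up. The short sketch below simply restates the percentages listed above for the “average” student and reports which predictor explains more variance on each test. The dictionary layout and the function name `stronger_predictor` are mine, not the authors’.

```python
# (word recognition %, listening comprehension %) for the "average" student,
# copied from the list in this post
VARIANCE_EXPLAINED = {
    "WJPC": (56, 33),
    "PIAT": (61, 25),
    "Gray Oral Reading Test": (23, 41),
    "QRI-Questions": (17, 52),
    "QRI-Retells": (17, 52),
}


def stronger_predictor(word_recognition, listening):
    """Return the name of the predictor that explains more variance."""
    if word_recognition > listening:
        return "word recognition"
    return "listening comprehension"


for test, (wr, lc) in VARIANCE_EXPLAINED.items():
    print(f"{test}: {stronger_predictor(wr, lc)} (WR {wr}%, LC {lc}%)")
```

Only the WJPC and the PIAT come out as “word recognition”; the other three come out as “listening comprehension.”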

Why the difference? In the PIAT and WJPC, readers read single sentences and/or complete a “cloze” (fill in the missing word). There is little context to aid the reader, nor is there a very high level of complex comprehension required. Tests #3, 4, and 5, however, involve reading longer passages and answering questions or retelling the events of the story.

In other words, Tests 3-5 measure more of what most people (teachers, the general public) consider “reading comprehension”: understanding what you read. Word recognition, on the other hand, is often used as a proxy for decoding skills (the ability to convert letters into sounds) and, not surprisingly, is also much more easily influenced by phonics instruction.

Why is this important for us to know? At least three reasons:

  • The authors of The National Reading Panel’s Report (among others) have claimed that phonics instruction improves “reading comprehension” for kindergartners and first-graders. But, as Elaine Garan rightly pointed out, the NRP conclusions relied on studies that used tests like WJPC, so the evidence actually only indicates that phonics helps word recognition. You can improve word recognition without improving reading comprehension, and indeed both experimental and observational longitudinal studies of the effects of early phonics instruction show no improvements to actual reading comprehension at all in later grades (Sonnenschein et al., 2010; Torgesen et al., 1999; Torgesen et al., 2011).
  • Schools often rely on WJPC and similar tests in “diagnosing” dyslexia. Children with perfectly normal reading comprehension may thus be mislabeled as dyslexic.
  • Researchers examining the supposed genetic and neurological basis of dyslexia and reading disabilities also use tests like #1 and #2, and thus may be getting results (again) that have nothing to do with reading comprehension. In fact, Keenan and her colleagues showed precisely that (Keenan et al., 2006): depending on which reading test you used, you could identify a completely different set of genes as being related to reading.

One more note about the study: Hua and Keenan report that for PIAT and WJPC, the percentage of variance explained by word recognition is about the same for both good and poor readers. For the “real” tests of reading comprehension, however, there is more variation across skill levels. At some skill levels, the amount of variance explained by word recognition changes significantly, dropping at one point to 0% (QRI-Questions, 90th percentile; Table 4, p. 10).

The researchers speculate that their results for the QRI-Questions measure are probably due to ceiling effects for good readers: the readers all clustered at the top, so there was little variance among them. That makes sense, until we consider that listening comprehension for this same group of readers accounted for 41% of the variance. One would think that the lack of variance in QRI-Questions scores would have affected both relationships, not just word recognition. Correction: Hua explained (personal communication, June 17, 2017) that the amount of variance explained by both the word recognition (WR) and listening comprehension (LC) measures declined about equally from the 50th to the 90th percentile, showing that the ceiling effect likely did affect both variables.

This doesn’t alter the larger point we’re making here, which is that listening comprehension is much more important than word recognition in predicting (real) reading comprehension scores.

Hua, A. N., & Keenan, J. M. (2017). Interpreting Reading Comprehension Test Results: Quantile Regression Shows that Explanatory Factors Can Vary with Performance Level. Scientific Studies of Reading, 21(3), 225-238.

Quick Takes

Follow me:

Twitter: http://www.twitter.com/backseatling
Facebook: http://www.facebook.com/backseatlinguist
Medium: https://medium.com/@backseatling (new!)