Is Synthetic Phonics Instruction Working in England? A Closer Look at the “Teaching to Teach” Literacy Report


Study reviewed:

Machin, S., McNally, S., & Viarengo, M. (2016, April). “Teaching to Teach” Literacy. Centre for Economic Performance Discussion Paper No. 1425. (PDF)

Since the publication of the so-called “Rose Report”* more than a decade ago, English schools have been required to use “synthetic phonics” to teach children to read (synthetic phonics is more commonly known in the United States as “blended phonics”). Last year, three economists (Machin, McNally, & Viarengo, 2016) analyzed data from a large sample of English schoolchildren to assess the effectiveness of a synthetic phonics pilot program as well as the implementation of the new phonics mandate. They concluded that synthetic phonics was indeed effective compared to other methods of teaching reading in “closing the gaps” between students who “start out with disadvantages . . . compared to others” (p. 3).

I argue that the researchers’ results do not support this claim. Both experimental studies and Machin et al.’s analysis show that phonics instruction has a medium effect on initial literacy levels, but little to no impact on reading achievement in later grades.

“Teaching to Teach” Literacy

Machin and colleagues examined the test scores of two different cohorts, each of which contained a group of students who were taught reading with synthetic phonics and a group who were not.** The first cohort included students who were part of a phonics pilot study, the Early Reading Development Pilot (ERDp), run prior to the release of the Rose Report in 2006.

The second cohort (named “CLLD” for “Communication, Language, and Literacy Development Programme”) was formed by the first wave of Local Education Authorities (the British equivalent of our school districts) that adopted the nationwide phonics curriculum announced in 2007. These early-adopter CLLD schools were compared to those that adopted the curriculum two years later (p. 11).

Students were assessed at three different points:

  • “Foundation Stage” (at age 5, after one year of instruction),
  • “Key Stage 1” (at age 7), and
  • “Key Stage 2” (at age 11).

The Foundation Stage Profile (now called the “Early Years Foundation Stage Profile”) was done by the child’s teacher, and consisted of a dizzying number of “scales” (13 in the 2008 version!) that students were rated on. Four of these scales related to “communication, language, and literacy,” one of which was “linking sounds and letters.” The Key Stage 1 assessment was also done by the child’s teacher, while Key Stage 2 was scored externally.

Machin et al. reported the effect size comparisons for the phonics groups and the non-phonics groups. Effect sizes allow us to see the magnitude of the impact an intervention has on test scores in a “standardized” unit, which is normally reported as the number (or fraction thereof) of standard deviations that separate the two groups. This is especially important in studies such as this one, where the very large sample sizes (approaching 500,000) can cause even small score differences to be “statistically significant.” Effect sizes help us determine whether those differences are meaningful in practice.
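
To make the arithmetic concrete, here is a minimal sketch (in Python, with made-up numbers rather than figures from the study) of how an effect size such as Cohen’s d is computed, and why a difference far too small to matter in practice can still come out as “statistically significant” when each group contains hundreds of thousands of students:

```python
# Minimal illustration of effect size vs. statistical significance.
# The scores below are simulated, NOT data from Machin et al. (2016).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 250_000  # hypothetical group size, same order of magnitude as the study

# Hypothetical test scores: the phonics group is higher by 0.05 SD,
# a difference well below any "meaningful" threshold.
phonics     = rng.normal(loc=100.5, scale=10.0, size=n)
non_phonics = rng.normal(loc=100.0, scale=10.0, size=n)

def cohens_d(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

d = cohens_d(phonics, non_phonics)
t, p = stats.ttest_ind(phonics, non_phonics)

print(f"Cohen's d = {d:.3f}")  # about 0.05 of a standard deviation
print(f"p-value   = {p:.2e}")  # vanishingly small, i.e. "significant"
```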

Opinions differ on how to interpret effect sizes, but the generally accepted rule-of-thumb in education research is that proposed by Cohen (1988): effect sizes of less than .20 are “small,” from .20 to .50 are “medium,” and more than .50 are “large.” The U.S. Department of Education’s “What Works Clearinghouse” Handbook (NCES, 2014) recommended that effect sizes should be at least .25 to be considered “substantively important” for education research (p. 23).
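
As a quick illustration (my own helper function, not something taken from Cohen or the What Works Clearinghouse), those two rules of thumb can be written out in a few lines:

```python
# Label an effect size using Cohen's (1988) bands and flag whether it meets
# the What Works Clearinghouse ".25 or larger" criterion described above.
def interpret_effect_size(d: float) -> str:
    magnitude = abs(d)
    if magnitude < 0.20:
        band = "small"
    elif magnitude <= 0.50:
        band = "medium"
    else:
        band = "large"
    wwc = "meets" if magnitude >= 0.25 else "does not meet"
    return f"d = {d:+.3f}: {band} (Cohen); {wwc} the WWC .25 criterion"

# Example values from Table 1 (Cohort 1, native speakers, ages 5, 7, and 11):
for d in (0.225, 0.052, -0.045):
    print(interpret_effect_size(d))
```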

I summarize Machin et al.’s effect size data in Table 1 (all taken from their Table 6, p. 32), broken down by student characteristics: students with English as an Additional Language (EAL), roughly equivalent to “language minority” students in U.S. schools; children eligible for “free meals”; and students in both categories. (The researchers write “non-native speakers of English,” but they appear to be referring to data on EALs, a broader designation that includes English language learners and native bilinguals (p. 16).)

Table 1: Effect Size Differences for Phonics and Non-Phonics Students in Machin et al. (2016)

                                      Cohort 1                      Cohort 2
                             Age 5    Age 7    Age 11     Age 5    Age 7    Age 11
                             (FS)     (KS1)    (KS2)      (FS)     (KS1)    (KS2)
Native Speakers              .225     .052     -.045      .211     .061      .001
EALs                         .567     .134      .045      .201     .113      .068
Non-Free School Meals        .306     .042     -.061      .182     .104      .042
Free School Meals            .290     .135      .064      .207     .136      .062
EALs & Free School Meals     .300     .216      .181      .221     .195      .099

(FS = Foundation Stage; KS1 = Key Stage 1; KS2 = Key Stage 2)

There are two conclusions we can draw from these data, both consistent with previous, more rigorously designed studies.

1. Phonics Teaching Has a Moderate Impact Initially

Phonics training has a moderate impact during the first year of instruction. Effect sizes in favor of phonics were mostly in the medium range (.20 to .50) at age 5; only one, for EAL students in Cohort 1, was large (.567). Effect sizes were somewhat higher for the pilot study schools (Cohort 1) than for the first wave of schools that implemented the new phonics methods after the Rose Report (Cohort 2).

We have no way of knowing how instruction affected comprehension scores versus those parts of the assessments that are more easily influenced by phonics instruction, since the results are reported as a single measure. Experimental research has found that phonics instruction boosts scores on phonics tests, but has a much smaller impact on measures of reading comprehension. A meta-analysis of 38 phonics studies by Ehri, Nunes, Stahl, and Willows (2001), for example, reported that the effect size for phonics training was .67 for decoding regular words, .60 for decoding pseudo-words, but only .27 for tests of “comprehending text.” A meta-analysis by McArthur et al. (2011) found that phonics instruction had an effect size of .76 for pseudo-word reading, .47 for real-word reading, but only .14 for reading comprehension (which was not significantly different from zero). Torgesen et al. (2006) similarly reported a larger effect size for “accuracy” scores (real words + pseudo-words) (.39) than for comprehension scores (.24, and non-significant).***

2. The Effect of Phonics Instruction Fades Quickly

By age 7, when most students are probably reading independently, the difference between the children taught synthetic phonics and the controls declines sharply for all groups, and is less than .20 for every comparison except the EAL + free school meals group in Cohort 1 (.216). By age 11, only one of the ten comparisons shown in Table 1 is even greater than .10, and all are under .20.****
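
For readers who want to check those counts, here is a small sketch that re-tabulates the Table 1 figures; the dictionary and variable names are my own:

```python
# Effect sizes from Table 1, ordered as:
# (Cohort 1: age 5, age 7, age 11, Cohort 2: age 5, age 7, age 11)
table1 = {
    "Native speakers":          (0.225, 0.052, -0.045, 0.211, 0.061, 0.001),
    "EALs":                     (0.567, 0.134,  0.045, 0.201, 0.113, 0.068),
    "Non-free school meals":    (0.306, 0.042, -0.061, 0.182, 0.104, 0.042),
    "Free school meals":        (0.290, 0.135,  0.064, 0.207, 0.136, 0.062),
    "EALs & free school meals": (0.300, 0.216,  0.181, 0.221, 0.195, 0.099),
}

age7  = [v for row in table1.values() for v in (row[1], row[4])]
age11 = [v for row in table1.values() for v in (row[2], row[5])]

print([d for d in age7 if d >= 0.20])   # [0.216] -- the lone exception at age 7
print([d for d in age11 if d > 0.10])   # [0.181] -- the lone value above .10 at age 11
print(all(d < 0.20 for d in age11))     # True   -- every age 11 comparison is "small"
```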

Again, experimental studies confirm these results.  The impact of phonics instruction, even on tests of phonological knowledge, tends to decline soon after the intervention is over. Suggate (2014) conducted a meta-analysis of 16 experimental studies on the long-term impact of phonics instruction. I summarize his findings in Table 2.  The “Post-Test” effect size is the difference immediately after the treatment, and the “Delayed Post-Test” effect size is for the follow-up assessment. Note that on average the delayed post-tests were given less than a year after the intervention ended (mean = 11.17 months, p. 8). “Pre-reading” included “sub-skills” such as phonemic/phonological awareness, letter naming, and decoding. “Reading skills” included word identification, oral reading fluency, and reading accuracy scores.

Table 2: The Impact of Phonics Instruction on Immediate and Delayed Post-Test Measures of Literacy in Suggate (2014)

Assessment          Post-Test    Delayed Post-Test
Overall               .29              .07
Reading Skills        .26              .07
Pre-Reading           .32              .08
Comprehension         .47             -.10

It’s clear that, when measured in controlled experiments, the impact of phonics instruction all but disappears within a year or so of the end of training, even on tests that measure phonological awareness and decoding. The impact on reading comprehension was actually negative on the delayed post-tests, but all of the delayed effect sizes were tiny. A more recent, large-scale evaluation (N = 4,500) of the Open Court reading curriculum, which includes intensive phonics instruction, similarly found no positive effects for phonics on reading scores after one year (Vaden-Kiernan et al., in press).

Nor is experimental data on phonics instruction more favorable if we extend the follow-up to beyond one year. Blachman, Schatschneider, Fletcher, Murray, Munger, and Vaughn (2014) (not included in Suggate’s meta-analysis) compared the reading scores of a group of students who received intensive phonics training in early elementary school to a control group more than a decade after the intervention. There were no significant differences on any of the reading comprehension measures used, even ones biased toward decoding skills, such as the Woodcock-Johnson. The study’s authors did find significant differences for a few of the decoding measures, but for all other measures, including spelling, the effect sizes were “small to negligible” (p. 53).

The results of the “Teaching to Teach” Literacy study do not support the assertion that synthetic phonics is having a positive impact on the reading scores of primary schoolchildren in England. The evidence Machin et al. present is consistent with experimental studies that have found intensive phonics instruction makes a modest initial impact, but has very small effects on reading achievement later on.


*The Rose Report itself was, according to Machin et al., heavily influenced by an earlier study done in Clackmannanshire, Scotland, that found favorable effects for phonics (Johnston & Watson, 2005). See Krashen (2009) for a review of that study.

**No information was given in the report on the instruction teachers used with the comparison groups. They may in fact have also used phonics in their teaching or some sort of “mixed” approach.

***I suspect this effect size for reading comprehension is overstated, since several of the phonics studies in the Torgesen et al. and McArthur et al. analyses used “cloze” assessments (such as the Woodcock-Johnson passage comprehension test) that are largely just measures of decoding ability (Keenan, Betjemann, & Olson, 2008).

****A separate analysis of the impact of a “phonics screening check” implemented in English schools in 2012 (Walker, Sainsbury, Worth, Bamforth, & Betts, 2015) found that while the number of children scoring higher on the phonics test had gone up substantially (a 16% gain in two years), scores on the Key Stage 1 assessment barely budged, with an effect size of .08 that the researchers correctly described as “not very big” (p. 27).

