Is Synthetic Phonics Instruction Working in England? (Updated)

Since originally posting this analysis back in September 2017, I have shortened it for publication in Margaret M. Clark’s new edited volume, Teaching Initial Literacy: Policies, Evidence, and Ideology (2018). But here I’m posting a somewhat longer version than the one included in Clark’s book, with updates to my original post.

Is Synthetic Phonics Working in England? A Comment on the “Teaching to Teach” Literacy Report

Jeff McQuillan


Since the publication of the “Rose Report” (Rose, 2006) more than a decade ago, schools in England have been required to use “synthetic phonics” (Wyse & Goswami, 2008) to teach children to read. Three economists (Machin, McNally, & Viarengo, 2016) analyzed data from a large sample of English schoolchildren to assess the effectiveness of a phonics pilot program and early implementation of the new phonics mandate. They concluded that synthetic phonics was indeed effective compared to other methods of teaching reading in “closing the gaps” between students who “start out with disadvantages . . . compared to others” (p. 3).

I argue that the researchers’ results do not support this claim. Both experimental studies and Machin et al.’s analysis show that phonics instruction has a modest effect on initial literacy levels, but little to no impact on reading achievement in later grades.

Analysis: “Teaching to Teach” Literacy

Machin and colleagues examined the test scores of two different cohorts, each of which contained a group of students who were taught reading with synthetic phonics and a group that was not. Machin et al. gave no information on the instruction teachers used with the comparison groups, which could also have included phonics or some “mixed” approach.

The first cohort included students who were part of a phonics pilot study, the Early Reading Development Pilot (ERDp), conducted prior to the release of the Rose Report in 2006. The second cohort (labeled “CLLD,” for the “Communication, Language, and Literacy Development Programme”) was formed by the first wave of Local Education Authorities that adopted the nationwide phonics curriculum announced in 2007. These early-adopter CLLD schools were compared to those that adopted the curriculum two years later (p. 11).

Students were assessed at three different points:

  • “Foundation Stage” (at age 5, after one year of instruction),
  • “Key Stage 1” (at age 7), and
  • “Key Stage 2” (at age 11).

The Foundation Stage Profile (now called the “Early Years Foundation Stage Profile”) was completed by the child’s teacher, who rated students on 13 scales (Qualifications and Curriculum Authority, 2008). Four of these scales related to “communication, language, and literacy,” one of which was “linking sounds and letters.” The Key Stage 1 assessment was also done by the child’s teacher, while Key Stage 2 was scored externally.

Effect Size Comparisons

Machin et al. reported the effect size comparisons for the phonics groups and the non-phonics groups. Effect sizes allow us to see the magnitude of the impact an intervention has on test scores in a “standardized” unit, which is normally reported as the number (or fraction thereof) of standard deviations that separate the two groups (Hunter & Schmidt, 2004). This is especially important in studies such as this one, where the very large sample sizes (approaching 500,000) can cause even small score differences to be statistically significant. Effect sizes help us determine whether those differences are meaningful in practice.
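To make the standardized unit concrete, here is a minimal sketch (purely illustrative; the numbers are hypothetical and not from Machin et al.) of how a standardized mean difference, Cohen’s d, is computed from group means and a pooled standard deviation:

```python
import math

def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2) \
                 / (n_treat + n_ctrl - 2)
    return (mean_treat - mean_ctrl) / math.sqrt(pooled_var)

# Hypothetical example: a 2-point raw-score difference on a test with an
# SD of 10 gives d = 0.20 -- Cohen's "small" effect -- no matter how
# large the samples are, which is why effect sizes matter more than
# statistical significance when N approaches 500,000.
print(round(cohens_d(102, 100, 10, 10, 250000, 250000), 2))  # 0.2
```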

Opinions differ on how to interpret effect sizes, but one widely used rule of thumb is that proposed by Cohen (1988): an effect size of .20 is “small,” .50 is “medium,” and .80 is “large.” In a review of more than 300 meta-analyses, Lipsey and Wilson (1993) found that these designations roughly corresponded to the distribution of effect sizes across a broad range of behavioral research. The U.S. Department of Education’s What Works Clearinghouse Handbook (What Works Clearinghouse, 2014) recommended that effect sizes be at least .25 to be considered “substantively important” for education research (p. 23).

We can also determine the practical importance of an effect size by considering what Plonsky and Oswald (2014) referred to as its “literal” or “mathematical” interpretation. If an intervention has an effect size of .10, a student in the treatment group who began at the 50th percentile would move to the 54th percentile, a difference of only four percentile points (assuming a normal distribution).1 But an effect size of .80 would move that same student to the 79th percentile, a much more substantial gain of 29 percentile points.
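The arithmetic behind these percentile figures can be sketched in a few lines of Python (an illustration of the calculation, not code from any of the studies cited). A student at the 50th percentile sits at z = 0; adding the effect size in standard-deviation units and applying the standard normal CDF gives the new percentile:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def new_percentile(effect_size):
    """Percentile reached by a student who starts at the 50th percentile
    (z = 0) and gains `effect_size` standard deviations, assuming
    normally distributed scores."""
    return 100 * normal_cdf(effect_size)

print(round(new_percentile(0.10)))  # -> 54
print(round(new_percentile(0.80)))  # -> 79
```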

Hill, Bloom, Black, and Lipsey (2008) proposed additional benchmarks for effect size interpretation, including comparing the effect to “normative expectations for change” and to effect sizes for similar interventions. They reported that on norm-referenced, standardized reading tests, the annual mean gain in effect size is 1.52 from Kindergarten to Grade 1, 0.97 from Grade 1 to Grade 2, and 0.60 from Grade 2 to Grade 3 (Table 1, p. 173). Lipsey et al. (2012) analyzed the effect sizes from 124 studies of a variety of educational interventions and found that the average effect size was .28 for elementary school, .33 for middle school, and .23 for high school (Table 9, p. 34).

Using Hill et al.’s standards, then, an effect size of .10 should be considered small compared to year-to-year gains at this age, and would be below average compared to other teaching innovations.

Synthetic Phonics Cohort Groups

I summarize Machin et al.’s effect size data in Table 1, broken down by student characteristics, including those who have English as an Additional Language (EAL), children who were eligible for “free meals,” and both EAL and “free meals” students (all taken from their Table 6, p. 32).2 A positive effect size indicates that the phonics students scored higher than the non-phonics students.

Table 1

Effect Size Differences for Phonics and Non-Phonics Students in Machin et al. (2016)

[table "28" not found /]

There are two conclusions we can draw from these data, both consistent with previous, more rigorously designed studies.

  1. Phonics Teaching Has a Moderate Impact Initially

Phonics training has a moderate impact during the first year of instruction. Effect sizes in favor of phonics were mostly in the small-to-medium range (.20 to .50) at Age 5, although in Cohort 1, the effect size for EAL students was larger (.567). Effect sizes were also somewhat higher for the pilot study schools (Cohort 1) than for the first wave of schools that implemented the new phonics methods after the Rose Report (Cohort 2).

We have no way of knowing how instruction affected children’s comprehension scores versus their performance on decoding tasks, since these are combined into a single measure. The distinction is important: experimental research has found that phonics instruction boosts scores on isolated “skills” tests, but has a much smaller impact on measures of reading comprehension (Krashen, 2009).

In Table 2, I report the effect sizes for different measures of reading from three research reviews of phonics instruction, summarizing the results of 38 comparison studies from Ehri, Nunes, Stahl, and Willows (2001), 12 studies from Torgerson, Brooks, and Hall (2006), and 11 studies from a more recent analysis, a Cochrane Systematic Review by McArthur et al. (2011). Note that some studies were used in more than one meta-analysis.

Table 2

Impact of Phonics Instruction on Literacy Assessments in Three Meta-Analyses

[table "32" not found /]

*Combined real and pseudo-word decoding; random effects model estimate used due to study heterogeneity

Phonics instruction has a medium-to-large impact on reading isolated words and pseudo-words in all three analyses, ranging from .38 to .76. But on reading comprehension tests, the effects are much smaller: .27 in Ehri et al., .24 in Torgerson et al., and .14 in McArthur et al.’s review. The estimates from Torgerson et al. and McArthur et al. were not significantly different from zero.

Any adequate analysis of the effectiveness of synthetic phonics, then, must examine the impact of instruction on comprehension tests as opposed to decoding measures.3 Machin et al.’s analysis does not.

  2. The Effect of Phonics Instruction Fades Quickly

As can be seen in Table 1, by age 7, when most students are probably reading independently, the difference between the children taught synthetic phonics and the controls declines sharply for all groups, and is less than .20 for all comparisons except the EAL + free meals group (and for Cohort 1 only: .216). By Age 11, only one of the ten comparisons shown in Table 1 exceeds even .10, and all are under .20.4

Again, experimental studies confirm these results. The impact of phonics instruction, even on tests of phonological knowledge, tends to decline soon after the intervention is over. Suggate (2016) conducted a meta-analysis of 16 experimental studies on the long-term impact of phonics instruction. I summarize his findings in Table 3. The “Post-Test” effect size is the difference immediately after phonics instruction, and the “Delayed Post-Test” effect size is for the follow-up assessment. On average, the delayed post-tests were given less than a year after the intervention ended (mean = 11.17 months). “Pre-reading” included “sub-skills” such as phonemic/phonological awareness, letter naming, and decoding. “Reading skills” included word identification, oral reading fluency, and reading accuracy scores.

Table 3

The Impact of Phonics Instruction on Immediate and Delayed Post-Test Measures of Literacy in Suggate (2016)

 [table "29" not found /]

It’s clear that, when measured in controlled experiments, the impact of phonics instruction all but disappears a year or so after it is introduced, even for those tests that measure phonological awareness and decoding. The impact on reading comprehension was actually negative on the delayed post-tests, but all the effect sizes were small.

A more recent, large-scale evaluation (N = 4,500) of the Open Court reading curriculum, which includes intensive phonics instruction, similarly found no positive effects for phonics on reading scores after the first year of instruction (Vaden-Kiernan et al., 2018).

Nor is experimental data on phonics instruction more favorable if we extend the follow-up to beyond one year. Blachman, Schatschneider, Fletcher, Murray, Munger, and Vaughn (2014) (not included in Suggate’s meta-analysis) compared the reading scores of a group of students who received intensive phonics training in early elementary school to a control group more than a decade after the intervention. There were no significant differences on any of the reading comprehension measures used, even ones biased toward decoding skills, such as the Woodcock-Johnson. The study’s authors did find significant differences for a few of the decoding measures, but for all other measures, including spelling, the effect sizes were “small to negligible” (p. 53).

Conclusion: Policy Should Be Based on Best Evidence

Machin et al.’s analysis of synthetic phonics was based on a “natural experiment,” allowing them to use a very large dataset with two separate cohorts. Natural experiments can provide confirmation of other findings, and are especially useful for studying phenomena where a true experimental design is not possible.

But that is not the case for teaching methods, about which we can in fact conduct true experiments. We already have several well-designed experiments on the effects of phonics instruction. Policy decisions should be made on the strongest evidence we have, not the weakest (Garan, 2001; 2004).

In any case, the results of the “Teaching to Teach” Literacy study do not support the assertion that synthetic phonics is having a positive impact on the reading scores of primary schoolchildren in England. The evidence Machin et al. presented is consistent with experimental studies that have found intensive phonics instruction makes a modest initial impact, but has very small effects on reading achievement later on.


1 An effect size of .10 that was cumulative over time might be considered substantial, however (Coe, 2002).

2 The researchers use the term “non-native speakers of English,” but it appears that they are referring to data on EALs, a broader designation that includes English language learners and native bilinguals (p. 16).

3 Even the small effect sizes for reading comprehension found in all three meta-analyses are likely overstated, since several of the phonics studies included used “comprehension” assessments that are strongly influenced by decoding ability (see Keenan, Betjemann, & Olson, 2008; and Spooner, Baddeley, & Gathercole, 2004). Of the seven (combined) studies used by McArthur et al. (2011) and Torgerson et al. (2006) to estimate their reading comprehension effect sizes, only two used reading tests that are not strongly affected by decoding skills. Both of those effects were small and non-significant: Ford (2009), reported in McArthur et al. (2011), found -.15 on the Gates-MacGinitie test; Lovett et al. (1989), reported in Torgerson et al. (2006), found .08 on the Gilmore Oral Reading Test (incorrectly listed as the “Gray” Oral Reading Test, p. 61).

4 A separate analysis of the impact of the Phonics Screening Check (PSC) implemented in English schools in 2012 (Walker, Sainsbury, Worth, Bamforth, & Betts, 2015) found that while the number of children scoring higher on the PSC had gone up (a 16% gain in two years), scores on the Key Stage 1 assessment barely budged, with an effect size of .08 that the researchers correctly described as “not very big” (p. 27). This result is consistent with the general trend seen in experimental studies: phonics has a small initial impact on phonics tests, but little impact on reading scores later on.


Blachman, B., Schatschneider, C., Fletcher, J., Murray, M., Munger, K., & Vaughn, M. (2014). Intensive reading remediation in grade 2 or 3: Are there effects a decade later? Journal of Educational Psychology, 106(1), 46-57.

Coe, R. (2002). It’s the effect size, stupid: What effect size is and why it is important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, 12-14 September 2002.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Ehri, L., Nunes, S., Stahl, S., & Willows, D. (2001). Systematic phonics instruction helps students learn to read: Evidence from the National Reading Panel’s meta-analysis. Review of Educational Research, 71(3), 393-447.

Garan, E. (2001). Beyond the smoke and mirrors: A critique of the National Reading Panel report on phonics. Phi Delta Kappan, 82(7), 500-506.

Garan, E. (2004). In Defense of Our Children. Portsmouth, NH: Heinemann.

Hill, C., Bloom, H., Black, A., & Lipsey, M. (2008). Empirical benchmarks for interpreting effect sizes in research. Child Development Perspectives, 2(3), 172-177.

Hunter, J. & Schmidt, F. (2004). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. 2nd edition. Thousand Oaks, CA: Sage Publications.

Keenan, J., Betjemann, R., & Olson, R. (2008). Reading comprehension tests vary in the skills they assess: Differential dependence on decoding and oral comprehension. Scientific Studies of Reading, 12(3), 281-300.

Krashen, S. (2009). Does intensive decoding instruction contribute to reading comprehension? Knowledge Quest, 37(4), 72-74.

Lipsey, M., & Wilson, D. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12), 1181-1209.

Lipsey, M., Puzio, K., Yun, C., Hebert, M., Steinka-Fry, K., Cole, M., Roberts, M., Anthony, K., & Busick, M. (2012). Translating the Statistical Representation of the Effects of Educational Interventions into More Readily Interpretable Forms. Washington, DC: National Center for Special Education Research.

Machin, S., McNally, S., & Viarengo, M. (2016). “Teaching to Teach” Literacy. Centre for Economic Performance Discussion Paper No. 1425.

McArthur, G., Eve, P., Jones, K., Banales, E., Kohnen, S., Anandakumar, T., & Castles, A. (2011). Phonics training for English-speaking poor readers. Cochrane Database of Systematic Reviews, 12.

Plonsky, L., & Oswald, F. (2014). How big is “big”? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878-912.

Qualifications and Curriculum Authority. (2008). Early Years Foundation Stage Profile Handbook. London: Author.

Rose, J. (2006). Independent Review of the Teaching of Early Reading. Final Report. Nottingham, England: Department for Education and Skills (DfES) Publications.

Spooner, A., Baddeley, A., & Gathercole, S. (2004). Can reading accuracy and comprehension be separated in the Neale Analysis of Reading Ability? British Journal of Educational Psychology, 74(2), 187-204.

Suggate, S. (2016). A meta-analysis of the long-term effects of phonemic awareness, phonics, fluency, and reading comprehension interventions. Journal of Learning Disabilities, 49(1), 77-96.

Torgerson, C., Brooks, G., & Hall, J. (2006). A Systematic Review of the Research Literature on the Use of Phonics in the Teaching of Reading and Spelling. Nottingham, England: Department for Education and Skills (DfES) Publications.

Vaden-Kiernan, M., Borman, G., Caverly, S., Bell, N., Sullivan, K., Ruiz de Castilla, V., Fleming, G., Rodriquez, D., Henry, C., Long, T., & Hughes Jones, D. (2018). Findings From a Multiyear Scale-Up Effectiveness Trial of Open Court Reading. Journal of Research on Educational Effectiveness, 11(1), 109-132.

Walker, M., Sainsbury, M., Worth, J., Bamforth, H., & Betts, H. (2015). Phonics Screening Check Evaluation: Final Report. London: Department for Education.

What Works Clearinghouse. (2014). Procedures and Standards Handbook (version 3.0). Washington, DC: Institute of Education Sciences, U.S. Department of Education.

Wyse, D., & Goswami, U. (2008). Synthetic phonics and the teaching of reading. British Educational Research Journal, 34, 691-710.


