Learning from Really Bad Graphs & Ill-informed Conclusions: Thoughts on the New PEPG “Catching Up” Report

A new policy paper from Eric Hanushek, Paul Peterson & Ludger Woessmann has been receiving considerable attention. This despite numerous completely outlandish assertions drawn from junk charts that fill the pages of this reformy manifesto.

Look, I’ve said it before and will say it again. Eric Hanushek has contributed a great deal of high-quality research to the fields of education policy and the economics of education over the years, and I have relied, and continue to rely, heavily on much of it to inform my own analyses and thinking in education policy. But this kind of stuff is really just infuriating. Rather than spend too much time venting, let’s try to use this new report for instructive purposes – to instruct the casual reader in how to debunk and distill complete and utter BS when presented with pretty scatterplots and glossy formatting.

First, for your reading pleasure, the complete brief may be found here: http://www.hks.harvard.edu/pepg/PDF/Papers/PEPG12-03_CatchingUp.pdf

Before I go down this road, allow me to point out that it’s one thing to offer up this type of analysis as a conversation starter… or even as a provocation, with all relevant caveats and disclaimers. It’s quite another to present information of this caliber (or lack thereof) as a serious attempt at immediate influence over policy. There’s a huge freakin’ difference there. And it is certainly my impression that this brief, by its framing, is indeed intended to shape the immediate policy conversation as much if not more than to generate speculative, intellectual musings over the various possible meanings of the charts.

Further, I’m particularly concerned with the way in which much of the information is presented and the way in which conclusions are drawn from that information. This is where this brief can be useful and illustrative – where we can turn this clumsy manifesto into a teaching moment. I’ll tackle three specific issues here:

  1. measures matter, especially when we are dealing with money and test scores,
  2. the complexity of educational systems is difficult to untangle two measures at a time,
  3. always watch out for the ol’ bait and switch! (sometimes it’s really obvious!)

The report presents numerous international comparisons (that’s the focus) of similar rigor to the state level comparisons I critique here. I’m just a bit pressed for time, and had the state data more readily available.

Measures Matter!

Okay… so here’s the first graph that drove me up the freakin’ wall. This graph is a classic extension of what I refer to as the Hanushekian cloud of uncertainty.

Figure 1 – State Spending Increases & Test Score Gains (from report)

For decades, Hanushek has been presenting deceptively oversimplified scatterplots of school district, state-level and international data on education spending and outcome measures. These scatterplots in and of themselves are invariably freakin’ meaningless. I evaluate this body of literature by Hanushek as a whole in my policy brief Revisiting the Age Old Question: Does Money Matter in Education?

This graph provides a new twist, comparing the dollar increases in spending to the NAEP average annual gain. Hanushek uses this graph to draw the following conclusions:

 According to another popular theory, additional spending on education will yield gains in test scores. To see whether expenditure theory can account for the interstate variation, we plotted test-score gains against increments in spending between 1990 and 2009. As can be seen from the scattering of states into all parts of Figure 9, the data offer precious little support for the theory.

On average, an additional $1,000 in per-pupil spending is associated with a trivial annual gain in achievement of one-tenth of 1 percent of a standard deviation.

Michigan, Indiana, Idaho, North Carolina, Colorado, and Florida made the most achievement gains for every incremental dollar spent over the past two decades.

(keep an eye on Michigan and Indiana – we’ll hear from them again later. Here, they are AWESOME – getting bang for the buck… Of course, one can look good on this indicator by simply not spending much more and showing commensurately paltry outcome gains!)

I love the sarcastic use of “precious” in this quote. But I digress.

But there are at least a few small – okay… pretty damn big … okay … huge… completely undermining – problems with using this scatterplot to draw these conclusions.

Let’s set aside the outcome measure for now and focus on two other not-so-trivial issues. First and foremost, a $1,000 increase in spending in Louisiana and a $1,000 increase in spending in New Jersey or Connecticut may… just may… not be worth the same. Does $1,000 more go as far toward improving the competitiveness of teacher salaries in New Jersey as it does in New Mexico? Uh… not so much.  In fact, the National Center for Education Statistics Education Comparable Wage Index indicates that competitive wages in New Jersey are substantially greater than in Louisiana, significantly altering the value of the additional dollar.  Second… isn’t it possible that other factors actually play a role too?
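The wage-index point is easy to make concrete. Here’s a minimal sketch of deflating a nominal per-pupil increase by a state labor-market wage index – the index values below are made-up placeholders, NOT actual NCES Comparable Wage Index figures:

```python
# Hypothetical wage index values (national average = 1.0).
# These are illustrative placeholders, not actual NCES CWI figures.
cwi = {"New Jersey": 1.25, "Louisiana": 0.90}

def real_increase(nominal_increase, state):
    """Deflate a nominal per-pupil spending increase by the state's wage index,
    yielding its approximate purchasing power in 'national average' dollars."""
    return nominal_increase / cwi[state]

for state in cwi:
    print(state, round(real_increase(1000, state), 2))
```

Under these (hypothetical) index values, the same nominal $1,000 buys roughly $800 worth of staffing in New Jersey but over $1,100 worth in Louisiana – which is precisely why raw dollar increments across states aren’t comparable.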

Let’s shatter the spending measure & related conclusions first! Here’s an alternate view – taking the current expenditures per pupil from 2008-09 over the current expenditures for 1990-91 – that is, expressing them effectively as a percent increase over base year (albeit not inflation adjusted – see this post for more on this topic).
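The alternative measure described above can be sketched in a few lines. The spending figures here are made-up illustrations (NOT actual NCES expenditure data) chosen only to show how a state with a smaller dollar increase can show a much larger percent increase over its base year:

```python
# Hypothetical per-pupil current expenditures -- illustrative only,
# NOT actual state figures.
spending = {
    "New Jersey": {"1990-91": 9000, "2008-09": 16000},
    "Louisiana": {"1990-91": 4000, "2008-09": 10000},
}

def pct_increase(state):
    """Percent increase in spending over the 1990-91 base year."""
    base = spending[state]["1990-91"]
    end = spending[state]["2008-09"]
    return 100.0 * (end - base) / base

for state in spending:
    print(state, round(pct_increase(state), 1))
```

In this toy example, the low-base state shows the larger percent increase (150% vs. about 78%) despite the smaller dollar increase ($6,000 vs. $7,000) – the same reversal that flips the Louisiana/New Jersey comparison between Figure 1 and Figure 2.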

Figure 2

Hmmm… as it turns out, New Jersey spending really didn’t increase much as a percent over the base year. Louisiana, however, did. In fact, Louisiana actually had among the highest growth among states.  Well then, that would mean that New Jersey really kicked some butt! Not much spending increase at all… and some pretty damn good outcome gain!

The bottom line, however, is that either scatterplot is pretty meaningless, with mine arguably slightly less meaningless than the original! But neither is really useful for making any bold statements about state aggregate spending and outcome gains. Again, in my policy brief on Money Matters, I explore these issues in much greater detail. Referring to more rigorous studies attempting to link spending and outcome measures, I explain:

They [more recent studies] also, however, raised new, important issues about the complexities of attempting to identify a direct link between money and student outcomes. These difficulties include equating the value of the dollar across widely varied geographic and economic contexts, as well as in accurately separating the role of expenditures from that of students’ family backgrounds, which also play some role in determining local funding.


I can’t pass up this seemingly tangential point.  I took particular enjoyment in this finding from Hanushek’s new report:

Maryland, Massachusetts, and New Jersey enjoyed substantial gains in student performance after committing substantial new fiscal resources.

Hanushek went to great lengths in an earlier book and in related policy papers to make the case that New Jersey was a classic example of failed massive spending increases, and he has repeatedly cited New Jersey’s failures (as recently as this spring – my rebuttal here!) as a reason why other states should not increase funding for schools. Kevin Welner and I discuss this Hanushekian claim extensively in a recent article in Teachers College Record.

Isn’t that precious?

Two Measures Are Generally Insufficient for Anything but Playful Speculation & Exploration!

As I noted above, the second reason why we should NOT take the Hanushekian cloud – or the other graphs in the new report – too seriously is that they attempt to draw inappropriately bold conclusions from graphs involving only two variables at a time. This approach can be useful for exploring patterns and/or raising questions. We all should spend much time exploring visual representations of our data – getting to know our measures and how they relate. But to take this information and assert that spending matters little, or to go even further and make claims that the South is rising again… and that the accountability-driven policies of southern states are leading to disproportionate gains while curmudgeonly anti-reformy, anti-accountability Midwest states are suffering, is just absurd.  I’ll dig into these conclusions a bit more in the next and final section.

What else might be going on here? Well, one likely issue requiring at least some more exploration is whether there are any substantive changes in the demography of these states. Yeah… it’s just possible that states that saw greater improvement saw less increase in poverty. Uh… and yeah… it’s possible that states that started lower gained more. Now, the authors acknowledge this latter point, but then brush it off. Instead, they assert that a likely alternative explanation is that Midwest states were riding high on their past successes and great universities, and simply got complacent.

Here are a few figures to chew on.

Figure 3 – Demographics and Outcome Change

Note that Hanushek, Peterson and Woessmann make a big deal about the great performance of Louisiana, Delaware, Maryland and Florida and the particularly sucky performance of Michigan, Indiana, Minnesota and Wisconsin. Uh… wait, weren’t Indiana and Michigan awesome above – for getting those paltry outcome gains for little or no additional investment? Yeah… but now they suck. Really… suck… because… they’re complacent… and not reformy.   As it turns out, the states referred to as generally awesome by the authors also had generally less increase in % low income students.

Figure 4 – Starting Performance Level and Outcome Change

While the authors acknowledge that starting performance levels are associated with outcome change, they go to great lengths to blow off this issue, arguing a) that it explains a relatively small share of the variation (uh… only about a quarter of it… which is actually quite large for this type of data/analysis) and b) that other plausible explanations involving the southern reformyness vs. midwestern complacency dichotomy may explain much of the rest of the difference (without any evidence to support this notion!).

Yes. Starting level does seem to matter! And that can’t be overlooked, or brushed aside.

Together, change in % free lunch and 1992 8th grade math score explain about 41% of the variation in annual gain across the 34 states for which each measure is available.
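To make concrete what “explains about 41% of the variation” means, here’s a sketch of the underlying computation: an OLS fit of annual gain on the two predictors, with R² computed from the residuals. The data below are simulated stand-ins, NOT the actual NAEP or free-lunch figures:

```python
import numpy as np

# Simulated stand-in data for 34 states (NOT the actual figures),
# structured like the two-predictor analysis described in the text.
rng = np.random.default_rng(0)
n = 34
start_score = rng.normal(265, 8, n)    # 1992 8th-grade math scale score
d_free_lunch = rng.normal(10, 4, n)    # change in % free lunch
gain = (-0.05 * (start_score - 265)    # lower starting point -> larger gain
        - 0.03 * d_free_lunch          # rising poverty -> smaller gain
        + rng.normal(0, 0.5, n))       # noise

# OLS with an intercept, then R^2 = 1 - (residual variance / total variance).
X = np.column_stack([np.ones(n), start_score, d_free_lunch])
beta, *_ = np.linalg.lstsq(X, gain, rcond=None)
resid = gain - X @ beta
r2 = 1 - resid.var() / gain.var()
print(round(r2, 3))
```

An R² near 0.41 from two readily available covariates is, as the text notes, a large share for state-aggregate data of this kind – which is exactly why brushing the starting-point relationship aside is hard to justify.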

Ye Ol’ Bait & Switch

But there are bigger and more obvious problems with the conclusions drawn in this report… that don’t really even require much statistical digging. A classic deceptive strategy used in this type of reporting is ye ol’ bait and switch and/or conflating one group identification with another.

Ye ol’ bait and switch is often used in voucher debates where pundits will point to elite private schools as examples of the choices that all children/families should have and will then point to the average tuition of Catholic elementary schools (circa 1999) as an example of the cost of private education (see: http://nepc.colorado.edu/publication/private-schooling-US). Uh… 1999 national average Catholic elementary school tuition won’t cover much of the tuition at Sidwell Friends in 2012!

An entire subsection of the Hanushek, Peterson and Woessmann report is titled Is the South Rising Again? Much attention is paid in the report to the premise that southern states are staging an impressive comeback and that this impressive comeback is a function of their forward thinking in the 1990s and 2000s.

Specifically, the authors laud the achievement gains of Louisiana, Delaware, Maryland and Florida! All, of course, “southern.”

And specifically, the authors laud the early reformyness of Tennessee, North Carolina, Florida, Texas, and Arkansas – as providing possible explanations for the high performance of southern states!

Wait a second…. Those aren’t the same freakin’ states are they? What’s up with that? Did they really do that? Did they really frame it that way?

Here’s what the report says:

Five of the top-10 states were in the South, while no southern states were among the 18 with the slowest growth. The strong showing of the South may be related to energetic political efforts to enhance school quality in that region. During the 1990s, governors of several southern states—Tennessee, North Carolina, Florida, Texas, and Arkansas—provided much of the national leadership for the school accountability effort, as there was a widespread sentiment in the wake of the civil rights movement that steps had to be taken to equalize educational opportunity across racial groups. The results of our study suggest those efforts were at least partially successful.

Meanwhile, students in Wisconsin, Michigan, Minnesota, and Indiana were among those making the smallest average gains between 1992 and 2011. Once again, the larger political climate may have affected the progress on the ground. Unlike in the South, the reform movement has made little headway within midwestern states, at least until very recently. Many of the midwestern states had proud education histories symbolized by internationally acclaimed land-grant universities, which have become the pride of East Lansing, Michigan; Madison, Wisconsin; St. Paul, Minnesota; and Lafayette, Indiana. Satisfaction with past accomplishments may have dampened interest in the school reform agenda sweeping through southern, border, and some western states.

Keep in mind that Louisiana and Delaware didn’t get all reformy until the Race to the Top Era. Further as shown above, Louisiana actually had one of the largest proportionate increases in funding and Louisiana had relatively low growth in low income students.

Here’s a look at the BAIT and at the SWITCH, where I consider the bait to be those precious high outliers – the over-performers in the analysis – and the switch to be the states that were lauded as implementing policies that are likely behind this performance. As it turns out, while those early accountability/reform states also saw pretty good gains, their gains are more or less in line with the gains of other states that had similar starting points – at least on 8th grade math (my apologies for simply not having the time to combine all NAEP scores, but the 8th grade math starting point explains 27% of the variation in gain, and along with free lunch change explains 41% of the variation in gain. Not bad, and more than Hanushek, Peterson and Woessmann suggest!).

Figure 5 – The BAIT… and the SWITCH!

Why is this relevant? The assertion being made in this report is essentially that the SWITCH group of states were implementing desired policies… policies that the sucky states like Michigan and Indiana should perhaps consider – or at least should have, instead of resting on their laurels. Then, perhaps they could have looked more like the precious bait. The problem is that the only overlap between the BAIT and the SWITCH is Florida – hardly a stereotypical “southern” state… and one whose reformyness and NAEP gains have been discussed & critiqued extensively by others in recent years (no time for that here). And then of course, we have the proclamation of the suckyness of Michigan and Indiana. Okay… which is it?
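The core bait-and-switch complaint can be checked directly; the two lists of states come straight from the report’s own text:

```python
# States the report lauds for achievement gains (the "bait")...
bait = {"Louisiana", "Delaware", "Maryland", "Florida"}
# ...vs. states it lauds for early accountability reforms (the "switch").
switch = {"Tennessee", "North Carolina", "Florida", "Texas", "Arkansas"}

print(bait & switch)  # intersection: Florida is the only state in both lists
```

One overlapping state out of eight distinct ones is a thin basis for attributing one group’s gains to the other group’s policies.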

The bottom line in all of this is that this new report doesn’t tell us much. I don’t really have a problem with that. What I have a problem with is assuming that it does.

I do have a problem with particularly junky charts/analysis like the one asserting that spending increases have no relationship to outcome increases – with no consideration at all for the regional differences in the value of those increases – and all of the other variables that may… just may… play some role! That’s just lazy and sloppy and inexcusable.

But, at least I’ve got a new handout for discussion & critique for the first week of my fall semester class on data analysis and reporting!



  1. Outstanding analysis. Wouldn’t any analysis of whether increases in state spending have/have not led to improved NAEP scores have to look at whether there were increases in ELLs and students with disabilities as well? Because those are the most expensive students to educate.

  2. It is good to see that Florida is making some improvements; they have a long way to go! You have pointed out many variables that were not taken into account within the various States but what about the comparison of the US with other Nations? Is that going to be your second week course topic?

    1. Not sure I’ll have a chance to get to the follow up, but suffice it to say that the international comparisons come with layers of additional complexities related to context.

  3. Maine’s Governor and Commissioner of Education have used the Harvard study to label the state’s schools as “dismal”. Thanks for providing some much-needed perspective.

    1. Thanks. I saw the comments comparing Maine and New Hampshire. It’s an interesting contrast. It would appear that the two started in similar places and that NH gained more from 1990 to 2011. We are working with limited information here though. Maine has higher overall poverty and the two fall in line in average NAEP level and average poverty rate in 2010-11 – both high. Current expenditures per pupil in the two states are comparable in 2009 and in 1991 – But, I suspect Maine has a much larger share of kids in smaller, more remote districts [Most NH kids being concentrated in the Southern tier]. Maine is higher than expected on NAEP in the early 1990s and falls back into line over time. That’s interesting, but obvious explanations are beyond me at this point – and I’m reasonably familiar with the two states (having grown up in VT, taught in public schools in Merrimack, NH in the early 1990s and spending Summer vacations up in Maine, incl. this year).

      What certainly cannot be asserted is that New Hampshire’s relative success compared to Maine is somehow because of some centralized teacher policies – incl. test-based evaluation – or because of rapid expansion of independent charter schooling. New Hampshire has certainly NOT led the way in classic modern reformyness of the RTTT era. Nor has NH been a leader in state imposed test-based accountability generally – especially during the heart of the period in question. During this period, NH legislators spent much of their time avoiding developing outcome standards so they could avoid being held to the Claremont ruling on school funding (where the court mandated that they develop standards first, then figure out how to fund districts to meet those standards).

      So, it would be utterly foolish to use the NH/Maine contrast as a basis for arguing that Maine needs to expand charter schools, use test-based teacher evaluation or increase/adopt high stakes testing. Though I’ve not seen that as the argument, feel free to fill me in as to where this is all headed. I may be exploring this contrast further! Interesting!

  4. Dr. Baker,
    Wonderful lesson for those who should care. It reminds me of the time our district saw all subgroups pass the NJASK in higher percentages, yet there was an overall decrease in proficiency rates. Most of our “data driven” imbeciles thought there must be an error: “How could each subgroup see an increase and we have an overall decrease? Impossible, there must be an error in the numbers”.
    I did a good job of keeping my composure while explaining the faulty logic. The population percentages of subgroups had changed enough to mask what was essentially “good news/progress”.
    Thanks for sharing such a logical analysis.

  5. The study is being used by the Governor and the state’s largest newspaper to bash Iowa’s public education system as well, amid drastic calls for changes in teacher evaluation systems emphasizing test scores. Any thoughts?
