An Update on New Jersey’s SGPs: Year 2 – Still not valid!

I have spent much time over the past few years criticizing New Jersey’s Student Growth Percentile measures, both conceptually and statistically. So why stop now?

We have been told over and over again by the Commissioner and his minions that New Jersey’s SGPs fully account for student backgrounds by conditioning on each student’s initial score and comparing students against others with similar starting points. I have explained over and over again that just because individual students’ growth percentiles are estimated relative to others with similar starting points, it by no means follows that classroom median growth percentiles or school median growth percentiles are an unbiased measure of teacher or school quality.

The assumption is conceptually wrong and it is statistically false! New Jersey’s growth percentile measures are NOT a valid indicator of school or teacher quality [or even school or teacher effect on student test score change from time 1 to time 2], plain and simple. Adding a second year of data to the mix reinforces my previous conclusions.

Now that we have a second year of publicly available school aggregate growth percentile measures, we can ask a very simple question: how stable, or how well correlated, are those school level SGPs from one year to the next, across all the same schools?

I’ve explained previously, however, that stability of these measures over time may actually reflect more bad than good. It may simply be that the SGPs stay relatively stable from one year to the next because they are picking up factors such as the persistent influence of child poverty, the effects of being clustered with higher or lower performing classmates/schoolmates, or underlying test scales that simply allow either higher or lower performing students to achieve greater gains.

That is, SGPs might be stable merely because of stable bias! If that is indeed the case, it would be particularly foolish to base significant policy determinations on these measures.

Let’s clarify this using the research terms “reliability” and “validity.”

  • Validity means that a measure measures what it is intended to. In this case, the measure is intended to capture the influence of schools and teachers on changes in student test scores over time, not simply to capture something else. Validity is presumed good, but only to the extent that those choosing what to measure are making good choices. One might, for example, choose to measure, and fully succeed in measuring, something totally useless (one can debate the value of treating differences over time in reading and math scores as representative of teacher or school quality more broadly).
  • Reliability means that a measure is consistent over time, presumed to mean that it is consistently capturing something over time. Too many casual readers of research and users of these terms assume that reliability is inherently good, and that a reliable measure is always a good measure. That is not the case if the measure is reliable simply because it is consistently measuring the wrong thing. A measure can quite easily be reliably invalid.
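To make “reliably invalid” concrete, here is a toy simulation (entirely fabricated numbers, not New Jersey data) in which a measure contains no signal of true quality at all, only a stable bias term, yet still correlates strongly with itself from year to year:

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools = 500

# True school quality: what a valid measure would capture.
quality = rng.normal(0, 1, n_schools)

# A fixed bias term (e.g., driven by student poverty) plus noise.
# Note: the measure never touches true quality at all.
bias = rng.normal(0, 1, n_schools)
year1 = bias + rng.normal(0, 0.4, n_schools)
year2 = bias + rng.normal(0, 0.4, n_schools)

# High year-over-year correlation: the measure looks "reliable."
r_stability = np.corrcoef(year1, year2)[0, 1]

# Near-zero correlation with true quality: the measure is invalid.
r_validity = np.corrcoef(year1, quality)[0, 1]

print(round(r_stability, 2), round(r_validity, 2))
```

In this toy case the measure’s stability is entirely a product of the bias term, which is exactly the possibility raised above.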

So, let’s ask ourselves a few really simple empirical questions using last year’s and this year’s SGP data, and a few other easily accessible measures like average proficiency rates and school rates of children qualified for free lunch (low income).

  • How stable are NJ’s school level SGPs from year 1 to year 2?
  • If they are stable, or reasonably correlated, might it be because they are correlated to other stuff?
    • Average prior performance levels?
    • School level student population characteristics?

If we were seeking a non-biased and stable measure of school or teacher effectiveness, we would expect to find a high correlation from one year to the next on the SGPs, coupled with low correlations between those SGPs and other measures like prior average performance or low income concentrations.

By contrast, if we find relatively high year over year correlation for our SGPs but also find that the SGPs, averaged over the years, are correlated with other stuff (average performance levels and low income concentrations), then it becomes far more likely that the stability we are seeing is “bad” stability (false signal, or bias) rather than “good” stability (true signal of teacher or school quality).

That is, we are consistently mis-classifying schools (and by extension their teachers) as good or bad, simply because of the children they serve!
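The check described above can be sketched in a few lines (with fabricated illustrative data; the variable names and numbers here are mine, not the state’s):

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools = 300

# Fabricated school-level data in which a shared poverty-driven
# component links SGPs to low income concentration in both years.
pct_free_lunch = rng.uniform(0, 1, n_schools)
sgp_2012 = 55 - 20 * pct_free_lunch + rng.normal(0, 6, n_schools)
sgp_2013 = 55 - 20 * pct_free_lunch + rng.normal(0, 6, n_schools)

# Correlation matrix: year-over-year stability vs. correlation
# with "other stuff" (here, low income concentration).
corr = np.corrcoef([sgp_2012, sgp_2013, pct_free_lunch])
r_stability = corr[0, 1]
r_poverty = corr[0, 2]

print(round(r_stability, 2), round(r_poverty, 2))
```

If the year-over-year correlation is no stronger than the correlation with poverty, the “stability” is fully consistent with stable bias.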

Well then, here’s the correlation matrix (scatterplots below):


The bottom line is that New Jersey’s language arts SGPs are:

  • Nearly as strongly (when averaged over two years) correlated with concentrations of low income children as they are with themselves over time!
  • As strongly (when averaged over two years) correlated with prior average performance as they are with themselves over time!

Patterns are similar for math.  Year over year correlations for math (.61) are somewhat stronger than correlations between math SGPs and performance levels (.45 to .53) or low income concentration (-.38). But, correlations with performance levels and low income concentrations remain unacceptably high – signaling substantial bias.

The alternative explanation is to buy into the party line that what we are really seeing here is the distribution of teaching talent across New Jersey schools. Lower poverty schools simply have the better teachers. And thus, those teachers must have been produced by the better colleges/universities.

Therefore, we should build all future policies around these ever-so-logical, unquestionably valid findings. That the teachers in high poverty schools whose children had initially lower performance and thus systematically lower SGPs, must be fired and a new batch brought in to replace them. Heck, if the new batch of teachers is even average (like teachers in schools of average poverty and average prior scores), then they can lift those SGPs and average scores of high poverty below average schools toward the average.

At the same time, we must track down the colleges of education responsible for producing those teachers in high poverty schools who failed their students so miserably and we must impose strict sanctions on those colleges.

That’ll work, right? No perverse incentives here? Especially since we are so confident in the validity of these measures?

Nothing can go wrong with this plan, right?

A vote of no confidence is long overdue here!





Segregating Suburbia: A Princeton Story

Others around me have for some time been raising concerns about the emergence of boutique, suburban charter schools. Until now, I’ve largely blown off those concerns in part as I’ve questioned just how much sorting a charter school can achieve in a relatively homogeneous suburban area.

Suburbs have their own unique portfolio of schools.  One might find in any leafy suburb near a major metropolitan area a very fine local public school district, perhaps a private Catholic school in certain regions of the country, and in many areas, an elite private independent day school or two – oft named “day school” or “country day school.” These portfolios have been in existence, in some cases, for centuries.  At some future point, I may discuss more extensively the public-private balance issue and the role that elite (and less elite) private schools play when embedded in otherwise elite communities that also have relatively elite public school systems.

Rarely would one expect to find the charter school movement trying to infiltrate this environment, adding that other element to the portfolio. And if and when this does happen, what niche do they try to fill? On the one hand, one might try to establish a charter that handles the “difficult” cases from the local school system – those students who might not fit particularly well in the public system and who lack access to appropriate private schooling.

But, I’m reminded… though I can’t find the link right now… of a Palo Alto, CA charter school that had basically established itself as the equivalent of a publicly subsidized elite private school. [Found! By a commenter below] It’s a rather clever financial model. If elite private school tuition is running at about $30k per year per child, and the per pupil cost of a quality private education program at about $32 to $35k, one could either pay that price, or gather a group of close friends and apply for a charter, where each child might receive an allotment of $10 to $15k from the local district and parents could quietly agree to chip in the other $15k to achieve schooling of similar quality to the private option – at half the price.

Of course, there are many additional costs of getting that ball rolling, including finding and leasing space for start-up years, and running capital fundraising campaigns for future years. By establishing a charter school in this way, these parents really couldn’t officially exclude others from their school or obligate private contributions within their “club”… but they sure could make any free-rider, or other resource drain on their schooling model, feel uncomfortable enough to leave.

On the one hand, it might not be considered that problematic for a group of parents with “average children” in the local district to require (via establishing a charter school) that district to subsidize their quasi-private endeavor.  I would argue that it becomes more problematic when an above average income group in the community, with relatively low need children (by usual classifications), obligates the local public school district to subsidize their segregationist preferences. That is, asking those less well off than you to subsidize your quasi-private school alternative.

But, just how much sorting can a suburban charter school achieve anyway? And can a suburban charter school establish itself as a quasi-elite private school in a market where there are already several private schooling options? That is, would parents of advantaged children actually seek to establish a school that taxes those less well off than them to subsidize their charter school, instead of paying the full price of tuition at local private schools?  Evidence from Princeton, New Jersey suggests that the answer to this question may in fact be yes!

Let’s take a look.

Here’s the lay of the land… from the broad viewpoint… with district housing values in the brown shading.

Princeton is the dark area in the middle of the picture, with very high average housing values. Princeton also is home to numerous… and I mean numerous… private independent day and boarding schools, many of which (along a single road) serve a large portion of school-aged children from Princeton and surrounding communities, and many of which have been around for a very long time. Princeton is also known for having an exceptionally strong local public district. To the south and west is Trenton, with high poverty schools, including high poverty charter schools (yellow stars). Notably, in Princeton the lowest poverty “public” school is Princeton Charter School. Princeton Public Schools each have much higher rates of children qualified for free or reduced priced lunch.

Here’s a zoom in on Princeton:

While many of the triangles (private schools) in other parts of the state are preschools, etc., many in Princeton are actually relatively large elite private day and boarding schools.  Rather amazingly, Princeton Charter School appears – at least by its exclusion of low income children – to be positioning itself as a publicly subsidized alternative to the elite private schools and not as a more broadly accessible charter alternative.

Here’s a breakout of the details on the Princeton Charter population compared to the district:


And here’s the composition of the special education populations:


That is, PCS has only the mildest, lowest cost children with disabilities.

Put bluntly, these figures show that the parent population of Princeton Charter is obligating the parents of much less advantaged children, including parents of children with special education needs, to subsidize their preference for a school more like the private day schools along Great Road.

While I’m still not entirely sure what to make of this… it does concern me.

It also ought to raise questions for leaders of private school alternatives in these communities. On balance, I’ve never seen the charter school movement as a particular competitive threat to private independent day schools, as charters have often been primarily urban, serving minority populations and employing “no excuses” strategies that most parents in leafy suburbs would not find palatable for their own children.

Urban charter schools have arguably taken their toll on urban Catholic school enrollments, but that’s another story.  But, to the extent that state charter policies permit the type of school establishment and segregation going on in Princeton, more and more parents may find ways to organize quasi-elite private schools to serve their needs – effectively seeking taxpayer charity to support their country club preferences. This indeed may pose a threat to less financially well endowed private schools.

In a twisted sort of way, it’s rather like asking your local public parks department to pay for your membership to the local private country club – thus reducing the quality of services to others who really don’t have access to the country club (even if it proclaims it’s open to all comers).

Much more to ponder here… but the numbers on Princeton Charter School certainly raise some serious red flags.

Note: In New Jersey and elsewhere, there are numerous other taxpayer subsidies that support private schooling, ranging from property tax exemptions and exemptions on charitable gifts, to textbook subsidies (loans from local districts) and transportation reimbursements. So, to an extent, all private schools and privately schooled children are receiving some level of subsidy at taxpayer expense. But, that level increases dramatically if/when the local public district is also required to hand over the full annual operating expense per child.

Newark Charter Update: A few new graphs & musings

It’s been a while since I’ve written anything about New Jersey charter schools, so I figured I’d throw a few new graphs and tables out there. In the not too distant past, I’ve explained:

  1. That Newark charter schools in particular persist in having an overall cream-skimming effect in Newark, creating a demographic advantage for themselves, ultimately to the detriment of the district.
  2. That while the NJ CREDO charter school effect study showed positive effects of charter enrollment on student outcomes specifically (and only) in Newark, the unique features of student sorting (read: skimming) in Newark make it difficult to draw any reasonable conclusions about the effectiveness of the actual practices of Newark charters. Note that in my most recent post, I re-explain the problem with asserting school effects, when a sizable component of the school effect may be a function of the children (peer group) served.
  3. In many earlier posts, I evaluated the extent to which average performance levels of Newark (and other NJ) charter schools were higher or lower than those of demographically similar schools, finding that charters were/are pretty much scattered.
  4. And I’ve raised questions about other data – including attrition rates – for some high flying NJ charters.

As an update, since past posts have only looked at NJ charter performance in terms of “levels” (shares of kids proficient, or not), let’s take a look at how Newark district and charter schools compare on the state’s new school level growth percentile measures. In theory, these measures should provide us a more reasonable measure of how much schools contribute to year over year changes in student test scores. Of course, remember that school effect is conflated with peer effect, and with every other attribute of the yearly in-school and out-of-school lives of the kids attending each school.

And bear in mind that I’ve critiqued in great detail previously that New Jersey’s growth percentile scores appear to do a particularly crappy job at removing biases associated with student demographics, or with average performance levels of kids in a cohort.  To summarize prior findings:

  1. school average growth percentiles tend to be lower in schools with higher average rates of proficiency to begin with.
  2. school average growth percentiles tend to be lower in schools with higher shares of low income children.
  3. school average growth percentiles tend to be lower in schools with more non-proficient scoring special education students.

And each of these relationships was disturbingly strong. So, any analysis of the growth percentile data must be taken with a grain of salt.

So, pretending for a moment that the growth percentile data aren’t complete garbage, let’s take a look at the growth percentile data for Newark charter schools, alongside district schools.

Let’s start with a statewide look at charter school growth percentiles compared to district schools. In this figure, I’ve graphed the 7th grade ELA growth percentiles with respect to average school level proficiency rates, since the growth percentile data seem so heavily biased in this regard. As such, it seems most reasonable to try to account for this bias by comparing schools against those with the most similar current average proficiency rates.
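One way to operationalize that comparison (a sketch using fabricated data; the real analysis would use the actual school files) is to fit the trendline of growth on proficiency and treat each school’s residual as its growth relative to schools at similar proficiency levels:

```python
import numpy as np

rng = np.random.default_rng(2)
n_schools = 200

# Fabricated data mimicking the bias described above: growth
# percentiles that rise with current proficiency rates.
proficiency = rng.uniform(20, 95, n_schools)
growth = 30 + 0.3 * proficiency + rng.normal(0, 5, n_schools)

# Fit the trendline of growth on proficiency (least squares).
slope, intercept = np.polyfit(proficiency, growth, 1)

# A school's residual is its distance above or below the trendline,
# i.e., its growth relative to schools with similar proficiency.
residuals = growth - (intercept + slope * proficiency)

# Trendline expectation for a hypothetical school at 60% proficiency.
print(round(intercept + slope * 60, 1))
```

A school above the trendline posts higher growth than similarly-proficient schools; a school below it posts lower growth. That is the sense in which schools are compared against the trendline in the figures below.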

Figure 1. Statewide Language Arts Growth with Respect to Average Proficiency (Grade 7)


Now, if we buy these growth percentiles as reasonable, then one of our conclusions might be that Robert Treat Academy is one of the worst schools in the state, if not the worst – at least in terms of its ability to contribute to test score gains. By contrast, Discovery Charter School totally rocks.

Other charters to be explored in greater depth below, like TEAM Academy in Newark fall in the “somewhat better than average” category (marginally above the trendline) and frequently cited standouts like North Star Academy somewhat higher (though in the cloud, statewide).

So, let’s focus on Newark in particular.

Figure 2. Newark Language Arts Growth with Respect to Average Proficiency (Grade 5)


Figure 3. Newark Language Arts Growth with Respect to Average Proficiency (Grade 6)


Figure 4. Newark Language Arts Growth with Respect to Average Proficiency (Grade 7)


Figure 5. Newark Language Arts Growth with Respect to Average Proficiency (Grade 8)

In my earlier posts, it was typically schools like Treat, North Star, Gray and Greater Newark that rose to the top, with TEAM posting more average results. But all of these results were heavily mediated by demographic differences, with Treat and North Star hardly resembling district schools at all, and TEAM coming closer but still holding a demographic edge over district schools.

In these updated graphs, using the growth measures, one must begin to question the Robert Treat miracle especially. Yeah… they start high… and stay high on proficiency… but they appear to have contributed little to achievement gains. Again, that is, if these measures really have any value at all. Gray is also hardly a standout… or actually it is a standout… but not in a good way.

TEAM continues to post solidly above average, but still in the non-superman (mere mortal) mix of district & charter schooling in Newark.

Remember, school gains are a function of all that goes on in the lives of kids assigned to each school, including in school and out of school stuff, including peer effect.

Let’s focus in on the contrast between TEAM and North Star for a bit. These are the two big ones in Newark now, and they’ve evolved over time toward providing K-12 programs. Here’s the most recent demographic data comparing income status and special education populations by classification, for NPS, TEAM and North Star.

Figure 6. Demographic data for NPS, TEAM and North Star (2012-13 enrollments & 2011-12 special education)


North Star especially continues to serve far fewer of the lowest income children. And, North Star continues to serve very few children with disabilities, and next to none with more severe disabilities. Similarly, in TEAM, most children with disabilities have only mild specific learning disabilities or speech/language impairment.

But this next piece remains the most interesting to me. I’ve not revisited attrition rates for some time, and now these schools are bigger and have a longer track record, so it’s hard to argue that the patterns we see over several cohorts, including the most recent several years, for schools serving over 1,000 children, are anomalies.  At this point, these data are becoming sufficiently stable and predictable to represent patterns of practice.

The next two tables map the changes in cohort size over time for cohorts of students attending TEAM and North Star. The major caveat of these tables is that if there are 80 5th graders one year and 80 6th graders the next, we don’t necessarily know that they are the same 80 kids. 5 may have left and been replaced by 5 new students. But, taking on new students does pose some “risk” in terms of expected test scores, so some charters engage in less “backfilling” than others, and fewer backfill enrollments in upper grades.

Since tests that influence SGPs are given in grades 5 – 8 (well, 3 – 8, but 5-8 is most relevant here), the extent to which kids drop off between grade 5 & 6, 6 & 7, and who drops off between those grades can, of course, affect the median measured gain (if kids who were more likely to show low gains leave, and thus aren’t around for the next year of testing, and those more likely to show high gains stay, then median gains will shift upward from what they might have otherwise been).
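The cohort-survival arithmetic behind the tables that follow can be sketched as below, with made-up enrollment counts (not the actual TEAM or North Star figures):

```python
# Made-up example: size of one cohort as it moves up a grade per year.
cohort = {5: 80, 6: 76, 7: 72, 8: 70, 9: 66, 10: 58, 11: 50, 12: 44}

def survival(grades, start, end):
    """Share of the starting cohort still enrolled at the end grade.

    Caveat from the text: with grade-level counts alone we cannot tell
    whether these are the same students -- leavers may be replaced."""
    return grades[end] / grades[start]

print(round(survival(cohort, 5, 8), 2))   # grade 5 -> 8 persistence
print(round(survival(cohort, 5, 12), 2))  # grade 5 -> 12 persistence
```

With these hypothetical counts, the cohort retains 70 of 80 students through grade 8 but only 44 of 80 through grade 12, which is the kind of gap between tested-grade and full-program attrition discussed below.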

First, let’s look at TEAM.

Figure 7. TEAM Cohort Attrition Rates

Among tested grade ranges, with the exception of the most recent cohort, TEAM retains from the upper 80s to low 90s, in percentage terms, of its 5th graders through 8th grade (with potential replacement involved).  Any annual attrition may bias growth percentiles, as noted above, if potentially lower gain students are more likely to leave. But without student level data, that’s a bit hard to tell.

TEAM’s grade 5 to 12 attrition is greater, dropping over 25% of kids per cohort. From grade 9 to 12, about 20% disappear.

But these figures are far more striking for North Star.

Figure 8. North Star Cohort Attrition Rates

Within tested grades, North Star matches TEAM in the most recent year, but in previous years North Star loses marginally more kids from grades 5 to 8, hanging mainly in the lower to mid 80s.  So, if there is bias in who is leaving – if weaker, slower gain students are more likely to leave – that may partially explain North Star’s greater gains seen above. Further, as weaker students leave, the peer group composition changes, also having potential positive effects on growth for those who remain.

Now… the other portion of attrition here doesn’t presently affect the growth percentile scores, but it is indeed striking, and raises serious policy concerns about the larger role of a school like North Star in the Newark community.

From grade 5 to 12, North Star persistently finishes less than half the number who started! As noted above, this is no anomaly at this point. It’s a pattern and a persistent one, over the four cohorts that have gone this far. I may choose to track this back further, but going back further brings us to smaller starting cohorts, increasing volatility.

Even from Grade 9 to 12, only about 65% persist.

Parsing these data a step further, let’s look specifically at attrition for Black boys at North Star.

Figure 9. Cohort Decline for Black Boys

I’ve flipped the direction of the years here… to be moving forward in the logical left to right direction. So, reorient yourself!  For grade 5 to 12, North Star had only one cohort that approached retaining 50% (well… actually, 42%). In other years, grade 5 to 12 attrition was around 75% or greater for black boys. Grade 9 to 12 attrition was about 40% in the most recent two years, and much more than that previously, for black boys. Of the 50 or so annual entrants to North Star at 5th grade, prior to its recent doubling, only a handful would ever make it to 12th grade.

The concern here, of course, is what is happening to the rest of those students who leave, and what is the effect of this churn on surrounding schools – perhaps both charter and district schools who are absorbing these students who are so rapidly shed. [to the extent, if any, that exceptional middle school preparation at a school like North Star leads students to scholarship opportunities at elite private schools, or acceptance to highly selective magnet schools, this attrition may be less ugly than it looks]

Of course, this does lead one to question how North Star is able to report to the state a 100% graduation rate and a 0.3% dropout rate. Seems a bit suspect, eh?

Figure 10. What North Star reports as its dropout and graduation rates


Notably absent HERE, as well, is any mention of the fact that only a handful of kids actually stick around through grade 12.

So, is this data driven leadership, or little more than drive by data? Seems that they’ve missed a really, really critical issue. [if you lose more than half of your kids btw grades 5 and 8, and even more than that for one of your target populations – black boys – that kind of diminishes the value of the outcomes created for the handful who stay, doesn’t it? Not for the stayers individually, but certainly for the school as a whole.]

A few closing thoughts…

As I’ve mentioned on many previous occasions, it is issues such as this as well as the demographic effects of charters, magnets and other schools that induce student sorting in the district, that must be carefully tracked and appropriately managed.  Neither an actual public school, nor a school chartered to serve the public interest (with public resources) should be shielded from scrutiny.

If we are really serious about promoting a system of great schools (as opposed to a school system) which productively integrates charter and district schools, then we can no longer sit by and permit behavior by some that is more likely than not damaging to others in that same system.  That’s simply not how a “system of great schools” works, or how any well-functioning system – biological, ecological, economic, social or otherwise – works.

But sadly, those who most vociferously favor charter expansion as a key element of supposed “portfolio” models of schooling appear entirely uninterested in mitigating parasitic activity (that which achieves the parasite’s goal at the expense of the host; parasitic rather than symbiotic). Rather, they fallaciously argue that an organism consisting entirely of potential parasites is, itself, the optimal form. That the good host is one that relinquishes? (WTF?) As if somehow the damaging effects of skimming and selective attrition might be lessened or cease to exist if the entirety of cities such as Newark were served only by charter schools.  Such an assertion is not merely suspect, it’s absurd.

So then, imagine if you will, an entire district of North Stars? Or an entire district of those who strive to achieve the same public accolades as North Star? That would sure work well from a public policy standpoint. They’d be in constant bitter battle over who could get by with the fewest of the lowest income kids. Anyone who couldn’t “cut it” in 5th or 6th grade, along with each and every child with a disability other than speech impairment, would be dumped out on the streets of Newark. Even after the rather significant front-end sorting, we’d be looking at 45% citywide graduation rates – actually, likely much lower than that, because some of the aspiring North Stars would have to take students even less likely to complete under their preferred model.

Yes, there would probably eventually be some “market segmentation” (a hearty mix of segregation, tracking & warehousing of kids with disabilities) – special schools for the kids brushed off to begin with – and special schools for those shed later on. But, under current accountability policies, those “special schools” would be closed and reconstituted every few years or so since they won’t be able to post the requisite gains. Sounds like one hell of a “system of great schools,” doesn’t it.

To the extent we avoid changing the incentive structure & accountability system, the tendency to act parasitically, rather than in a more beneficial relationship, will dominate. The current system is driven by the need to post good numbers – good “reported” numbers. NJ has created a reporting system that allows North Star to post a 100% grad rate and a 0.3% dropout rate despite graduating less than 50% of its 5th graders.

What do they get for this? Broad awards, accolades from NJDOE… & the opportunity to run their own graduate school to train teachers in their stellar methods (that result in nearly every black boy leaving before graduation).

A major problem here is that the incentive structure, the accountability measures, and system as it stands favor taking the parasitic path to results.

That said, in my view, it takes morally compromised leadership to rationalize taking this to the extent that North Star has. TEAM, for example, exists under the very same accountability structures. And while TEAM does its own share of skimming and shedding, it’s no North Star.

But I digress.

More to come – perhaps.

Suspension Rates for Schools in Newark

Rebutting (again) the Persistent Flow of Disinformation on VAM, SGP and Teacher Evaluation

This post is in response to testimony overheard from recent presentations to the New Jersey State Board of Education. For background and more thorough explanations of issues pertaining to the use of Value-added Models and Student Growth Percentiles please see the following two sources:

  • Baker, B.D., Oluwole, J., Green, P.C. III (2013) The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the race-to-the-top era. Education Policy Analysis Archives, 21(5). This article is part of EPAA/AAPE’s Special Issue on Value-Added: What America’s Policymakers Need to Know and Understand, Guest Edited by Dr. Audrey Amrein-Beardsley and Assistant Editors Dr. Clarin Collins, Dr. Sarah Polasky, and Ed Sloat. Retrieved [date], from
  • Baker, B.D., Oluwole, J. (2013) Deconstructing Disinformation on Student Growth Percentiles & Teacher Evaluation in New Jersey. New Jersey Education Policy Forum 1(1)

Here, I address a handful of key points.

First, different choices of statistical model or method for estimating teacher “effects” on test score growth matter. Indeed, one might find that adding new variables, controlling for this, that and the other thing, doesn’t always shift the entire pattern significantly, but a substantial body of literature indicates that even subtle changes to included variables or modeling approach can significantly change individual teachers’ ratings and significantly reshuffle teachers across rating categories. Further, these changes may be most substantial for those teachers in the tails of the distribution – those for whom the rating might be most consequential.

Second, I reiterate that value-added models in their best, most thorough form are not the same as student growth percentile estimates. Specifically, those who have made direct comparisons of VAMs versus SGPs for rating teachers have found that SGPs – by omission of additional variables – are less appropriate. That is, they don’t do a very good job of sorting out the teacher’s influence on test score growth!

Third, I point out that the argument that VAM as a teacher effect indicator is as good as batting average for hitters or earned run average for pitchers simply means that VAM is a pretty crappy indicator of teacher quality.

Fourth, I reiterate a point I’ve made on numerous occasions: just because we see a murky pattern of relationship and significant variation across thousands of points in a scatterplot doesn’t mean that we can make any reasonable judgment about the position of any one point in that mess. Using VAM or SGP to make high stakes personnel decisions about individual teachers violates this very simple rule. So does sticking specific, certain cut scores through these uncertain estimates in order to categorize teachers as effective or not.

Two Examples of How Models & Variables Matter

States are moving full steam ahead on adopting variants of value added and growth percentile models for rating their teachers, and one thing that’s becoming rather obvious is that these models and the data on which they rely vary widely. Some states and districts have chosen to adopt value added or growth percentile models that include only a single year of student prior scores to address differences in student backgrounds, while others are adopting more thorough value added models which also include additional student demographic characteristics, classroom characteristics including class size, and other classroom and school characteristics that might influence – outside the teacher’s control – the growth in student outcomes. Some researchers have argued that in the aggregate – across the patterns as a whole – this stuff doesn’t always seem to matter that much. But we also have a substantial body of evidence that when it comes to rating individual teachers, it does.

For example, a few years back, the Los Angeles Times contracted Richard Buddin to estimate a relatively simple value-added model of teacher effect on test scores in Los Angeles. Buddin included prior scores and student demographic variables. However, in a critique of Buddin’s report, Briggs and Domingue ran the following re-analysis to determine the sensitivity of individual teacher ratings to model changes, including additional prior scores and additional demographic and classroom level variables:

The second stage of the sensitivity analysis was designed to illustrate the magnitude of this bias. To do this, we specified an alternate value-added model that, in addition to the variables Buddin used in his approach, controlled for (1) a longer history of a student’s test performance, (2) peer influence, and (3) school-level factors. We then compared the results—the inferences about teacher effectiveness—from this arguably stronger alternate model to those derived from the one specified by Buddin that was subsequently used by the L.A. Times to rate teachers. Since the Times model had five different levels of teacher effectiveness, we also placed teachers into these levels on the basis of effect estimates from the alternate model. If the Times model were perfectly accurate, there would be no difference in results between the two models. Our sensitivity analysis indicates that the effects estimated for LAUSD teachers can be quite sensitive to choices concerning the underlying statistical model. For reading outcomes, our findings included the following:

Only 46.4% of teachers would retain the same effectiveness rating under both models, 8.1% of those teachers identified as effective under our alternative model are identified as “more” or “most” effective in the L.A. Times specification, and 12.6% of those identified as “less” or “least” effective under the alternative model are identified as relatively effective by the L.A. Times model.

For math outcomes, our findings included the following:

Only 60.8% of teachers would retain the same effectiveness rating, 1.4% of those teachers identified as effective under the alternative model are identified as ineffective in the L.A. Times model, and 2.7% would go from a rating of ineffective under the alternative model to effective under the L.A. Times model.

The impact of using a different model is considerably stronger for reading outcomes, which indicates that elementary school age students in Los Angeles are more distinctively sorted into classrooms with regard to reading (as opposed to math) skills. But depending on how the measures are being used, even the lesser level of different outcomes for math could be of concern.

  • Briggs, D. & Domingue, B. (2011). Due diligence and the evaluation of teachers: A review of the value-added analysis underlying the effectiveness rankings of Los Angeles Unified School District Teachers by the Los Angeles Times. Boulder, CO: National Education Policy Center. Retrieved June 4, 2012 from
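The reshuffling Briggs and Domingue document can be illustrated with a small simulation (a purely synthetic sketch, not a re-analysis of their data): generate two sets of teacher effect estimates that agree closely overall – as estimates from two defensible model specifications might – sort each into five rating categories, and count how many teachers keep the same rating.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical number of teachers

# "True" teacher effects plus independent, model-specific estimation noise.
# The noise scale is chosen so the two sets of estimates correlate near .8.
true = rng.normal(size=n)
model_a = true + rng.normal(scale=0.45, size=n)
model_b = true + rng.normal(scale=0.45, size=n)

# Cut each distribution into five equal-sized rating categories,
# mimicking the five effectiveness levels used by the L.A. Times.
def ratings(x, k=5):
    cuts = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(cuts, x)

same = np.mean(ratings(model_a) == ratings(model_b))
corr = np.corrcoef(model_a, model_b)[0, 1]
print(f"correlation between the two models' estimates: {corr:.2f}")
print(f"share of teachers keeping the same rating: {same:.1%}")
```

Even with the two sets of estimates correlated at roughly .8, a large share of teachers changes rating category, simply because the category boundaries slice through a cloud of noisy estimates.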

Similarly, Ballou and colleagues ran sensitivity tests of teacher ratings applying variants of VAM models:

As the availability of longitudinal data systems has grown, so has interest in developing tools that use these systems to improve student learning. Value-added models (VAM) are one such tool. VAMs provide estimates of gains in student achievement that can be ascribed to specific teachers or schools. Most researchers examining VAMs are confident that information derived from these models can be used to draw attention to teachers or schools that may be underperforming and could benefit from additional assistance. They also, however, caution educators about the use of such models as the only consideration for high-stakes outcomes such as compensation, tenure, or employment decisions. In this paper, we consider the impact of omitted variables on teachers’ value-added estimates, and whether commonly used single-equation or two-stage estimates are preferable when possibly important covariates are not available for inclusion in the value-added model. The findings indicate that these modeling choices can significantly influence outcomes for individual teachers, particularly those in the tails of the performance distribution who are most likely to be targeted by high-stakes policies.

In short, the conclusions here are that model specification and variables included matter. And they can matter a lot. It is reckless and irresponsible to assert otherwise and even more so to never bother to run comparable sensitivity analyses to those above prior to requiring the use of measures for high stakes decisions.

SGP & a comprehensive VAM are NOT THE SAME!

This point is really just an extension of the previous one. Most SGP models, which are a subset of VAMs, take the simplest form, accounting only for a single prior year of test scores. Proponents of SGPs like to make a big deal about how the approach re-scales the data from its original artificial test scaling to a scale-free (and thus somehow problem free?) percentile rank measure. The argument is that we can’t really ever know, for example, whether it’s easier or harder to increase your SAT (or any test) score from 600 to 650, or from 700 to 750, even though both are 50 point increases. Test-score distances simply aren’t like running distances. You know what? Neither are ranks/percentiles that are based on those test score scales! Rescaling is merely recasting the same ol’ stuff, though it can at times be helpful for interpreting results. If the original scores don’t show legitimate variation – for example, if they have a strong ceiling or floor effect, or simply have a lot of meaningless (noise) variation – then so too will any rescaled form of them.
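For readers who want to see what an SGP is mechanically, here is a deliberately crude sketch on synthetic data (the real method, implemented in the SGP R package, fits quantile regressions of current scores on prior scores rather than binning): a student’s growth percentile is just her current-score percentile rank among students with similar starting points.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000  # hypothetical students

prior = rng.normal(size=n)                             # last year's scale score
current = 0.7 * prior + rng.normal(scale=0.7, size=n)  # this year's score

# Crude SGP: percentile rank of the current score among students in the
# same prior-score decile. (The actual method fits quantile regressions
# of current on prior scores; this binned version just illustrates the
# "compared to similar starting points" idea.)
decile_cuts = np.quantile(prior, np.linspace(0, 1, 11)[1:-1])
group = np.searchsorted(decile_cuts, prior)
sgp = np.empty(n)
for g in range(10):
    mask = group == g
    ranks = current[mask].argsort().argsort()  # 0 .. m-1 within the group
    sgp[mask] = 100.0 * ranks / (mask.sum() - 1)

print(f"mean SGP: {sgp.mean():.1f}")  # centered at 50 by construction
```

Note that nothing in this calculation touches poverty, peer composition, or any other context variable – and any ceiling, floor, or noise in the underlying score scale passes straight through to the percentile ranks.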

Setting aside the re-scaling smokescreen, two recent working papers compare SGP and VAM estimates for teacher and school evaluation and both raise concerns about the face validity and statistical properties of SGPs.  And here’s what they find.

Goldhaber and Walch (2012) conclude “For the purpose of starting conversations about student achievement, SGPs might be a useful tool, but one might wish to use a different methodology for rewarding teacher performance or making high-stakes teacher selection decisions” (p. 30).

  •  Goldhaber, D., & Walch, J. (2012). Does the model matter? Exploring the relationship between different student achievement-based teacher assessments. University of Washington at Bothell, Center for Education Data & Research. CEDR Working Paper 2012-6.

Ehlert and colleagues (2012) note “Although SGPs are currently employed for this purpose by several states, we argue that they (a) cannot be used for causal inference (nor were they designed to be used as such) and (b) are the least successful of the three models [Student Growth Percentiles, One-Step & Two-Step VAM] in leveling the playing field across schools” (p. 23).

  •  Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2012). Selecting growth measures for school and teacher evaluations. National Center for Analysis of Longitudinal Data in Education Research (CALDER). Working Paper #80.

If VAM is as reliable as Batting Averages or ERA, that simply makes it a BAD INDICATOR of FUTURE PERFORMANCE!

I’m increasingly mind-blown by those who return, time after time, to really bad baseball analogies to make their point that these value-added or SGP estimates are really good indicators of teacher effectiveness. I’m not that much of a baseball statistics geek, though I’m becoming more and more intrigued as time passes. The standard pro-VAM argument goes that VAM estimates for individual teachers have a correlation of about .35 from one year to the next. Casual readers of statistics often see this as “low,” working from the relatively naïve perspective that a high correlation is about .8. The idea is that a good indicator of teacher effect would be one that reveals the true, persistent effectiveness of that teacher from year to year. Even better, a good indicator would be one that allows us to tell whether that teacher is likely to be a good teacher in future years. A correlation of only about .35 doesn’t give us much confidence.

That said, let’s be clear that all we’re even talking about here is the likelihood that a teacher having students who showed test score gains in one year, is likely to have a new batch of students who show similar test score gains the following year (or at least in relative terms, the teacher who is above the average of teachers for their student test score gains remains similarly above the average of teachers for their students’ test score gains the following year). That is, the measure itself may be of very limited use, thus the extent to which it is consistent or not may not really be that important. But I digress.

In order to try to make a .35 correlation sound good, VAM proponents will often argue that the year over year correlation between baseball batting averages, or earned run averages is really only about the same. And since we all know that batting average and earned run average are really, really important baseball indicators of player quality, then VAM must be a really, really important indicator of teacher quality. Uh… not so much!

If there’s one thing Baseball statistics geeks really seem to agree on, it’s that Batting Averages and Earned Run Averages for pitchers are crappy predictors of future performance precisely because of their low year over year correlation.

This piece provides some explanation:

Not surprisingly, Batting Average comes in at about the same consistency for hitters as ERA for pitchers. One reason why BA is so inconsistent is that it is highly correlated to Batting Average on Balls in Play (BABIP)–.79–and BABIP only has a year-to-year correlation of .35.

Descriptive statistics like OBP and SLG fare much better, both coming in at .62 and .63 respectively. When many argue that OBP is a better statistic than BA it is for a number of reasons, but one is that it’s more reliable in terms of identifying a hitter’s true skill since it correlates more year-to-year.

And this piece provides additional explanation of descriptive versus predictive metrics.

An additional, really important point here, however, is that these baseball indicators are relatively simple mathematical calculations – like taking the number of hits (a relatively easily measured quantity) divided by at bats (also easily measured). These aren’t noisy regression estimates based on the test bubble-filling behaviors of groups of 8 and 9 year old kids. And most baseball metrics are arguably more clearly related to the job responsibilities of the player – though the fun stuff enters in when we start talking about modeling personnel decisions in terms of their influence on wins above replacement.
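The predictive weakness of a .35 year-to-year correlation is easy to see with a quick simulation (synthetic ratings, not actual VAM estimates): draw two years of ratings correlated at .35 and ask how often last year’s top-quintile teachers remain in the top quintile.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100000  # hypothetical teachers

# Two years of ratings with a year-to-year correlation of about .35 --
# the figure usually quoted for VAM estimates (and for batting average).
r = 0.35
year1 = rng.normal(size=n)
year2 = r * year1 + np.sqrt(1 - r**2) * rng.normal(size=n)

top1 = year1 > np.quantile(year1, 0.8)   # top quintile in year 1
top2 = year2 > np.quantile(year2, 0.8)   # top quintile in year 2
bottom2 = year2 < np.quantile(year2, 0.2)

stay = top2[top1].mean()
flip = bottom2[top1].mean()
print(f"top-quintile teachers still top-quintile next year: {stay:.1%}")
print(f"top-quintile teachers falling to the bottom quintile: {flip:.1%}")
```

Under these assumptions, most top-quintile teachers do not repeat, and a non-trivial share lands all the way in the bottom quintile – exactly the behavior that makes batting average a poor predictor of next season’s batting average.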

Just because you have a loose/weak pattern across thousands of points doesn’t add to the credibility of judging any one point!

One of the biggest fallacies in the application of VAM (or SGP) is that having a weak or modest relationship between year over year estimates for the same teachers, produced across thousands of teachers serving thousands of students, provides us with good enough (certainly better than anything else!) information to inform school or district level personnel policy.

Wrong! Knowing that there exists a modest pattern in a scatterplot of thousands of teachers from year one to year two, PROVIDES US WITH LITTLE USEFUL INFORMATION ABOUT ANY ONE POINT IN THAT SCATTERPLOT!

In other words, given the degree of noise in these best case (least biased) estimates, there exists very limited real signal about the influence of any one teacher on his or her students’ test scores. What we have here is limited real signal on a measure – measured test score gains from last year to this – which captures a very limited scope of outcomes. And, if we’re lucky, we can generate this noisy estimate of a measure of limited value for about 1/5 of our teachers.

Asserting that useful information can be garnered about the position of a single point in a massive scatterplot, based on such a loose pattern, violates the most basic understandings of statistics. And this is exactly what using value-added estimates to evaluate individual teachers, and put them into categories based on specific cut scores applied to these noisy measures, does!

The idea that we can apply strict cut scores to noisy statistical regression model estimates to characterize an individual teacher as “highly effective” versus merely “very effective” is statistically ridiculous – a point validated by the resulting statistics themselves.
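A companion simulation makes the cut-score point concrete (again synthetic, with an assumed reliability of .5 for illustration): flag the bottom 10% of noisy estimates as “ineffective” and check how many of those teachers are actually in the bottom 10% of true effects.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50000  # hypothetical teachers

# Estimated effect = true effect + noise of equal variance,
# i.e. an assumed reliability of 0.5 (illustrative only).
true = rng.normal(size=n)
estimate = true + rng.normal(size=n)

# Apply a strict cut score: flag the bottom 10% of ESTIMATES as "ineffective".
flagged = estimate < np.quantile(estimate, 0.10)
truly_bottom = true < np.quantile(true, 0.10)

# Share of flagged teachers who are actually in the true bottom decile.
precision = truly_bottom[flagged].mean()
print(f"flagged teachers actually in the true bottom 10%: {precision:.1%}")
```

Under these assumptions, most of the flagged teachers are not in the true bottom decile: the cut score confidently labels individuals that the data cannot confidently distinguish.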

Can useful information be garnered from the pattern as whole? Perhaps. Statistics aren’t entirely worthless, nor is this variation of statistical application. I’d be in trouble if this was all entirely pointless. These models and their resulting estimates describe patterns – patterns of test score growth across lots and lots of kids across lots and lots of teachers – and groups and subgroups of kids and teachers. And these models may provide interesting insights into groups and subgroups if the original sample size is large enough. We might find that teachers applying one algebra teaching approach in several schools appear to be advancing students’ measured grasp of key concepts better than teachers in other schools (assuming equal students and settings) applying a different teaching method.

But we would be hard pressed to say with any certainty, which of these teachers are “good teachers” and which are “bad.”

The Dramatic Retreat from Funding Equity in New Jersey: Evidence from the Census Fiscal Survey

I have explained in numerous previous posts how New Jersey is among those states that operate a reasonably progressive state school finance system, and how New Jersey, throughout the 1990s and early 2000s, put in the effort to disrupt the relationship between local community income and school spending. And, during that period, New Jersey’s low income students appear to have experienced some gains, at least when compared with other demographically similar states. Massachusetts, like New Jersey, also improved the progressiveness of its state school funding system over the same period; Connecticut, not so much. Here are some figures from a previous post:

Figure 1. Disrupting the relationship between income and school spending 1990 to 2004


Figure 2.  NAEP Gains of Children qualified for Free Lunch (Math)


Figure 3. NAEP Gains of Children qualified for Free Lunch (Reading)


New Jersey has maintained a strong position relative to other states, both in terms of NAEP achievement gains – especially for lower income students – and in terms of school funding fairness in our annual report. I have often used New Jersey as a model of a sound, progressive state school funding system, and one that has produced some reasonable initial results. In fact, I was about to start writing a post on that very point. Way too many of my posts on school funding equity/inequity have been negative. Heck, I just posted the “most screwed” districts in the nation. I was looking for an upside. A model. Some positives. A state that has maintained a solid progressive funding system even through bad times. So, I went back to the New Jersey data, and included the recently released 2010-11 Census Bureau data. What I found was really sad.

The following figures reveal the damage done to funding progressiveness in New Jersey over a relatively short period of time. A system that was among the nation’s most progressive in terms of school funding as recently as 2009 appears – based on the most recent Census Bureau data on current expenditures per pupil – to have slipped not just slightly… but dramatically. Here are the year to year snapshots, first as graphs of the actual district positions (for districts enrolling 2,000 or more pupils, with circle/triangle size indicating enrollment size) and then as the lines of best fit for each distribution, which indicate the “progressiveness” of the funding system with respect to poverty.

Figure 4. New Jersey Districts 2005 to 2007


Figure 5. New Jersey Progressiveness 2005 to 2007

Note – Funding levels increase and progressiveness (the tilt from low to high poverty) stays stable.

Figure 6. New Jersey Districts 2007 to 2009


Figure 7. New Jersey Progressiveness 2007 to 2009


Figure 8. New Jersey Districts 2009 to 2011


Figure 9. New Jersey Progressiveness 2009 to 2011


The damage done is rather striking and far beyond what I ever would have expected to see in these data. It may be that there are problems in the data themselves, but separate analyses of the revenue and expenditure data and use of alternative enrollment figures thus far have produced consistent results. In fact, analyses using state and local revenue data look even worse for New Jersey. And these charts do not adjust for various cost factors. They are what they are (variable ppcstot, or per pupil current spending, with respect to census poverty rates).

Meanwhile, efforts continue to cause even more damage to funding equity in New Jersey, amazingly using the argument that reducing the funding targeted to higher need districts and shifting it to others will somehow help New Jersey reduce its (misrepresented) achievement gap between high and low income children.

We may or may not begin to see the fallout – the real damages – of these shifts this year, or even next. But there will undoubtedly be consequences. Current policy changes, such as the use of bogus metrics to rate and remove mythically bad teachers will not make it less costly for high poverty districts to recruit and retain quality staff.  In fact, it may make it more expensive, given the increased disincentive for teachers to seek employment in higher poverty settings, all else equal. Nor will newly adopted half-baked school performance rating schemes. Nor will the state’s NCLB waiver which hoists new uncertainties and instabilities onto districts serving the neediest students with annually less competitive revenues and expenditures.

As I’ve said numerous times on this blog – equitable and adequate funding are prerequisite conditions for all else. Money matters.  And the apparent dramatic retreat from equity in New Jersey over a relatively short period of time raises serious concerns.


Additional Figures

Below is the retreat from equity in state and local revenues per pupil with respect to poverty. In this case, I’ve expressed state and local revenues relative to the average state and local revenues of districts sharing the same labor market and I’ve expressed poverty similarly.




Friday AM Graphs: Just how biased are NJ’s Growth Percentile Measures (school level)?

New Jersey finally released the data set of its school level growth percentile metrics. I’ve been harping on a few points on this blog this week.

SGP data here:

Enrollment data here:

First, that the commissioner’s characterization that the growth percentiles necessarily fully take into account student background is a completely bogus and unfounded assertion.

Second, that it is entirely irresponsible and outright reckless that they’ve chosen not even to produce technical reports evaluating this assertion.

Third, that growth percentiles are merely individual student level descriptive metrics that simply have no place in the evaluation of teachers, since they are not designed (by their creator’s acknowledgement) for attribution of responsibility for that student growth.

Fourth, that the Gates MET studies provide absolutely no validation of New Jersey’s choice to use SGP data in the way proposed regulations mandate.

So, this morning I put together four quick graphs of the relationship between school level percent free lunch and median SGPs in language arts and math and school level 7th grade proficiency rates and median SGPs in language arts and math. Just how bad is the bias in the New Jersey SGP/MGP data?  Well, here it is! (actually, it was bad enough to shock me)
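For anyone who wants to replicate this kind of bias check on the released files, the computation is simple. The sketch below uses synthetic data in place of the NJDOE spreadsheets (the variable names and the built-in relationship are hypothetical): regress school median SGP on percent free lunch and look at the slope and correlation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400  # hypothetical middle schools

# Synthetic stand-ins for the NJDOE columns (names are hypothetical);
# the negative relationship is built in here purely for illustration.
pct_free_lunch = rng.uniform(0, 100, size=n)
median_sgp = 60 - 0.2 * pct_free_lunch + rng.normal(scale=8, size=n)

# Correlation and line of best fit: an unbiased growth measure
# should show a slope and correlation near zero.
r = np.corrcoef(pct_free_lunch, median_sgp)[0, 1]
slope, intercept = np.polyfit(pct_free_lunch, median_sgp, 1)
print(f"correlation with percent free lunch: {r:.2f}")
print(f"slope of best-fit line: {slope:.2f}")
```

With the real data, one would simply load the SGP and enrollment files, merge on school ID, and run the same two lines of arithmetic; a slope and correlation well below zero are the signature of the bias at issue.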

First, if you are a middle school with a higher percent free lunch, you are, on average, likely to have a lower growth percentile rating in math. Notably, the math ASK assessment has a significant ceiling effect leading into the middle grades, perhaps weakening this relationship. (more on this at a later point)

If you are a middle school with a higher percent free lunch, you are, on average, likely to have a lower growth percentile rating in English Language Arts. This relationship is actually even more biased than the math relationship (uncommon for this type of analysis), likely because the ELA assessment suffers less from ceiling effects.

As with many if not most SGP data, the relationship is actually even worse when we look at the correlation with the average performance level of the school, or peer group. If your school has higher proficiency rates to begin with, your school will quite likely have a higher growth percentile ranking:


The same applies for English Language Arts:


Quite honestly, these are the worst – most biased – school level growth data I think I’ve ever seen.

They are worse than New York State.

They are much worse than New York City.

And they are worse than Ohio.

And this is just a first cut at them. I suspect that if I had actual initial scores, or even school level scale scores, the relationship between those scores and growth percentiles would be even stronger. But I will test that when the opportunity presents itself.

Further, because the bias is so strong at the school level – it is likely also quite strong at the teacher level.

New Jersey’s school level MGPs are highly unlikely to be providing any meaningful indicator of the actual effectiveness of the teachers, administrators, and practices of New Jersey schools. Rather, by the conscious choice to ignore contextual factors of schooling (be it the vast variations in the daily lives of individual children, the difficult-to-measure power of peer group context, or various other social contextual factors), New Jersey’s growth percentile measures fail miserably.

No school can be credibly rated as effective or not based on these data, nor can any individual teacher be cast as necessarily effective or ineffective.

And this is not at all unexpected.

Additional Graphs: Racial Bias



Just for fun, here’s a multiple regression model which yields additional factors that are statistically associated with school level MGPs. First and foremost, these factors explain over 1/3 of the variation in Language Arts MGPs. That is, Language Arts MGPs seem heavily contingent upon a) student demographics, b) location and c) grade range of school. In other words, if we start using these data as a basis for de-tenuring teachers, we will likely be de-tenuring teachers quite unevenly with respect to a) student demographics, b) location and c) grade range… despite having little evidence that we are actually validly capturing teacher effectiveness – and substantial implication here that we are, in fact, NOT.

Patterns for math aren’t much different. Less variance is explained, again, I suspect because of the strong ceiling effect on math assessments in the upper elementary/middle grades. There appears to be a charter school positive effect in this regression, but I remain too suspicious of attaching any meaningful conclusions to these data. Besides, if we assert this charter effect to be true as a function of these MGPs being somehow valid, then we’d have to accept that charters like Robert Treat in Newark are doing a particularly poor job (very low MGP either compared to similar demographic schools, or similar average performance level schools).

School Level Regression of Predictors of Variation in MGPs

school mgp regression

*p<.05, **p<.10

At this point, I think it’s reasonable to request that the NJDOE turn over masked (removing student identifiers) versions of their data… the student level SGP data (with all relevant demographic indicators), matched to teachers, attached to school IDs, and also including certifying institutions of each teacher.  These data require thorough vetting at this point as it would certainly appear that they are suspect as a school evaluation tool. Further, any bias that becomes apparent to this degree at the school level – which is merely an aggregation of teacher/classroom level data – indicates that these same problems exist in the teacher level data. Given the employment consequences here, it is imperative that NJDOE make these data available for independent review.

Until these data are fully disclosed (not just their own analyses of them, which I expect to be cooked up any day now), NJDOE and the Board of Education should immediately cease moving forward on using these data for any consequential decisions, whether for schools or individual teachers. And if they do not, school administrators, local boards of education, individual teachers, and teacher preparation institutions (which are also to be rated by this shoddy information) should JUST SAY NO!

A few more supplemental analyses






On Misrepresenting (Gates) MET to Advance State Policy Agendas

In my previous post I chastised state officials for their blatant mischaracterization of metrics to be employed in teacher evaluation. This raised (in Twitter conversation) the issue of the frequent misrepresentation of findings from the Gates Foundation Measures of Effective Teaching project (or MET). Policymakers frequently invoke the Gates MET findings as providing broad-based support for however they might choose to use whatever measures they might choose (such as growth percentiles).

Here is one example in a recent article from NJ Spotlight (John Mooney) regarding proposed teacher evaluation regulations in New Jersey:

New academic paper: One of the most outspoken critics has been Bruce Baker, a professor and researcher at Rutgers’ Graduate School of Education. He and two other researchers recently published a paper questioning the practice, titled “The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era.” It outlines the teacher evaluation systems being adopted nationwide and questions the use of SGP, specifically, saying the percentile measures is not designed to gauge teacher effectiveness and “thus have no place” in determining especially a teacher’s job fate.

The state’s response: The Christie administration cites its own research to back up its plans, the most favored being the recent Measures of Effective Teaching (MET) project funded by the Gates Foundation, which tracked 3,000 teachers over three years and found that student achievement measures in general are a critical component in determining a teacher’s effectiveness.

I asked colleague Morgan Polikoff of the University of Southern California for his comments. Note that Morgan and I aren’t entirely on the same page on the usefulness of even the best possible versions of teacher effect (on test score gain) measures… but we’re not that far apart either.  It’s my impression that Morgan believes that better estimated measures can be more valuable – more valuable than I perhaps think they can be in policy decision making. My perspective is presented here (and Morgan is free to provide his).  My skepticism in part arises from my perception that there is neither interest among or incentive for state policymakers to actually develop better measures (as evidenced in my previous post). And that I’m not sure some of the major issues can ever be resolved.

That aside, here are Morgan Polikoff’s comments regarding misrepresentation of the Gates MET findings – in particular, as applied to states adopting student growth percentile measures:

As a member of the Measures of Effective Teaching (MET) project research team, I was asked by Bruce to pen a response to the state’s use of MET to support its choice of student growth percentiles (SGPs) for teacher evaluations. Speaking on my behalf only (and not on behalf of the larger research team), I can say that the MET project says nothing at all about the use of SGPs. The growth measures used in the MET project were, in fact, based on value-added models (VAMs). The MET project’s VAMs, unlike student growth percentiles, included an extensive list of student covariates, such as demographics, free/reduced-price lunch, English language learner, and special education status.

Extrapolating from these results and inferring that the same applies to SGPs is not an appropriate use of the available evidence. The MET results cannot speak to the differences between SGP and VAM measures, but there is both conceptual and empirical evidence that VAM measures that control for student background characteristics are more conceptually and empirically appropriate (link to your paper and to Cory Koedel’s AEFP paper). For instance, SGP models are likely to result in teachers teaching the most disadvantaged students being rated the poorest (cite Cory’s paper). This may result in all kinds of negative unintended consequences, such as teachers avoiding teaching these kinds of students.

In short, state policymakers should consider all of the available evidence on SGPs vs. VAMs, and they should not rely on MET to make arguments about measures that were not studied in that work.



Baker, B.D., Oluwole, J., Green, P.C. III (2013) The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the race-to-the-top era. Education Policy Analysis Archives, 21(5). This article is part of EPAA/AAPE’s Special Issue On Value-Added: What America’s Policymakers Need to Know and Understand, Guest Edited by Dr. Audrey Amrein-Beardsley and Assistant Editors Dr. Clarin Collins, Dr. Sarah Polasky, and Ed Sloat. Retrieved [date], from

Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2012). Selecting Growth Measures for School and Teacher Evaluations.

(Updated alternate version:


Who will be held responsible when state officials are factually wrong? On Statistics & Teacher Evaluation

While I fully understand that state education agencies are fast becoming propaganda machines, I’m increasingly concerned with how far this will go.  Yes, under NCLB, state education agencies concocted completely wrongheaded school classification schemes that had little or nothing to do with actual school quality, and in rare cases, used those policies to enforce substantive sanctions on schools. But, I don’t recall many state officials going to great lengths to prove the worth – argue the validity – of these systems. Yeah… there were sales-pitchy materials alongside technical manuals for state report cards, but I don’t recall such a strong push to advance completely false characterizations of the measures. Perhaps I’m wrong. But either way, this brings me to today’s post.

I am increasingly concerned with at least some state officials’ misguided rhetoric promoting policy initiatives built on information that is either knowingly suspect, or simply conceptually wrong/inappropriate.

Specifically, the rhetoric around adoption of measures of teacher effectiveness has become driven largely by soundbites that in many cases are simply factually WRONG.

As I’ve explained before…

  • With value-added modeling, which does attempt to parse statistically the relationship between a student being assigned to teacher X and that student’s achievement growth, controlling for various characteristics of the student and the student’s peer group, there still exists a substantial possibility of random-error-based misclassification of the teacher, or of remaining bias in the teacher’s classification (something we didn’t catch in the model affected that teacher’s estimate). And there’s little way of knowing what’s what.
  • With student growth percentiles, there is no attempt to parse statistically the relationship between a student being assigned a particular teacher and the teacher’s supposed responsibility for that student’s change among her peers in test score percentile rank.
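The difference between those two bullets can be made concrete with a toy simulation (all numbers invented, not drawn from any state's actual model): when poverty depresses both the starting score and the subsequent gain, conditioning on the prior score alone leaves a poverty-correlated residual masquerading as "growth," while a VAM-style model that also includes the covariate does not. This is a sketch of the general omitted-variable problem, nothing more.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical data-generating process: poverty lowers both the prior
# score and the year-to-year gain, independent of any teacher.
poverty = rng.binomial(1, 0.5, n)
prior = 50 - 8 * poverty + rng.normal(0, 10, n)
gain = 5 - 3 * poverty + rng.normal(0, 5, n)
current = prior + gain

def ols_residuals(cols, y):
    """Fit y on the given columns (plus intercept) by least squares;
    return the residuals."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# "SGP-style" conditioning: prior score only.
resid_sgp = ols_residuals([prior], current)
# "VAM-style" conditioning: prior score plus the poverty covariate.
resid_vam = ols_residuals([prior, poverty], current)

bias_sgp = resid_sgp[poverty == 1].mean()  # systematically negative
bias_vam = resid_vam[poverty == 1].mean()  # roughly zero
```

Under this setup, the prior-score-only model rates poor students' "growth" about a point and a half too low on average; adding the covariate drives the group-level residual to roughly zero.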

This article explains these issues in great detail.

And this video may also be helpful.

Matt Di Carlo has written extensively about the question of whether and how well value-added models actually accomplish their goal of fully controlling for student backgrounds.

Sound Bites don’t Validate Bad or Wrong Measures!

So, let’s take a look at some of the rhetoric that’s flying around out there and why and how it’s WRONG.

New Jersey has recently released its new regulations for implementing teacher evaluation policies, with heavy reliance on student growth percentile scores, ultimately aggregated to the teacher level as median growth percentiles. When challenged about whether those growth percentile scores will accurately represent teacher effectiveness, specifically for teachers serving kids from different backgrounds, NJ Commissioner Christopher Cerf explains:

“You are looking at the progress students make and that fully takes into account socio-economic status,” Cerf said. “By focusing on the starting point, it equalizes for things like special education and poverty and so on.” (emphasis added)

Here’s the thing about that statement. Well, two things. First, the comparisons of individual students don’t actually explain what happens when a group of students is aggregated to their teacher and the teacher is assigned the median student’s growth score to represent his/her effectiveness – teachers don’t all have an evenly distributed mix of kids who started at similar points (relative to other teachers). So, in one sense, this statement doesn’t even address the issue.

More importantly, however, this statement is simply WRONG!

There’s little or no research to back this up, save for early claims of William Sanders and colleagues in the 1990s in early applications of value-added modeling that excluded covariates. Likely, those cases where covariates have been found to have only small effects are cases in which those effects are drowned out by noise or other bias resulting from underlying test scaling (or re-scaling) issues – or, alternatively, from crappy measurement of the covariates. Here’s an example of the stepwise effects of adding covariates on teacher ratings.

Consider that one year’s assessment is given in April. The school year ends in late June. The next year’s test is given the next April. First – tangential to the covariate issue, but still important – approximately two months of instruction given by the prior year’s teacher are attributed to the current year’s teacher. Beyond that, there is a multitude of things that go on outside of the few hours a day when the teacher has contact with a child that influence any given child’s “gains” over the year, and those things that go on outside of school vary widely by children’s economic status. Further, children with certain life experiences on a continued daily/weekly/monthly basis are more likely to be clustered with each other in schools and classrooms.

With annual test scores, differences in summer experiences (slide 20), which vary by student economic background, matter; differences in home settings and access to home resources matter; differences in access to outside-of-school tutoring and other family-subsidized supports may matter, and depend on family resources. Variations in kids’ daily lives more generally matter (neighborhood violence, etc.), and many of those variations exist as a function of socio-economic status.

Variations in the peer group with whom children attend school also matter, and vary by socio-economic status, neighborhood structure, and conditions – the socio-economic status not just of the individual child, but of the group of children. (citations and examples available in this slide set)

In short, it is patently false to suggest that using the same starting point “fully takes into account socio-economic status.”

It’s certainly false to make such a statement about aggregated group comparisons – especially while never actually conducting or producing publicly any analysis to back such a ridiculous claim.

For lack of any larger available analysis of aggregated (teacher or school level) NJ growth percentile data, I stumbled across this graph from a Newark Public Schools presentation a short while back.


Interestingly, what this graph shows is that the average score level in schools is somewhat positively associated with the median growth percentile, even within Newark where variation is relatively limited. In other words, schools with higher average scores appear to achieve higher gains. Peer group effect? Maybe. Underlying test scaling effect? Maybe. Don’t know. Can’t know.
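The pattern in that graph is easy to reproduce with a purely mechanical simulation (all numbers invented) in which no teacher or school quality effect exists at all – gains are driven entirely by school composition, a stand-in for a peer-group or test-scaling effect – yet school median growth percentiles still track average scores:

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools, per_school = 100, 50

# Hypothetical setup: each school has a mean prior score, and gains rise
# with that mean (a composition effect), with no quality effect at all.
school_mean = rng.normal(200, 15, n_schools)
school = np.repeat(np.arange(n_schools), per_school)
prior = school_mean[school] + rng.normal(0, 10, n_schools * per_school)
gain = 0.2 * (school_mean[school] - 200) + rng.normal(0, 8, len(prior))
current = prior + gain

# Crude growth-percentile proxy: percentile rank of the current score
# among students in the same prior-score decile ("similar starting point").
deciles = np.digitize(prior, np.quantile(prior, np.linspace(0.1, 0.9, 9)))
sgp = np.empty(len(prior))
for d in np.unique(deciles):
    idx = np.where(deciles == d)[0]
    ranks = current[idx].argsort().argsort()
    sgp[idx] = 100 * (ranks + 0.5) / len(idx)

# Aggregate to the school median, as NJ does.
median_sgp = np.array([np.median(sgp[school == s]) for s in range(n_schools)])

# Median SGPs correlate strongly with average scores despite zero
# "quality" anywhere in the data-generating process.
r = np.corrcoef(school_mean, median_sgp)[0, 1]
```

The correlation between school mean score and median growth percentile comes out strongly positive – composition, not quality.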

The graph provides another dimension that is also helpful. It identifies lower and higher need schools – where the “low need” schools are the lowest need in the mix. They have the highest average scores and the highest growth percentiles. And this is on the English/language arts assessment; math assessments tend to reveal even stronger such correlations.

Now, state officials might counter that this pattern actually occurs because of the distribution of teaching talent… and has nothing to do with model failure to capture differences in student backgrounds. All of the great teachers are in those lower need, higher average performing schools! Thus, fire the others, and they’ll be awesome too! There is no basis for such a claim given that the model makes no attempt beyond prior score to capture student background.

Then there’s New York State, where similar rhetoric has been pervasive in the state’s push to get local public school districts to adopt state-compliant teacher evaluation provisions in contracts, and to base those evaluations largely on state-provided growth percentile measures. Notably, New York State, unlike New Jersey, actually realized that the growth percentile data required adjustment for student characteristics. So they tried to produce adjusted measures. It just didn’t work.

In a New York Post op-ed, the Chancellor of the Board of Regents opined:

The student-growth scores provided by the state for teacher evaluations are adjusted for factors such as students who are English Language Learners, students with disabilities and students living in poverty. When used right, growth data from student assessments provide an objective measurement of student achievement and, by extension, teacher performance.

So, what’s wrong with that? Well… mainly… that it’s… WRONG!

First, as I elaborate below, the state’s own technical report on their measures found that they were in fact not an unbiased measure of teacher or principal performance:

Despite the model conditioning on prior year test scores, schools and teachers with students who had higher prior year test scores, on average, had higher MGPs. Teachers of classes with higher percentages of economically disadvantaged students had lower MGPs. (p. 1)

That said, the Chancellor has cleverly chosen her words. Yes, it’s adjusted… but the adjustment doesn’t work. Yes, they are an objective measure. But they are still wrong. They are a measure of student achievement. But not a very good one.

But they are not by any stretch of the imagination, by extension, a measure of teacher performance. You can call them that. You can declare them that in regulations. But they are not.

To ice this reformy cake in New York, the Commissioner of Education has declared, in letters to individual school districts regarding their evaluation plans, that any other measure they choose to add alongside the state growth percentiles must be acceptably correlated with the growth percentiles:

The department will be analyzing data supplied by districts, BOCES and/or schools and may order a corrective action plan if there are unacceptably low correlation results between the student growth subcomponent and any other measure of teacher and principal effectiveness…

Because, of course, the growth percentile data are plainly and obviously a fair, balanced objective measure of teacher effectiveness.


But it’s better than the Status Quo!

The standard retort is that marginally flawed or not, these measures are much better than the status quo. ‘Cuz of course, we all know our schools suck. Teachers really suck. Principals enable their suckiness.  And pretty much anything we might do… must suck less.

WRONG – it is absolutely not better than the status quo to take a knowingly flawed measure, or a measure that does not even attempt to isolate teacher effectiveness, and use it to label teachers as good or bad at their jobs. It is even worse to then mandate that the measure be used to take employment action against the employee.

It’s not good for teachers AND It’s not good for kids. (noting the stupidity of the reformy argument that anything that’s bad for teachers must be good for kids, and vice versa)

On the one hand, these ridiculous, rigid, ill-conceived, statistically and legally inept, and morally bankrupt policies will most certainly lead to increased, not decreased, litigation over teacher dismissal.

On the other hand… The “anything is better than the status quo” argument is getting a bit stale, and was pretty ridiculous to begin with. Jay Mathews of the Washington Post acknowledged his preference for a return toward the status quo (suggesting different improvements) in a recent blog post, explaining:

We would be better off rating teachers the old-fashioned way. Let principals do it in the normal course of watching and working with their staff. But be much more careful than we have been in the past about who gets to be principal, and provide much more training.

In closing, the ham-fisted logic of the anti-status-quo argument, as applied to teacher evaluation, is easily summarized as follows:

Anything > Status Quo

Where the “greater than” symbol implies “really freakin’ better than… if not totally awesome… wicked awesome in fact,” but since it’s all relative, it would have to be “wicked awesomer.”

Because student growth measures exist and purport to measure student achievement growth, which is supposed to be a teacher’s primary responsibility, they therefore count as “something,” which is a subclass of “anything,” and therefore they are better than the “status quo.” That is:

Student Growth Measures = “something”

Something ⊆ Anything (something is a subset of anything)

Something > Status Quo

Student Growth Measures > Current Teacher Evaluation

Again, where “>”  means “awesomer” even though we know that current teacher evaluation is anything but awesome.

It’s just that simple!

And this is the basis for modern education policymaking?

Civics 101: School Finance Formulas & the Limits of Executive Authority

This post addresses a peculiar ongoing power grab in New Jersey involving the state school finance formula. The balance of power between state legislatures and the executive branch varies widely across states, but this New Jersey example may prove illustrative for others as well.  This post may make more sense if you take the time to browse these other two posts from the past few years.

  1. Student Enrollment Counts and State School Finance Formulas
  2. Twisted Truths and Dubious Policies: Comments on the Cerf School Funding Report

[yeah… I know… prerequisite readings don’t always go over well on a blog – but please check them out!]

In New Jersey, as in many other states the State School Finance Formula is a state statute. That is, an act of the legislature. State school finance formula statutes may vary in the degree of detail that they actually lay out in statutory language, including varying the precision of which specific numbers must be used in the calculations and in some cases specifically how the calculations are carried out. It is my impression that until recently, many state school finance statutes have been articulated in law with greater and greater precision – meaning also less latitude for the formula to be altered in its implementation (often through a state board of education).

The New Jersey school finance formula is articulated with relatively high precision in the language of the statute itself, like many other similar state school finance formulas. Again, it’s an act of the legislature – specifically, this act of the legislature of 2008:

AN ACT providing for the maintenance and support of a thorough and efficient system of free public schools and revising parts of the statutory law.

BE IT ENACTED by the Senate and General Assembly of the State of New Jersey:

(New section) This act shall be known and may be cited as the “School Funding Reform Act of 2008.”

Among other things, the statute spells out clearly the equations for calculating each district’s state aid, which involve first calculating the enrolled students to be funded through the formula.  In short, most modern school finance formulas apply the following basic approach:

  • STEP 1: Target Funding = [Base Funding x Enrollment + (Student Needs Weight x Base Funding x Student Needs Enrollment)] x Geographic Cost Adjustment
  • STEP 2: State Aid = Target Funding – Local Revenue Requirement
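A minimal sketch of those two steps, with purely illustrative inputs (the $9,649 base echoes SFRA’s original base per-pupil amount, but nothing here is an official calculation):

```python
def target_funding(base, enrollment, needs_weight, needs_enrollment, gca):
    """STEP 1: weighted base cost, adjusted for geographic cost differences."""
    return (base * enrollment + needs_weight * base * needs_enrollment) * gca

def state_aid(target, local_revenue_requirement):
    """STEP 2: the state fills the gap the local levy does not cover."""
    return max(0.0, target - local_revenue_requirement)

# Hypothetical district: 1,000 pupils, 400 of them carrying a 0.47
# at-risk weight, a 1.02 geographic cost adjustment, and a $6M local
# revenue requirement (all numbers illustrative only).
target = target_funding(9649, 1000, 0.47, 400, 1.02)
aid = state_aid(target, 6_000_000)
```

Note how the needs enrollment enters multiplicatively: shave the count of weighted pupils, and the cut propagates through the GCA and into aid.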

Using this general approach, how students are counted necessarily has a substantive effect on how much aid is calculated, and ultimately delivered. And in most such formulas, how the basic enrollments are counted has a multiplicative, ripple effect throughout the entire formula. So, it matters greatly how kids are counted for funding purposes. This is likely why state statutes often articulate quite clearly exactly how kids are to be counted.

The School Funding Reform Act of 2008 articulates precisely the definitions of fundable student enrollment counts. The following calculations and definitions are copied and pasted directly from the legislation.

Weighted Enrollment Definition

(New section) The weighted enrollment for each school district and county vocational school district shall be calculated as follows:

 WENR = (PW x PENR) + (EW x EENR) + (MW x MENR) + (HW x HENR)


PW is the applicable weight for kindergarten enrollment;

EW is the weight for elementary enrollment;

MW is the weight for middle school enrollment;

HW is the weight for high school enrollment;

PENR is the resident enrollment for kindergarten;

EENR is the resident enrollment for grades 1 – 5;

MENR is the resident enrollment for grades 6 – 8; and

HENR is the resident enrollment for grades 9 – 12.
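As a sketch, the WENR equation is simply a weighted sum of the grade-span enrollments. The weights below are illustrative stand-ins, not quotations from the statute:

```python
def weighted_enrollment(pw, penr, ew, eenr, mw, menr, hw, henr):
    """WENR = (PW x PENR) + (EW x EENR) + (MW x MENR) + (HW x HENR)"""
    return pw * penr + ew * eenr + mw * menr + hw * henr

# Hypothetical district: 100 kindergartners at a 0.5 weight, 500
# elementary pupils at 1.0, 300 middle schoolers at 1.04, and 400
# high schoolers at 1.17 (weights illustrative, not from SFRA).
wenr = weighted_enrollment(0.5, 100, 1.0, 500, 1.04, 300, 1.17, 400)
```

Because WENR feeds the adequacy budget, any change to the resident enrollment counts behind PENR–HENR changes every downstream dollar figure.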

Legal Definition of Resident Enrollment

“Resident enrollment” means the number of pupils other than preschool pupils, post-graduate pupils, and post-secondary vocational pupils who, on the last school day prior to October 16 of the current school year, are residents of the district and are enrolled in: (1) the public schools of the district, excluding evening schools, (2) another school district, other than a county vocational school district in the same county on a full-time basis, or a State college demonstration school or private school to which the district of residence pays tuition, or (3) a State facility in which they are placed by the district; or are residents of the district and are: (1) receiving home instruction, or (2) in a shared-time vocational program and are regularly attending a school in the district and a county vocational school district. In addition, resident enrollment shall include the number of pupils who, on the last school day prior to October 16 of the prebudget year, are residents of the district and in a State facility in which they were placed by the State. Pupils in a shared-time vocational program shall be counted on an equated full-time basis in accordance with procedures to be established by the commissioner. Resident enrollment shall include regardless of nonresidence, the enrolled children of teaching staff members of the school district or county vocational school district who are permitted, by contract or local district policy, to enroll their children in the educational program of the school district or county vocational school district without payment of tuition. Disabled children between three and five years of age and receiving programs and services pursuant to N.J.S.18A:46-6 shall be included in the resident enrollment of the district;

Not much there left to the imagination and certainly not a great deal of flexibility on implementation. It’s in the statute. It’s in the act adopted by the legislature. It is, quite literally, the law.

Executive Budget Language (2012-13 budget)

Civics 101 tells us that the executive branch of federal or state government doesn’t write the laws. Rather, it upholds them and its executive departments in some cases may be charged with implementing the laws, including adoption of implementing regulations – that is, adding the missing precision needed to actually implement the law. Of course, regulations on how a law is to be implemented can’t actually change the law itself.

Now, in some states like New Jersey, the Governor’s office has significant budgetary authority, including a line item veto option. Of course, that doesn’t however mean that the Governor’s office has the authority to actually rewrite the equations for school funding that were adopted by the legislature. It may mean that the Governor can underfund, or defund the formula as a whole, but that raises an entirely different set of constitutional questions, which I previously addressed here.

Specifically, what we have here are two separate bills/laws. First, there is the statute enacting the formula, which sets forth substantive standards that must be applied from year to year, unless amended through the usual legislative process of proposing an amendment in a bill, committee review, and a vote on the bill. Then, there is the budget bill, which appropriates state school aid for each fiscal year and is in effect only for that year. At their intersection, the appropriations in the budget bill are to be based on the ongoing requirements of the formula statute. Increasingly, it would appear that governors are attempting to effect changes to their state school funding formulas through their annual budget bills. Strategically, it can be hard for legislatures to successfully amend these budget bills and re-implement the formula as adopted, because the annual budget bills include everything under the sun (all components of the state budget), not just school funding.

Last year, the Governor’s office, through the executive budget, did actually change the equation – which is the law. And, at first glance of this year’s district-by-district aid runs, it would appear that they have done the same again. It would appear, though I’ve yet to receive the data to validate this, that the Governor’s office, in producing its estimates of how much each district should receive, relied on the same method as for the current year – a method which proposes to reduce specific weighting factors in the formula and, perhaps most disturbingly, exerts executive authority to change the basic way in which kids are counted for funding purposes.

Here’s the language from last year’s executive budget book:

pg D-83

Notwithstanding the provisions of any law or regulation to the contrary, the projected resident enrollment used to determine district allocations of the amounts hereinabove appropriated for Equalization Aid, Special Education Categorical Aid, and Security Aid shall include an attendance rate adjustment, which is defined as the amount the state attendance rate threshold exceeds the district’s three–year average attendance rate, as set forth in the February 23, 2012 State aid notice issued by the Commissioner of Education.

Did you catch that? It says that resident enrollment, throughout the formula, will be adjusted in accordance with an attendance rate factor. A factor that is not, in fact, in the legislation itself. It is not part of the equation that is the law.

Here’s a mathematical expression of the change:

Legal Funding Formula

AB = (BC + AR Cost + LEP Cost + COMB Cost + SE Census) x GCA


AR Cost = BPA x ARWENR x AR Weight

LEP Cost = BPA x LWENR x LEP Weight

COMB Cost = BPA x CWENR x (AR Weight + COMB Weight)

Executive Funding Formula

AB = (BC + AR Cost + LEP Cost + COMB Cost + SE Census) x GCA


AR Cost = BPA x ARWENR x CRAP* x AR Weight

LEP Cost = BPA x LWENR x CRAP* x LEP Weight

COMB Cost = BPA x CWENR x CRAP* x (AR Weight + COMB Weight)

*Cerf Reduction for Attending Pupils [attributed to Cerf here because this adjustment was originally proposed in his report to the Governor on the school finance formula]
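The budget language above defines the adjustment as the amount by which the state attendance threshold exceeds the district’s three-year average attendance rate. Assuming that shortfall is applied as a simple multiplicative haircut on enrollment (my reading of the budget language, not official guidance), the difference between the legal and executive formulas can be sketched for the at-risk term alone, with all inputs illustrative:

```python
def at_risk_cost(bpa, arwenr, ar_weight, crap=1.0):
    """AR Cost = BPA x ARWENR x CRAP x AR Weight.
    CRAP = 1.0 under the statute; < 1.0 under the executive budget's
    attendance adjustment."""
    return bpa * arwenr * crap * ar_weight

# Hypothetical high-poverty district: 94% three-year average attendance
# against an assumed 96% threshold => a 2-point enrollment haircut.
threshold, attendance = 0.96, 0.94
crap = 1.0 - max(0.0, threshold - attendance)  # assumed form of the adjustment

legal = at_risk_cost(9649, 5000, 0.47)        # formula as enacted
executive = at_risk_cost(9649, 5000, 0.47, crap)  # formula as budgeted
shortfall = legal - executive                  # aid removed by executive fiat
```

Even a two-percentage-point haircut on 5,000 weighted at-risk pupils strips roughly $450,000 from this one term alone, before the ripple effects through the rest of the formula.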

This change is unquestionably a change to the law itself. This is a substantive change with ripple effects throughout the formula. And as I understand Civics 101, such a change is well beyond the authority of the executive branch.

Permitting such authority to go unchecked is a dangerous precedent!

Then again, it’s a precedent already endorsed by the President and U.S. Secretary of ed in their choice to grant waivers to states and local districts to ignore No Child Left Behind, which was/is an act of Congress.  But who cares about that pesky old checks and balances stuff anyway? That’s so… old school… so… constitutional…

Why it Matters

What I find most offensive about this power play is that the change imposed through abuse of executive power is a change to enrollment count that is well understood to be the oldest trick in the book for reducing aid to high poverty, high minority concentration districts.

In New Jersey, as elsewhere, attendance rates are lower – for reasons well beyond school & district control – in districts serving larger shares of low income and minority children. Using attendance rates to adjust funding necessarily, systematically reduces funding from higher poverty districts. Here are the attendance rates by grade level and by district factor group (where A districts are low wealth/income districts and IJ are high wealth/income).

[Figure: “CRAP Adjustment” – attendance rates by grade level and district factor group]

And here are a handful of related articles which address this issue, and related issues in other settings:

  • Baker, B. D., & Green III, P. C. (2005). Tricks of the Trade: State Legislative Actions in School Finance Policy That Perpetuate Racial Disparities in the Post‐Brown Era. American Journal of Education, 111(3), 372-413.
  • Baker, B. D., & Corcoran, S. P. (2012). The Stealth Inequities of School Funding: How State and Local School Finance Systems Perpetuate Inequitable Student Spending. Center for American Progress.
  • Green III, P. C., & Baker, B. D. (2006). Urban Legends, Desegregation and School Finance: Did Kansas City Really Prove That Money Doesn’t Matter. Mich. J. Race & L., 12, 57.

Twisted Truths & Dubious Policies: Comments on the NJDOE/Cerf School Funding Report

Yesterday, we were blessed with the release of yet another manifesto (as reported here on NJ Spotlight) from what has become the New Jersey Department of Reformy Propaganda.  To be fair, it has become increasingly clear of late, that this is simply the new model for State Education Agencies (see NYSED Propaganda Here), with the current US Dept of Education often leading the way.

Notably, there’s little change in this report from a) the last one or b) the Commissioner’s state of the schools address last spring.

The core logic of the original report remains intact:

  1. That NJ has a problem – and that problem is  the achievement gap between low income and non-low income kids;
  2. That spending money on these kids doesn’t help – in fact it might just hurt – but it’s certainly a waste;
  3. Therefore, the logical solution to improving the achievement gap is to reduce funding to districts serving low income and non-English speaking kids and shift that funding to others.

Here’s a quick walk-through…

The Crisis?

The new report, like the previous, zeros in on the problem of New Jersey’s achievement gap between low income and non-low income kids. Now, the reason that the recent reports have focused so heavily on the achievement gap is that in the early days of this administration, the rhetoric was focused on the system as a whole being academically bankrupt. The simple response was to point out that NJ schools, by nearly any outcome measure stack up quite favorably against nearly any other state. So, they had to back off that rhetoric, and move to the achievement gap thing. Here’s one of the justifying statements in the current report.

“Likewise, on the 2011 administration of the National Assessment of Educational Progress, New Jersey ranked 50th out of 51 states (including Washington, D.C.) in the size of the achievement gap between high- and low-income students in eighth grade reading.”

Of course, as I’ve pointed out again and again, and will reiterate below, this is an entirely bogus comparison.

The Proposed Solution?

Like the previous funding report from last Winter, the primary recommendations in this new manifesto are to reduce funding adjustments for low income and non-English speaking kids, because we know they don’t need that funding and certainly couldn’t and obviously haven’t used it well. The report did back off from proposing one of the oldest tricks in the book for cutting aid to the poor – funding on average daily attendance – but likely backed off because they simply lack the legal authority to propose this change in this context and not out of any moral/ethical principle.

The Rationale?

The most bizarre section of the new report appears at the bottom of the second page. Here, the report’s author makes several bold, outlandish, unjustified, and mostly factually incorrect statements. Further, little or no justification is provided for any of the boldly stated points. It’s nearly as ridiculous as The Cartel.

Here are two of my favorite paragraphs:  

“The conclusion is inescapable: forty years and tens of billions of dollars later, New Jersey’s economically disadvantaged students continue to struggle mightily. There are undoubtedly many reasons for this policy failure, but chief among them is the historically dubious view that all we need to do is design an education funding formula that would “dollarize” a “thorough and efficient system of free public school” and educational achievement for every New Jersey student would, automatically and without more, follow.” (emphasis added)

“Of course, schools must have the resources to succeed. To the great detriment of our students, however, we have twisted these unarguable truths into the wrongheaded notion that dollars alone equal success. How well education funds are spent matters every bit as much, and probably more so, than how much is spent. New Jersey has spent billions of dollars in the former-Abbott districts only to see those districts continue to fail large portions of their students. Until we as a state are willing to look beyond the narrow confines of the existing funding formula – tinkering here, updating there – we risk living Albert Einstein’s now infamous definition of insanity: doing the same thing over and over again and expecting a different result.”

First, I would point out that starting with the line “the conclusion is inescapable” is one of the first red flags that most of what follows will be a load of BS. But that aside… let’s take a look at some of these other statements.  I’m not sure who the Commissioner thinks is advancing the “historically dubious view that all we need…blah…blah… blah… dollarize … blah… blah” but I would point out that the central issue here is that a well organized, appropriately distributed, sufficiently funded state school finance system provides the necessary underlying condition for getting the job done – achieving the desired standards, etc. (besides nothing could ever equal the reformy dubiousness of this graph… or these!) .

This isn’t about arguing that money in and of itself solves all ills. But money is clearly required. It’s a prerequisite condition. More on that below. This claim that others are advancing such a historically dubious view is absurd. Nor is it the basis for the current state school finance system, or the court order that led to the previous (not current) system! [background on current system here]

Equally ridiculous is the phrase about these “unarguable truths.” Again, when I see a phrase like this, my BS detector nearly explodes. Again, I’m not sure who the commissioner thinks is advancing some “wrongheaded notion” that “dollars alone equal success,” but I assure you that while dollars alone don’t equal success, equitable and adequate resources are a necessary underlying condition for success.

Indeed, the current state school finance system is built on attempts to discern the dollars needed to provide the necessary programs and services to meet the state outcome objectives [I’ll set aside the junk comparisons to Common Core costs listed in the report for now]. But the focus isn’t/wasn’t on the dollars, but rather the programs and services – which, yes… ultimately do have to be paid for with… uh… dollars.

Under the prior Abbott litigation and resulting funding distributions, the focus was entirely on the specific programs and services required for improving outcomes of children in low income communities (early childhood education programs, adequate facilities, etc.). In fact, that was one of the persistent concerns among Abbott opponents… that the programs/services must be provided under the court mandate, regardless of their cost (not that the dollars must be provided regardless of their use) and in place of any broader, more predictable systematic formula. So, perhaps the answer is to go back to the Abbott model?

Ultimately, to establish a state school finance formula (which is a formula for distributing aid), you’ve got to “dollarize” this stuff. But that doesn’t by any stretch of the imagination lead to the assumption that the dollars create – directly – regardless of use – the outcomes. That’s just ridiculous. And the report provides no justification behind its attack on this mythical claim.

In fact, these statements convey a profound ignorance of even the recent history of school finance in New Jersey.

The Reality!

Now that I’m done with that, let’s correct the record on a few points.

New Jersey has an “average” achievement gap given its income gap

I’m not sure how many times I’ll have to correct the current NJDOE and its commissioner on their repeated misrepresentation of NAEP achievement gap data. This is getting old, and it’s certainly indicative that the current administration is unconcerned with presenting any remotely valid information on the state of New Jersey schools. Given what we’ve seen in previous presentations, I guess I shouldn’t be surprised.

In any case, here’s my most recent run of the data comparing income gaps and NAEP outcome gaps. Across the horizontal axis in this graph is the difference in income between those above the reduced-price lunch income threshold and those below the free lunch income threshold. New Jersey and Connecticut have among the largest gaps in income between these two groups. Keep in mind that the same income thresholds are used across all states, despite the fact that the cost of a comparable quality of life varies quite substantially (nifty calculator here). On the vertical axis are the gaps in NAEP scores between the two groups.

Figure 1. Income Gaps and Achievement Gaps


As we can see, states with larger gaps in income between the groups also have larger gaps in scores between the two groups. Quite honestly, this is not astounding. It’s simple logic. And that’s why it’s so inexcusable for Cerf & Co. to keep returning to this intellectually & analytically dry well.

Most importantly, NJ’s gap is right on the line. That is, given its income gap, NJ falls right where we would expect – on the line. NJ’s income-related achievement gap is right in line with expectations!

Is that good enough? Well, not really. There’s still work to be done. But the bogus claim that NJ has the 2nd largest achievement gap has to stop.
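The “on the line” claim above is just a statement about regression residuals: fit a line predicting each state’s achievement gap from its income gap, and a state whose point sits near the fitted line has a residual near zero. Here is a minimal sketch of that logic in plain Python. The state names and numbers below are entirely made up for illustration; they are NOT the actual NAEP or income figures behind Figure 1.

```python
# Sketch of the Figure 1 logic: regress achievement gaps on income gaps
# across states, then inspect each state's residual. All data below are
# hypothetical illustration values, not real NAEP/Census numbers.

def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (state, income gap in $1000s, NAEP score gap) triples.
states = [("A", 20, 18), ("B", 30, 24), ("C", 40, 31),
          ("NJ", 45, 33), ("D", 50, 37)]
xs = [s[1] for s in states]
ys = [s[2] for s in states]
slope, intercept = ols_fit(xs, ys)

for name, x, y in states:
    residual = y - (slope * x + intercept)
    print(f"{name}: residual = {residual:+.2f}")

# A state with a residual near zero has an achievement gap "right in
# line" with what its income gap predicts -- the point being made
# about NJ in Figure 1.
```

The substantive point is that ranking states on raw achievement gaps, without conditioning on income gaps, confuses the residual with the raw value.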

New Jersey has posted impressive NAEP gains given its spending increases

Now let’s take a look at how disadvantaged kids in NJ have actually done on a few of the NAEP tests in recent years when compared to disadvantaged kids in similar states in the region. The pictures pretty much tell the story.

Figure 2. NAEP 8th grade Math for Children Qualified for Free Lunch


Figure 3. NAEP 4th grade Reading for Children Qualified for Free Lunch


Figure 4. NAEP 8th Grade Math for Children of Maternal HS Dropouts


Even Eric Hanushek’s recent data make NJ look pretty darn good in terms of NAEP gains achieved relative to the additional resources provided!

Figure 5. Relationship between Change in Per Pupil Spending and Overall NAEP Gain


Figure 6. Relationship between Change in % Spending per Pupil and Overall NAEP Gain


Figure 7. Relationship between Starting Point and Gain over Time


For more on these last few slides and the data from which they are generated, see this post.

Arguably, given these results, doing the same thing over and over again and expecting the SAME result might be entirely rational!

Money Matters & Equitable and Adequate Funding is a Necessary Underlying Condition for Success

Finally, a substantial body of literature exists to refute the absurd rhetoric and policy preferences of the NJDOE school funding report – most specifically the veiled assertion that reducing funding to low-income children is the way to reduce the achievement gap.

In a recent report titled Revisiting the Age-Old Question: Does Money Matter in Education? I review the controversy over whether, how, and why money matters in education, evaluating the current political rhetoric in light of decades of empirical research. I ask three questions, and summarize the response to those questions as follows:

Does money matter? Yes. On average, aggregate measures of per pupil spending are positively associated with improved or higher student outcomes. In some studies, the size of this effect is larger than in others and, in some cases, additional funding appears to matter more for some students than others. Clearly, there are other factors that may moderate the influence of funding on student outcomes, such as how that money is spent – in other words, money must be spent wisely to yield benefits. But, on balance, in direct tests of the relationship between financial resources and student outcomes, money matters.

Do schooling resources that cost money matter? Yes. Schooling resources which cost money, including class size reduction or higher teacher salaries, are positively associated with student outcomes. Again, in some cases, those effects are larger than others and there is also variation by student population and other contextual variables. On the whole, however, the things that cost money benefit students, and there is scarce evidence that there are more cost-effective alternatives.

Do state school finance reforms matter? Yes. Sustained improvements to the level and distribution of funding across local public school districts can lead to improvements in the level and distribution of student outcomes. While money alone may not be the answer, more equitable and adequate allocation of financial inputs to schooling provides a necessary underlying condition for improving the equity and adequacy of outcomes. The available evidence suggests that appropriate combinations of more adequate funding with more accountability for its use may be most promising.

While there may in fact be better and more efficient ways to leverage the education dollar toward improved student outcomes, we do know the following:

Many of the ways in which schools currently spend money do improve student outcomes.

When schools have more money, they have greater opportunity to spend productively. When they don’t, they can’t.

Arguments that across-the-board budget cuts will not hurt outcomes are completely unfounded.

In short, money matters, resources that cost money matter, and more equitable distribution of school funding can improve outcomes. Policymakers would be well-advised to rely on high-quality research to guide the critical choices they make regarding school finance.

Regarding the politicized rhetoric around money and schools, which has become only more bombastic and less accurate in recent years, I explain the following:

Given the preponderance of evidence that resources do matter and that state school finance reforms can effect changes in student outcomes, it seems somewhat surprising that not only has doubt persisted, but the rhetoric of doubt seems to have escalated. In many cases, there is no longer just doubt, but rather direct assertions that schools can do more than they are currently doing with less than they presently spend; that money is not a necessary underlying condition for school improvement; and, in the most extreme cases, that cuts to funding might actually stimulate improvements that past funding increases have failed to accomplish.

To be blunt, money does matter. Schools and districts with more money clearly have greater ability to provide higher-quality, broader, and deeper educational opportunities to the children they serve. Furthermore, in the absence of money, or in the aftermath of deep cuts to existing funding, schools are unable to do many of the things they need to do in order to maintain quality educational opportunities. Without funding, the efficiency tradeoffs and innovations now being broadly endorsed are suspect. One cannot trade off spending money on class size reduction against increasing teacher salaries to improve teacher quality if funding is not there for either – if class sizes are already large and teacher salaries non-competitive. While these are not the conditions faced by all districts, they are faced by many.

It is certainly reasonable to acknowledge that money, by itself, is not a comprehensive solution for improving school quality. Clearly, money can be spent poorly and have limited influence on school quality. Or, money can be spent well and have substantive positive influence. But money that’s not there can’t do either. The available evidence leaves little doubt: Sufficient financial resources are a necessary underlying condition for providing quality education.

There certainly exists no evidence that equitable and adequate outcomes are more easily attainable where funding is neither equitable nor adequate. There exists no evidence that more adequate outcomes will be attained with less adequate funding. Both of these contentions are unfounded and quite honestly, completely absurd.

Related sources:

Baker, B.D. (2012). Revisiting the Age-Old Question: Does Money Matter in Education? Shanker Institute.

Baker, B.D., & Welner, K. (2011). School Finance and Courts: Does Reform Matter, and How Can We Tell? Teachers College Record, 113(11), p. –