An Update on New Jersey’s SGPs: Year 2 – Still not valid!

Posted on January 31, 2014



I have spent much time criticizing New Jersey’s Student Growth Percentile measures over the past few years, both conceptually and statistically. So why stop now?

We have been told over and over again by the Commissioner and his minions that New Jersey’s SGPs take fully into account student backgrounds by accounting for each student’s initial score and comparing students against others with similar starting points. I have explained over and over again that the fact that individual students’ growth percentiles are estimated relative to others with similar starting points by no means validates classroom median growth percentiles or school median growth percentiles as, by any stretch of the imagination, a non-biased measure of teacher or school quality.

The assumption is conceptually wrong and it is statistically false! New Jersey’s growth percentile measures are NOT a valid indicator of school or teacher quality [or even school or teacher effect on student test score change from time 1 to time 2], plain and simple. Adding a second year of data to the mix reinforces my previous conclusions.

Now that we have a second year of publicly available school aggregate growth percentile measures, we can ask a few very simple questions. Specifically, we can ask how stable, or how well correlated, those school level SGPs are from one year to the next across the same schools.

I’ve explained previously, however, that stability of these measures over time may actually reflect more bad than good. It may simply be that the SGPs stay relatively stable from one year to the next because they are picking up factors such as the persistent influence of child poverty, effects of being clustered with higher or lower performing classmates/schoolmates, or that the underlying test scales simply allow for either higher or lower performing students to achieve greater gains.

That is, SGPs might be stable merely because of stable bias! If that is indeed the case, it would be particularly foolish to base significant policy determinations on these measures.

Let’s clarify this using the research terms “reliability” and “validity.”

  • Validity means that a measure measures what it is intended to measure, which in this case is the influence of schools and teachers on changes in student test scores over time. That is, the measure is not simply capturing something else. Validity is presumed good, but only to the extent those choosing what to measure are making good choices. One might, for example, choose to measure, and fully succeed in measuring, something totally useless (one can debate the value of measuring differences over time in reading and math scores as representative more broadly of teacher or school quality).
  • Reliability means that a measure is consistent over time, presumed to mean that it is consistently capturing something over time. Too many casual readers of research and users of these terms assume reliability is inherently good. That a reliable measure is always a good measure. That is not the case if the measure is reliable simply because it is consistently measuring the wrong thing. A measure can quite easily be reliably invalid.
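To make “reliably invalid” concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not New Jersey data: the made-up measure loads entirely on a persistent poverty factor plus fresh noise each year, and the names `simulate`, `poverty` and `quality` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_schools=2000, seed=0):
    """Simulate a measure that is stable year to year but tracks the wrong thing."""
    rng = random.Random(seed)
    # Assumption: poverty is a persistent school characteristic.
    poverty = [rng.gauss(0, 1) for _ in range(n_schools)]
    # Assumption: true school quality is unrelated to poverty.
    quality = [rng.gauss(0, 1) for _ in range(n_schools)]

    def measure():
        # Hypothetical biased measure: driven by poverty plus new noise
        # each year, with no contribution at all from true quality.
        return [-1.0 * p + rng.gauss(0, 1) for p in poverty]

    year1, year2 = measure(), measure()
    return {
        "year1_vs_year2": pearson(year1, year2),
        "year1_vs_poverty": pearson(year1, poverty),
        "year1_vs_quality": pearson(year1, quality),
    }


if __name__ == "__main__":
    for name, r in simulate().items():
        print(f"{name}: {r:.2f}")
```

Under these assumptions the measure shows a sizable year over year correlation while being essentially uncorrelated with the quality it is supposed to capture. All of its stability comes from the stable poverty factor.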

So, let’s ask ourselves a few really simple empirical questions using last year’s and this year’s SGP data, and a few other easily accessible measures like average proficiency rates and school rates of children qualified for free lunch (low income).

  • How stable are NJ’s school level SGPs from year 1 to year 2?
  • If they are stable, or reasonably correlated, might it be because they are correlated to other stuff?
    • Average prior performance levels?
    • School level student population characteristics?

If we were seeking a non-biased and stable measure of school or teacher effectiveness, we would expect to find a high correlation from one year to the next on the SGPs, coupled with low correlations between those SGPs and other measures like prior average performance or low income concentrations.

By contrast, if we find relatively high year over year correlation for our SGPs but also find that the SGPs on average over the years are correlated with other stuff (average performance levels and low income concentrations), then it becomes far more likely that the stability we are seeing is “bad” stability (false signal or bias) rather than “good” stability (true signal of teacher or school quality).

That is, we are consistently mis-classifying schools (and by extension their teachers) as good or bad, simply because of the children they serve!
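For readers who want to run this kind of check themselves, here is a rough sketch of the calculation. It is not the exact procedure behind my figures. The column names (`sgp_y1`, `sgp_y2`, `pct_free_lunch`, `pct_proficient`) are hypothetical, and the six rows are made-up toy values, not New Jersey data.

```python
def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def with_two_year_average(rows):
    """Return copies of the rows with a two-year average SGP column added."""
    return [dict(row, sgp_avg=(row["sgp_y1"] + row["sgp_y2"]) / 2) for row in rows]


def correlation_matrix(rows, columns):
    """Pairwise Pearson correlations between the named columns."""
    series = {c: [row[c] for row in rows] for c in columns}
    return {a: {b: pearson(series[a], series[b]) for b in columns} for a in columns}


# Made-up toy values for illustration only (one dict per school).
TOY_ROWS = [
    {"sgp_y1": 35, "sgp_y2": 38, "pct_free_lunch": 80, "pct_proficient": 40},
    {"sgp_y1": 42, "sgp_y2": 40, "pct_free_lunch": 65, "pct_proficient": 50},
    {"sgp_y1": 50, "sgp_y2": 47, "pct_free_lunch": 40, "pct_proficient": 62},
    {"sgp_y1": 48, "sgp_y2": 55, "pct_free_lunch": 35, "pct_proficient": 70},
    {"sgp_y1": 60, "sgp_y2": 52, "pct_free_lunch": 15, "pct_proficient": 80},
    {"sgp_y1": 58, "sgp_y2": 63, "pct_free_lunch": 5, "pct_proficient": 90},
]
COLUMNS = ["sgp_y1", "sgp_y2", "sgp_avg", "pct_free_lunch", "pct_proficient"]

if __name__ == "__main__":
    matrix = correlation_matrix(with_two_year_average(TOY_ROWS), COLUMNS)
    for a in COLUMNS:
        print(a.ljust(16), " ".join(f"{matrix[a][b]:6.2f}" for b in COLUMNS))
```

The comparison that matters is between the `sgp_y1` by `sgp_y2` cell and the cells pairing `sgp_avg` with `pct_free_lunch` and `pct_proficient`. If the latter are nearly as large in absolute value as the former, the stability is more plausibly bias than signal.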

Well then, here’s the correlation matrix (scatterplots below):

Slide1

The bottom line is that New Jersey’s language arts SGPs are:

  • Nearly as strongly (when averaged over two years) correlated with concentrations of low income children as they are with themselves over time!
  • As strongly (when averaged over two years) correlated with prior average performance as they are with themselves over time!

Patterns are similar for math.  Year over year correlations for math (.61) are somewhat stronger than correlations between math SGPs and performance levels (.45 to .53) or low income concentration (-.38). But, correlations with performance levels and low income concentrations remain unacceptably high – signalling substantial bias.

The alternative explanation is to buy into the party line that what we are really seeing here is the distribution of teaching talent across New Jersey schools. Lower poverty schools simply have the better teachers. And thus, those teachers must have been produced by the better colleges/universities.

Therefore, we should build all future policies around these ever-so-logical, unquestionably valid findings. That is, the teachers in high poverty schools whose children had initially lower performance, and thus systematically lower SGPs, must be fired and a new batch brought in to replace them. Heck, if the new batch of teachers is even average (like teachers in schools of average poverty and average prior scores), then they can lift those SGPs and average scores of high poverty, below average schools toward the average.

At the same time, we must track down the colleges of education responsible for producing those teachers in high poverty schools who failed their students so miserably and we must impose strict sanctions on those colleges.

That’ll work, right? No perverse incentives here? Especially since we are so confident in the validity of these measures?

Nothing can go wrong with this plan, right?

A vote of no confidence is long overdue here!

Slide2

Slide3

Slide4

Slide5

Slide6

Slide7

Slide8

Slide9