In my previous post I chastised state officials for their blatant mischaracterization of metrics to be employed in teacher evaluation. This raised (in Twitter conversation) the issue of the frequent misrepresentation of findings from the Gates Foundation Measures of Effective Teaching (MET) Project. Policymakers frequently invoke the Gates MET findings as providing broad-based support for whatever measures they might choose to use (such as growth percentiles), however they might choose to use them.
Here is one example in a recent article from NJ Spotlight (John Mooney) regarding proposed teacher evaluation regulations in New Jersey:
New academic paper: One of the most outspoken critics has been Bruce Baker, a professor and researcher at Rutgers’ Graduate School of Education. He and two other researchers recently published a paper questioning the practice, titled “The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era.” It outlines the teacher evaluation systems being adopted nationwide and questions the use of SGP, specifically, saying the percentile measures is not designed to gauge teacher effectiveness and “thus have no place” in determining especially a teacher’s job fate.
The state’s response: The Christie administration cites its own research to back up its plans, the most favored being the recent Measures of Effective Teaching (MET) project funded by the Gates Foundation, which tracked 3,000 teachers over three years and found that student achievement measures in general are a critical component in determining a teacher’s effectiveness.
I asked colleague Morgan Polikoff of the University of Southern California for his comments. Note that Morgan and I aren’t entirely on the same page on the usefulness of even the best possible versions of teacher effect (on test score gain) measures… but we’re not that far apart either. It’s my impression that Morgan believes that better estimated measures can be more valuable in policy decision making than I perhaps think they can be. My perspective is presented here (and Morgan is free to provide his). My skepticism arises in part from my perception that there is neither interest among nor incentive for state policymakers to actually develop better measures (as evidenced in my previous post), and I’m not sure some of the major issues can ever be resolved.
That aside, here are Morgan Polikoff’s comments regarding misrepresentation of the Gates MET findings – in particular, as applied to states adopting student growth percentile measures:
As a member of the Measures of Effective Teaching (MET) project research team, I was asked by Bruce to pen a response to the state’s use of MET to support its choice of student growth percentiles (SGPs) for teacher evaluations. Speaking on my behalf only (and not on behalf of the larger research team), I can say that the MET project says nothing at all about the use of SGPs. The growth measures used in the MET project were, in fact, based on value-added models (VAMs) (http://www.metproject.org/downloads/MET_Gathering_Feedback_Research_Paper.pdf). The MET project’s VAMs, unlike student growth percentiles, included an extensive list of student covariates, such as demographics, free/reduced-price lunch, English language learner, and special education status.
Extrapolating from these results and inferring that the same applies to SGPs is not an appropriate use of the available evidence. The MET results cannot speak to the differences between SGP and VAM measures, but there is both conceptual and empirical evidence that VAM measures that control for student background characteristics are more appropriate (Baker, Oluwole, & Green, 2013; Ehlert, Koedel, Parsons, & Podgursky, 2012). For instance, SGP models are likely to result in teachers teaching the most disadvantaged students being rated the poorest (Ehlert et al., 2012). This may result in all kinds of negative unintended consequences, such as teachers avoiding teaching these kinds of students.
In short, state policymakers should consider all of the available evidence on SGPs vs. VAMs, and they should not rely on MET to make arguments about measures that were not studied in that work.
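The concern in the comments above, that a growth measure which ignores student background can penalize teachers of disadvantaged students, can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the data are simulated, not real; the "growth percentile" here is a simple percentile rank within prior-score bands, not the quantile-regression SGP that states actually use; and the "adjusted" version simply adds disadvantage status to the peer group, which is not the MET project's VAM. The names `PENALTY`, `simulate_class`, `growth_percentiles`, and `teacher_medians` are all hypothetical.

```python
import random
from bisect import bisect_left
from statistics import median

random.seed(0)

# ASSUMPTION: disadvantaged students score 5 points lower for reasons
# unrelated to their teacher. Both simulated teachers have a true effect of 0.
PENALTY = 5.0


def simulate_class(teacher, share_disadvantaged, n=2000):
    """Simulate n students for one teacher whose true effect is zero."""
    students = []
    for _ in range(n):
        disadvantaged = random.random() < share_disadvantaged
        prior = random.gauss(50, 10)
        current = prior + random.gauss(0, 5) - (PENALTY if disadvantaged else 0.0)
        students.append({"teacher": teacher, "dis": disadvantaged,
                         "prior": prior, "current": current})
    return students


def growth_percentiles(students, peer_key):
    """Percentile rank of each student's current score among peers with the same peer_key."""
    groups = {}
    for s in students:
        groups.setdefault(peer_key(s), []).append(s["current"])
    for scores in groups.values():
        scores.sort()
    result = []
    for s in students:
        peers = groups[peer_key(s)]
        # Number of peers scoring strictly below this student.
        result.append(100.0 * bisect_left(peers, s["current"]) / len(peers))
    return result


def teacher_medians(students, peer_key):
    """Median growth percentile of each teacher's students."""
    by_teacher = {}
    for s, p in zip(students, growth_percentiles(students, peer_key)):
        by_teacher.setdefault(s["teacher"], []).append(p)
    return {t: median(v) for t, v in by_teacher.items()}


def prior_band(s):
    """Five-point band of the prior score, a crude stand-in for 'similar prior achievement'."""
    return int(s["prior"] // 5)


# Teacher A serves mostly disadvantaged students; teacher B serves few.
students = simulate_class("A", 0.8) + simulate_class("B", 0.2)

# Peers defined by prior score only (no student covariates).
unadjusted = teacher_medians(students, prior_band)
# Peers defined by prior score and disadvantage status.
adjusted = teacher_medians(students, lambda s: (prior_band(s), s["dis"]))

print("Prior score only:        ", unadjusted)
print("Prior score + background:", adjusted)
```

In this simulation the two teachers are equally effective by construction, yet the measure that conditions only on prior scores rates the teacher of mostly disadvantaged students lower, while the measure that also accounts for background rates them about the same. The sketch shows the mechanism only; it says nothing about how large the effect is in any real state system.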
Baker, B.D., Oluwole, J., Green, P.C. III (2013) The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the race-to-the-top era. Education Policy Analysis Archives, 21(5). This article is part of EPAA/AAPE’s Special Issue On Value-Added: What America’s Policymakers Need to Know and Understand, Guest Edited by Dr. Audrey Amrein-Beardsley and Assistant Editors Dr. Clarin Collins, Dr. Sarah Polasky, and Ed Sloat. Retrieved [date], from http://epaa.asu.edu/ojs/article/view/1298
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2012). Selecting Growth Measures for School and Teacher Evaluations. http://ideas.repec.org/p/umc/wpaper/1210.html
(Updated alternate version: