Kevin Carey from Ed Sector has done it again. He’s come up with yet another argument that fails to pass even the most basic smell test. A few weeks ago, I picked on Kevin for making the argument that while charter schools, on average, are average, really good charter schools are better than average. Or, as he himself phrased it:
reasonable people acknowledge that the best charter schools–let’s call them “high-quality” charter schools–are really good
I myself am reasonable on occasion and fully accept this premise. Some schools are really good, and some not so good. And that applies to charter schools and non-charters alike, as I show in my recent post Searching for Superguy.
Last week, Carey made another claim that fails the most basic smell test. In the New York Times Room for Debate series on value-added measurement of teachers, he argued that value-added measures would protect teachers from favoritism: principals would no longer be able to go after certain teachers based on their own personal biases, and teachers would be able to back up their “real” performance with hard data. Here’s a quote:
“Value-added analysis can protect teachers from favoritism by using hard numbers and allow those with unorthodox methods to prove their worth.” (Kevin Carey, here)
The reality is that value-added measures simply create new opportunities to manipulate teacher evaluations through favoritism. In fact, it might even be easier to get a teacher fired by making sure the teacher has a weak value-added scorecard. Because value-added estimates are sensitive to non-random assignment of students, principals can easily manipulate the distribution across classrooms of disruptive students, students with special needs, students with weak prior growth and other factors which, if not fully accounted for by the VA model, will bias teacher ratings. And some factors – like disruptive students, or those who simply don’t give a $#*! – won’t (and can’t) be addressed in the VA models. That is, a clever principal can exploit non-random assignment bias to create a statistical illusion that a teacher is a bad teacher. One might argue that some principals likely already assign more “difficult” students to certain teachers – those less favored by the principal. So even if the principal is less clever and merely spiteful, the same effect can occur.
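To make the mechanism concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration, not anyone's actual VA model: two hypothetical teachers with identical true effects, an unobserved "disruptive" trait that lowers a student's gain, and a naive rating equal to the mean test-score gain in each classroom.

```python
import random
import statistics


def simulate(steer_to_b, n_students=2000, seed=0):
    """Return the naive mean gain per teacher under one assignment policy.

    All parameters are hypothetical. Both teachers have the SAME true
    effect, so any gap in the result comes from who was assigned to whom.
    """
    rng = random.Random(seed)
    true_effect = {"A": 5.0, "B": 5.0}  # identical by construction
    gains = {"A": [], "B": []}
    for _ in range(n_students):
        # Trait the rating below never sees (assumed 30% of students).
        disruptive = rng.random() < 0.3
        if steer_to_b:
            # The principal steers most disruptive students to teacher B.
            p_b = 0.9 if disruptive else 0.33
        else:
            # Coin-flip assignment.
            p_b = 0.5
        teacher = "B" if rng.random() < p_b else "A"
        prior = rng.gauss(50, 10)
        penalty = -4.0 if disruptive else 0.0
        post = prior + true_effect[teacher] + penalty + rng.gauss(0, 3)
        gains[teacher].append(post - prior)
    return {t: statistics.mean(g) for t, g in gains.items()}


if __name__ == "__main__":
    print("random assignment: ", simulate(steer_to_b=False))
    print("steered assignment:", simulate(steer_to_b=True))
```

Under these invented parameters, the two teachers look about the same with coin-flip assignment, while steering makes teacher B's average gain come out noticeably lower even though nothing about B's teaching changed.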
I wrote in an earlier post about the types of contractual protections teachers should argue for, in order to protect against such practices:
The language in the class size/random assignment clause will have to be pretty precise to guarantee that each teacher is treated fairly – in a purely statistical sense. Teachers should negotiate for a system that guarantees “comparable class size across teachers – not to deviate more than X” and that year to year student assignment to classes should be managed through a “stratified randomized lottery system with independent auditors to oversee that system.” Stratified by disability classification, poverty status, language proficiency, neighborhood context, number of books in each child’s home setting, etc. That is, each class must be equally balanced with a randomly (lottery) selected set of children by each relevant classification.
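For readers wondering what a “stratified randomized lottery” might look like mechanically, here is a rough sketch. The function name, the student fields and the dealing rule are my own illustrative assumptions, not language from any actual contract: students are grouped by the chosen classifications, shuffled within each group, and dealt out to classes in turn.

```python
import random
from collections import defaultdict


def stratified_lottery(students, n_classes, strata_keys, seed=None):
    """Deal students into classes so every stratum is spread evenly.

    `students` is a list of dicts; `strata_keys` names the fields to
    stratify on. Both are hypothetical structures for this sketch.
    """
    rng = random.Random(seed)

    # Group students by their combination of classifications.
    strata = defaultdict(list)
    for student in students:
        strata[tuple(student[k] for k in strata_keys)].append(student)

    classes = [[] for _ in range(n_classes)]
    # Random starting seat, so no class systematically gets the extras.
    next_class = rng.randrange(n_classes)
    for key in sorted(strata):
        group = strata[key]
        rng.shuffle(group)  # the lottery: random order within the stratum
        for student in group:
            classes[next_class % n_classes].append(student)
            next_class += 1  # carry over so total class sizes stay balanced
    return classes


if __name__ == "__main__":
    roster = [
        {"id": i, "iep": i % 5 == 0, "low_income": i % 3 == 0}
        for i in range(61)
    ]
    for room in stratified_lottery(roster, 3, ["iep", "low_income"], seed=1):
        print(len(room), sum(s["iep"] for s in room))
```

Because students are dealt round-robin, class sizes differ by at most one student, and so does each class's count from any single stratum. Passing a published seed is one way an independent auditor could re-run and verify the draw.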
This may all sound absurd, but sadly, under policies requiring high-stakes decisions such as dismissal to be based on value-added measures, this stuff will likely become necessary. And it will severely constrain principals who wish to work closely with teachers on making thoughtful, individualized classroom assignments for students. I address the new incentives for teachers to avoid taking on the “tough” cases in this post: http://schoolfinance101.wordpress.com/2010/09/01/kids-who-don%E2%80%99t-give-a-sht/
Technical follow-up: I noticed that Kevin Carey claims that VA measures “level the playing field for teachers who are assigned students of different ability.” This statement, as a general conclusion, is wrong.
a) VA measures do account for the initial performance level of individual students, or they would not be VA measures. Even this becomes problematic when tests are given annually rather than in fall and spring, so that summer learning loss is included in the year-to-year gain. A more thorough approach for reducing model bias is to use multiple years of lagged scores on each child in order to estimate the extent to which a teacher can change a child’s trajectory (growth curve). But that requirement makes it more difficult to evaluate 3rd or 4th grade teachers, where many lagged scores aren’t yet available. The LA Times (LAT) model may have had multiple years of data on each teacher, but it didn’t have multiple lagged scores on each child. All that the LAT approach does is generate a more stable measure for a teacher, even if it is merely a stable measure of the bias arising from which students are typically assigned to that teacher.
b) VA measures might crudely account for socio-economic status, disability status or language proficiency status, which may also affect learning gains. But typical VA models, like the LA Times model by Buddin, tend to use relatively crude, dichotomous proxies/indicators for these things. They don’t effectively capture the range of differences among kids. They don’t capture numerous potentially important, unmeasured differences. Nor do they typically capture classroom composition (peer group) effects, which have been shown to be significant in many studies, whether measured by racial/ethnic/socioeconomic composition of the peer group or by average performance of the peer group.
c) For students who have more than one teacher across subjects (and/or teaching aides/assistants), each teacher’s VA measures may be influenced by the other teachers serving the same students.
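To tie points (a) through (c) together, it may help to write down a generic value-added specification. This is a textbook-style sketch under my own notation, not the LA Times model or any other specific model: $A_{it}$ is student $i$'s score in year $t$, $X_{it}$ holds student indicators such as poverty or disability status, $\bar{X}_{-i,jt}$ is the average of those indicators among the student's classmates, $\theta_j$ is the effect attributed to teacher $j$, and $\varepsilon_{it}$ is everything the model leaves out.

```latex
A_{it} = \lambda A_{i,t-1} + X_{it}\beta + \bar{X}_{-i,jt}\gamma + \theta_j + \varepsilon_{it}
```

In these terms, (a) is about relying on a single annual lag $A_{i,t-1}$, which folds summer loss into the measured gain; (b) is about $X_{it}$ being a handful of crude dichotomous indicators and the peer term $\bar{X}_{-i,jt}\gamma$ often being omitted; and (c) is about other teachers' contributions ending up in $\theta_j$. Whenever classroom assignment is related to what sits in $\varepsilon_{it}$, the estimate of $\theta_j$ is biased.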
I could go on, but I recommend revisiting my previous posts on the topic, where I have already addressed most of these concerns.