Last week, this video from The Onion (asking whether tests are biased against kids who don’t give a sh^%t) was going viral among education social networking geeks like me. At the same time, the conversation continued over the Los Angeles Times value-added story, with the LAT releasing the scores for individual teachers.
I’ve written many blog posts in recent weeks on this topic. Lately, the emphasis of the conversation seems to have turned toward finding a middle ground – discussing the appropriate role, if any, for VAM (Value Added Modeling) in teacher evaluation. But there is also renewed rhetoric defending VAM. Most of that rhetoric takes on most directly the concern over the error rates in VAM – and the lack of strong year-to-year correlation in which teachers are rated high or low.
The new rhetoric points out that we’re only having this conversation about VAM error rates because we can measure the error rate in VAM, but can’t even do that for peer or supervisor evaluation – which might be much worse (argue the pundits). The new rhetoric argues that VAM is still the “best available” method for evaluating teacher “performance.” Let me point out that if the “best available” automobile burst into flames on every fifth start, I think I’d walk or stay home instead. I’d take pretty significant steps to avoid driving. Now, we’re not talking about death by VAM here, but the idea that random error alone – under an inflexible VAM based policy structure – could lead to wrongfully firing a teacher is pretty significant.
Again, this current discussion pertains only to the “error rate” issue. Other major, perhaps even bigger, issues include the problem that so few teachers can even have test scores attached to them. That creates a whole separate sub-class (<20%) of teachers in each school system and increases divisions among teachers, creating significant tension, for example, between teachers under the VAM (math/reading) rating system and teachers who might want to meet with some of their students for music, art or other enrichment endeavors.
Perhaps most significantly, there still exists that pesky little problem of VAM not being able to sufficiently account for the non-random sorting of students across schools and teachers. For those who wish to use Kane and Staiger as their out on this (without reference to broader research on this topic), see my previous post on the LAT analysis. Their findings are interesting, but not the single definitive source on this issue. Note also that the LAT analysis itself reveals some bias likely associated with non-random assignment (the topic of my post).
So then, what the heck does this have to do with The Onion video about testing and kids who don’t give a sh^%t?
I would argue that the non-random assignment of kids who don’t give a sh^%t presents a significant concern for VAM. Consider any typical upper elementary school. It is quite possible that kids who don’t give a sh^%t are more likely to be assigned to one fourth grade teacher year after year than to another. This may occur because that fourth grade teacher really wants to try to help these kids out, and has some, though limited, success in doing so. This may also occur because the principal has it in for one teacher – and really wants to make his/her life difficult. Or, it may occur because all of the parents of kids who do give a sh^%t (in part because their parents give a sh^%t) consistently request the same teacher year after year.
In all likelihood, whether the kids give a sh^%t about doing well – and specifically doing well on the tests used for generating VA estimates – matters, and may matter a lot. Teachers with disproportionate numbers of kids who don’t give a sh^%t may, as a result, receive systematically lower VA scores, and if the sorting mechanisms above are in place, this may occur year after year.
What incentive does this provide for the teacher who wanted to help – to help kids give a sh^%t? Statistically, even if that teacher made some progress in overcoming the give a sh^%t factor, the teacher would get a low rating because the give a sh^%t factor would not be accounted for in the model. Buddin’s LAT model includes dummy variables for kids who are low income and kids who are limited in their English language proficiency. But there’s no readily available indicator for kids who don’t give a sh^%t. So we can’t effectively compare one teacher with 10 (of 25) kids who don’t give a sh^%t to another with 5 (of 25) who don’t give a sh^%t. We can hope that giving a sh^%t, or not, is picked up by the child’s prior year performance, and even better, by multiple prior years of value-added estimates on that child. But do we really know whether giving a sh^%t is a stable student characteristic over time? And many VAMs, like the LAT one, don’t capture multiple prior years of value-added for each student.
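For those who like to see the arithmetic, here is a minimal sketch of the 10-of-25 versus 5-of-25 comparison. To be clear, this is not Buddin’s model or anyone’s actual VAM. The teacher effects, the 10-point penalty for not giving a sh^%t, and every name in the code are made-up assumptions for illustration, and random error is left out entirely so that the only thing distorting the ratings is the unmeasured sorting.

```python
from dataclasses import dataclass


@dataclass
class Classroom:
    teacher: str
    true_effect: float   # hypothetical true contribution to each student's gain
    n_students: int
    n_disengaged: int    # kids who don't care about the test (invisible to the model)


def observed_mean_gain(room: Classroom, disengagement_penalty: float) -> float:
    """Average score gain the model sees for one classroom."""
    n_engaged = room.n_students - room.n_disengaged
    total = (
        n_engaged * room.true_effect
        + room.n_disengaged * (room.true_effect - disengagement_penalty)
    )
    return total / room.n_students


def naive_value_added(rooms, disengagement_penalty):
    """Rate each teacher by class mean gain relative to the average classroom.

    Engagement is deliberately NOT controlled for, mimicking a model that
    has no indicator for it.
    """
    gains = {r.teacher: observed_mean_gain(r, disengagement_penalty) for r in rooms}
    grand_mean = sum(gains.values()) / len(gains)
    return {teacher: gain - grand_mean for teacher, gain in gains.items()}


# Assumed numbers: Teacher A is actually the more effective teacher (6 vs. 5),
# but has 10 of 25 disengaged kids; Teacher B has 5 of 25.
rooms = [
    Classroom("A", true_effect=6.0, n_students=25, n_disengaged=10),
    Classroom("B", true_effect=5.0, n_students=25, n_disengaged=5),
]
estimates = naive_value_added(rooms, disengagement_penalty=10.0)

for teacher, va in estimates.items():
    print(f"Teacher {teacher}: estimated value-added {va:+.2f}")
```

Under these assumed numbers, Teacher A comes out below Teacher B despite having the larger true effect, simply because the model cannot see who is sitting in each classroom. If the same sorting repeats each year, so does the ranking.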
I noted in previous posts that peer effects are among the factors that compromise (bias) teacher VAM ratings. Buddin’s LAT model, as far as I can tell, doesn’t try to capture differences in peer group when attempting to “isolate” teacher effect (though this is very difficult to accomplish). Unlike racial characteristics or child poverty, whether 1 or 10 kids in a class give a sh^%t might rub off on others in the class. Or, the disruptive behavior of kids who don’t give a sh^%t might significantly compromise the learning (and value-added estimates) of others. Yet all of this goes unmeasured in even the best VAMs.
Once again, just pondering…
NEW: BONUS VIDEO