Pondering Legal Implications of Value-Added Teacher Evaluation


I’m going out on a limb here. I’m a finance guy, not a lawyer. But I do have a reasonable background in school law thanks to colleagues in the field like Mickey Imber at U. of Kansas and my frequent coauthor Preston Green at Penn State. That said, any screw-ups in my legal analysis below are my own and not attributable to either Preston or Mickey. In any case, I’ve been wondering about the validity of the claim some pundits seem to be making: that these new teacher evaluation policies are going to make it easier and less expensive to dismiss teachers.

=====

A handful of states have now adopted legislation which mandates that teacher evaluation be linked to student test data. Specifically, legislation adopted in states like Colorado, Louisiana and Kentucky and legislation vetoed in Florida follow a template of requiring that teacher evaluation for pay increase, for retaining tenure and ultimately for dismissal must be based 50% or 51% on student “value-added” or “growth” test scores alone. That is, student test score data could make or break a salary increase decision, but could also make or break a teacher’s ability to retain tenure. Pundits backing these policies often highlight provisions for multi-year data tracking on teachers so that a teacher would not lose tenure status until he/she shows poor student growth for 2 or 3 years running. These provisions are supposed to eliminate the possibility that random error or a “bad crop of students” alone could determine a teacher’s future.

Pundits are taking the position that these new evaluation criteria will make it easier to dismiss teachers and will reduce the costs of dismissing a teacher that result from litigation. Oh, how foolish!

The way I see it, this new crop of state statutes and regulations, which include arbitrary use of questionable data applied in a questionably appropriate way, will most likely lead to a flood of litigation like none that has ever been witnessed.

Why would that be? How can a teacher possibly sue the school district for being fired because he/she was a bad teacher? Simply writing into state statute or department regulations that one’s “property interest” in tenure and continued employment must be primarily tied to student test scores does not by any stretch of the legal imagination guarantee that dismissal based on student test scores will stand up to legal challenges – good and legitimate legal challenges.

There are (at least) two very likely legal challenges that will occur once we start to experience our first rounds of teacher dismissal based on student assessment data.

Due Process Challenges

Removing a teacher’s tenure status is denial of a teacher’s property interest and doing so requires “due process.” That’s not an insurmountable barrier, even under typical teacher contracts that don’t require dismissal based on student test scores. Simply declaring that “a teacher will be fired if he/she shows 2 straight years of bad student test scores (growth or value-added)” and then firing a teacher for as much does not mean that the teacher necessarily was provided due process. Under a policy requiring that 51% of the employment decision be based on student value added test scores, a teacher could be wrongly terminated due to:

a) Temporal instability of the value-added measures

http://www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf

Ooooh… Temporal instability… what’s that supposed to mean? What it means is that teacher value-added ratings, which are averages of individual student gains, tend not to be that stable over time. The same teacher is highly likely to get a totally different value-added rating from one year to the next. The above link points to a policy brief which explains that the year-to-year correlation for a teacher’s value-added rating is only about .2 or .3. Further, most of the change or difference in the teacher’s value-added rating from one year to the next cannot be explained by differences in observed student characteristics, peer characteristics or school characteristics. That’s 87.5% (elementary math) to 70% (8th grade math) noise! While some statistical corrections and multi-year measures might help, it’s hard to guarantee or even be reasonably sure that a teacher wouldn’t be dismissed simply as a function of unexplainable low performance for 2 or 3 years in a row. That is, simply due to noise, and not the more troublesome issue of how students are clustered across schools, districts and classrooms.
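To get a feel for what a year-to-year correlation of about .25 implies, here is a minimal simulation sketch. Everything in it is an assumption of mine for illustration, not something taken from the policy brief: ratings are modeled as a stable "true" effect plus independent yearly noise, both normally distributed, and a teacher is "flagged" if he/she lands in the bottom 20% in every year. The function names and the 20% cutoff are hypothetical.

```python
import random

def simulate_ratings(n_teachers=20000, year_corr=0.25, n_years=2, seed=0):
    """Rating = stable true effect + independent yearly noise (assumed model).

    With total variance fixed at 1, the year-to-year correlation of the
    ratings equals the share of variance that is stable (year_corr).
    """
    rng = random.Random(seed)
    sd_true = year_corr ** 0.5
    sd_noise = (1.0 - year_corr) ** 0.5
    teachers = []
    for _ in range(n_teachers):
        true_effect = rng.gauss(0.0, sd_true)
        ratings = [true_effect + rng.gauss(0.0, sd_noise) for _ in range(n_years)]
        teachers.append((true_effect, ratings))
    return teachers

def bottom_cutoff(values, share):
    """Value below which roughly `share` of the values fall."""
    ordered = sorted(values)
    return ordered[int(share * len(ordered))]

def flagged_every_year(teachers, bottom_share=0.2):
    """Teachers whose rating is in the bottom `bottom_share` in every year."""
    n_years = len(teachers[0][1])
    cutoffs = [bottom_cutoff([r[y] for _, r in teachers], bottom_share)
               for y in range(n_years)]
    return [t for t in teachers
            if all(t[1][y] < cutoffs[y] for y in range(n_years))]

teachers = simulate_ratings()
flagged = flagged_every_year(teachers)

# Flagged teachers whose TRUE effect is not actually in the bottom 20%.
true_cutoff = bottom_cutoff([true for true, _ in teachers], 0.2)
misflagged = [t for t in flagged if t[0] >= true_cutoff]

share_flagged = len(flagged) / len(teachers)
share_misflagged = len(misflagged) / len(flagged)
print(f"flagged two years running: {share_flagged:.1%}")
print(f"of those, not truly bottom-fifth: {share_misflagged:.1%}")
```

Raising `n_years` to 3 or changing `year_corr` shows how much (or how little) multi-year rules help under these assumptions. Real value-added models are far more complicated, so treat the numbers as a toy illustration only.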

b) Non-random assignment of students

The only fair way to compare teachers’ ability to produce student value-added is to randomly assign all students, statewide, to all teachers… and then, of course, to have all students live in exactly comparable settings with exactly comparable support structures outside of school, etc., etc., etc. That’s right. We’d have to send all of our teachers and all of our students to a single boarding school location somewhere in the state and make sure, absolutely sure, that we randomly assigned students, the same number of students, to each and every teacher in the system.

Obviously, that’s not going to happen. Students are not randomly sorted and the fact that they are not has serious consequences for comparing teachers’ ability to produce student value-added. See: http://gsppi.berkeley.edu/faculty/jrothstein/published/rothstein_vam2.pdf

c) Student manipulation of test results

As she travels the nation on her book tour, Diane Ravitch raises another possibility for how a teacher might find him/herself out of a job through no fault of actual bad teaching. As she puts it, this approach to teacher evaluation puts the teacher’s job directly in the students’ hands. And the students can, if they wish, choose to consciously abuse that responsibility. That is, the students could actually choose to bomb the state assessments to get a teacher fired, whether it’s a good teacher or a bad one. This would most certainly raise due process concerns.

d) A whole bunch of other uncontrollable stuff

A recent National Academies report noted:

“A student’s scores may be affected by many factors other than a teacher — his or her motivation, for example, or the amount of parental support — and value-added techniques have not yet found a good way to account for these other elements.”

http://www8.nationalacademies.org/onpinews/newsitem.aspx?RecordID=1278

This report generally urged caution regarding overemphasis of student value-added test scores in teacher evaluation – especially in high stakes decisions. Surely, if I were an expert witness testifying on behalf of a teacher who had been wrongly dismissed, I’d be pointing out that the National Academies said that using the student assessment data in this way is not a good idea.

Title VII of the Civil Rights Act Challenges

The non-random assignment of students leads to the second likely legal claim that will flood the courts as student-testing-based teacher dismissals begin – claims of racially disparate teacher dismissal under Title VII of the Civil Rights Act of 1964. Given that students are not randomly assigned, that poor and minority – specifically black – students are densely clustered in certain schools and districts, and that black teachers are much more likely to be working in schools with classrooms of low-income black students, it is highly likely that teacher dismissals will occur in a racially disparate pattern. Black teachers of low-income black students will be several times more likely to be dismissed on the basis of poor value-added test scores. This is especially true where a fixed, rigid statewide requirement is adopted and where a teacher must be de-tenured and/or dismissed if he/she shows value-added below some fixed threshold on state assessments.

So, here’s how this one plays out. For every 1 white teacher dismissed on value-added basis, 10 or more black teachers are dismissed –  relative to the overall proportions of black and white teachers. This gives the black teachers the argument that the policy has racially disparate effect. No, it doesn’t end there. A policy doesn’t violate Title VII merely because it has racially disparate effect. That just starts the ball rolling – gets the argument into court.
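The "10 to 1, relative to overall proportions" comparison is just a ratio of dismissal rates. A minimal sketch of the arithmetic, using made-up counts (these are hypothetical numbers of mine, not data from any state or district):

```python
def dismissal_rate(dismissed, total):
    """Share of a group's teachers who were dismissed."""
    return dismissed / total

# Hypothetical counts for illustration only.
white_total, white_dismissed = 9000, 45
black_total, black_dismissed = 1000, 50

white_rate = dismissal_rate(white_dismissed, white_total)
black_rate = dismissal_rate(black_dismissed, black_total)

# How many times more likely a black teacher is to be dismissed,
# after accounting for each group's share of the workforce.
rate_ratio = black_rate / white_rate

print(f"white dismissal rate: {white_rate:.2%}")
print(f"black dismissal rate: {black_rate:.2%}")
print(f"rate ratio: {rate_ratio:.1f}")
```

Note that in this made-up example the raw dismissal counts are nearly equal (45 versus 50). The disparity only shows up once the counts are divided by group size, which is why the comparison has to be made relative to the overall proportions of black and white teachers.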

The state gets to defend itself – by claiming that producing value-added test scores is a legitimate part of a teacher’s job and then explaining how the use of those scores is, in fact, neutral with respect to race. It just happens to have the disparate effect. Right? But, as the state would argue, that’s a good thing because it ensures that we can put better teachers in front of these poor minority kids, and get rid of the bad ones.

But the problem is that the significant body of research on non-random assignment of students and its effect on value-added scores indicates that the disparity does not necessarily reflect differences in the actual effectiveness of black versus white teachers. Rather, black teachers are concentrated in the poor black schools, and student clustering, not teacher effectiveness, is leading to the disparate rates of teacher dismissal. So they weren’t fired because they were precisely, measurably ineffective; they were fired because they had classrooms of poor minority students year after year? At the very least, it is statistically problematic to distill one effect from the other! As a result, it’s statistically problematic to argue that the teacher should be dismissed! It is at least as likely that the teacher is wrongly dismissed as that the teacher is rightly dismissed. I suspect a court might be concerned by this.
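A toy simulation can show the mechanism. This is a sketch under assumptions of my own, not a result from the research cited above: both groups of teachers draw their true effectiveness from the identical distribution, but measured value-added in high-poverty classrooms is shifted down by a context effect the model fails to remove. The size of the shift (`context_penalty`), the dismissal `cutoff`, and the group labels are all hypothetical.

```python
import random

def dismissal_rates(n_per_group=50000, context_penalty=0.5, cutoff=-1.0, seed=1):
    """Dismissal rate by group when a fixed cutoff is applied to measured scores.

    True effectiveness is drawn from the same N(0, 1) distribution in both
    groups; only the uncorrected classroom-context shift differs (assumed).
    """
    rng = random.Random(seed)
    rates = {}
    for group, penalty in (("low_poverty", 0.0), ("high_poverty", context_penalty)):
        dismissed = 0
        for _ in range(n_per_group):
            true_effect = rng.gauss(0.0, 1.0)
            measured = true_effect - penalty  # context the model did not remove
            if measured < cutoff:
                dismissed += 1
        rates[group] = dismissed / n_per_group
    return rates

rates = dismissal_rates()
for group, rate in rates.items():
    print(f"{group}: {rate:.1%} dismissed")
print(f"ratio: {rates['high_poverty'] / rates['low_poverty']:.2f}")
```

Under these assumptions the teachers in the two groups are equally effective by construction, yet the fixed cutoff dismisses a noticeably larger share of the high-poverty group. From the dismissal rates alone, nobody could tell this scenario apart from one in which the teachers really were worse.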

Reduction in Force

Note that many of these same concerns apply to all of the recent rhetoric over teacher layoffs and the need to base those layoffs on effectiveness rather than seniority. It all sounds good, until you actually try to go into a school district of any size and identify the 100 “least effective” teachers given the current state of data for teacher evaluation. Simply writing into a reduction in force (RIF) policy a requirement of dismissal based on “effectiveness” does not instantly validate the “effectiveness” measures. And even the best “effectiveness” measures, as discussed above, remain really problematic, giving tenured teachers who are laid off on grounds of ineffectiveness multiple options for legal action.

Additional Concerns

These two legal arguments ignore the fact that school districts and states will have to establish two separate types of contracts for teachers to begin with, since even in the best of statistical cases, only about 1/5 of teachers (those directly responsible for teaching math or reading in grades three through eight) might possibly be evaluated via student test scores (see: https://schoolfinance101.wordpress.com/2009/12/04/pondering-the-usefulness-of-value-added-assessment-of-teachers/).

I’ve written previously about the technical concerns over value-added assessment of teachers and my concern that pundits are seemingly completely ignorant of the statistical issues. I’m also baffled that few others in the current policy discussion seem even remotely aware of just how few teachers might – in the best possible case – be evaluated via student test scores, and the need for separate contracts. But, I am perhaps most perplexed that no-one seems to be acknowledging the massive legal mess likely to ensue when (or if) these poorly conceived policies are put into action.

I’ll save for another day the discussion of just who will be waiting in line to fill those teaching vacancies created by rigid use of test scores for disproportionately dismissing teachers in poor urban schools. Will they, on average, be better or perhaps worse than those displaced before them? Just who will wait in this line to be unfairly judged?

For a related article on the use of certification exams for credentialing teachers, see:

Green, P.C., Sireci, S.G. (2005) Legal and Psychometric Criteria for Evaluating Teacher Certification Tests.  Educational Measurement: Issues and Practice. Volume 19 Issue 1, Pages 22 – 31


13 Comments

  1. GREAT post Bruce! This is an aspect of teacher evaluation reform that gets NO press. The more you peel away at some of these ideas, the more they stink!

  2. I left this response on my own blog as well: http://bit.ly/d2Xwd9

    Bruce,

    You are exactly on point. Both are legitimate legal problems, with the disparate impact being more of the slam dunk in my opinion. The disparate impact numbers would be off the charts and states would have a very difficult time establishing that it is a neutral policy. You start tying that in with school finance stats and other characteristics like age of school buildings and the picture is going to get very dark, very fast (no pun intended). This would have to be a disparate impact case, though, not a disparate treatment case (http://bit.ly/GpQJp) and since it is disparate impact, a class would likely be formed — i.e. someone would need to invest money on the front end of this case to organize it – thus, the NAACP or some kind of organization like that would probably get involved.

    The Due Process argument would be a harder (and much lower-profile) case, but it could be brought individually … so we might start seeing a whole lot more of these. Your identified problems in the statistics I think are great, but you are one of the best statistical minds in the education field. Your average joe-blow lawyer would have a really tough time making that case. And, as long as these cases stayed at the district court level, that case would have to be made over and over and over again by lawyers in each distinct community within each distinct state. If those cases rose to the level of the Circuit courts or the Supreme Court, that would save lawyers some work, but it would still be a costly case to put together and perhaps not worth it to the teachers. Anyway, I think you are right on with the legal analysis, but I think a lot of things in the US don’t make statistical sense, but the legal system is just not competent enough to always tease that out.

    Another legal problem this would create is that if teacher evals were 50 or 51 percent based on test score improvements … it would make it even more difficult legally to get rid of bad teachers whose student test scores happened to go up. You can put a bad teacher in front of an AP class, and those kids are still going to excel on the test. If that bad teacher has a bad personality, treats parents badly, or any other negative qualitative component for which she would otherwise be dismissed or non-renewed, the test score based evaluation just gave that teacher a silver bullet in court. Probably like your law person there at Rutgers, I teach my principals to not give a reason to pre-tenure teachers when RIFing, because if you give a reason, then you have to defend it in court. These policies not only give a reason, but they give a reason that is largely outside of the principal’s control. Even if it winds up that courts still think that 40% negative qualitative evaluation is enough to still RIF or dismiss a teacher, the number of lawsuits is likely to go up dramatically.

    Generally, all this is what happens when you start forcing statistics in the legal system – which is not built for that at all. The legal system is a very qualitatively oriented system, making decisions mostly based on evidence obtained through interviews and the like. The jury, even, is a qualitative system that collectively makes a decision based on all the evidence presented. Statistics throw a wrench in all that because people react differently to numbers. They think numbers don’t lie (although, of course, we know that they can and do). That’s why generally, I don’t love policies that seek to make decisions based solely on numbers – these kinds of things are the result.

  3. Marvelous post! Why do law makers and policy makers ignore these concerns? I’ve tried raising the same issues myself in Teacher Magazine and on my group blog,
    http://accomplishedcaliforniateachers.wordpress.com

    Those who persist in making the argument are engaging in the worst kind of wishful thinking. Test scores reflect teaching… it sounds so simple, logical, and appealing, they figure if they just keep repeating it they’ll win the argument. Then there’s the other popular tactic, accusing others of embracing the status quo and evading accountability.

  4. Excellent post; every federal and state lawmaker, federal and state education official, along with superintendents, school boards and anyone else in the least bit involved with public education should be handed/emailed/faxed/tweeted/whatevered a copy and strongly encouraged to read it.

    Although it is most likely possible to evaluate regular classroom teachers (although it’s much harder to see for special ed, phys ed, librarians, art teachers, Title 1 teachers, Literacy Specialists, Math Interventionists, etc) at least in part on student achievement, the reality at the moment is that there is no consensus as to which tests should be used or how. In defending a teacher who was being fired, I used NeCAP test results to show success; the school system used NWEA to demonstrate relative failure. We were both right and both wrong; should the teacher have lost the job based on conflicting data? No. But it happened and the teacher chose not to fight, thereby saving the school system significant legal and other fees. They may not be so lucky the next time.

  5. Of course there are currently many faculty teaching classes that do not currently have standardized tests for their courses. The committee in TN charged with developing guidelines and criteria for the annual evaluation of all teachers and principals employed by LEAs is, among other ideas, considering inviting the teachers associations of those different areas to suggest ways to measure value added growth. E.g., the band directors developing a way to measure student progress over the course of a year.

    Another idea being considered is breaking down how the band or art students did on their math tests compared to non-band or non-art students (to potentially bolster the arguments in support of those currently non-core courses), and rewarding the band or art teachers, in part, based on how their students performed on those tests.

    Of course, if we had universal vouchers, students and parents could empty the classrooms of incompetent teachers and LEAs would simply have to conduct RIFs.

    In the meantime, I think the difficulty of the task and the potential costs of evaluating teachers just as coaches and fans evaluate professional athletes (based mostly on performance–though, I’ll admit, my evaluation of Barry Bonds was based mostly on his apparently poor attitude) should not keep us from the attempt. Our kids are worth it.

  6. Great Posting – and something that teachers seem to already know, but everyone outside of the “inside” can’t seem to get their heads around. I think it would be important, however, to point out that it’s not necessarily a problem with value-added assessments and modeling, per se, but rather the implications (legal and otherwise) of placing reliance on the results of these assessments in areas they do not provide valuable information. In terms of tracking student growth, they can be useful. And if appropriately analyzed, can provide valuable insight into the “effectiveness” of particular teachers or programs. It is the reliance on these inferences for employment decisions that is really the problem here, not their use altogether. I wish I had a solution, but obviously don’t.

  7. Great post – yet, like others have mentioned, frustrating in that anyone who’s spent a modicum of time looking at the issue should see how fraught with problems this is. But I think the cognitive bias of “wishful thinking” comes in strongly here. They want the achievement gap erased, and yet don’t want to do the heavy lifting of the paradigm shift in resource allocation any true solutions might entail.
    Cognitive dissonance + easy answers = willful ignorance.

    My worry is that going down this road does 2 things: 1) it takes us further away from real solutions and 2) by rewarding performance-via-standardized testing it further incentivizes the teaching market towards the easiest to teach – which are the *least* needy children to begin with.

  8. Bruce, Thank you for this excellent analysis. These policies produce perverse consequences from an educational point of view. The more that basic skills tests count and the higher the stakes attached to them, the more they incentivize cheating, gaming the system, narrowing the curriculum, and teaching to the tests. To get value-added growth models, the amount of testing will have to double, so that students are tested in September and again in May or June. More time for testing and “interim assessments,” less time for instruction. Thus, the states that adopt these policies (hoping to win Race to the Top funding) will see less time devoted to the teaching of history, the arts, geography, foreign languages, science, and other subjects that “don’t count.” These policies may or may not lift test scores. Most assuredly, they will not produce good education. Diane Ravitch

    1. I could not agree more. One of my primary concerns all along has been not only the curricular narrowing, but the fact that the curricular narrowing is disparately distributed because accountability pressures are disparately distributed. Only some schools, serving some children are forced to narrow their curriculum substantially.

      Aside from the teacher evaluation issue, a handful of self-proclaimed school finance experts have begun to argue that schools serving poor and minority children should be forced to re-allocate resources – any and all resources – toward improving test scores. In their eyes (Marguerite Roza in particular), kids in low performing high poverty schools should not be wasting their time – and schools not wasting their money – on trivial stuff like ceramics or cheerleading (her examples, not mine). That other children in nearby affluent districts not facing accountability pressures have these resources is of no consequence in Roza’s view. The reality is that it goes much deeper than cheerleading and ceramics, and into the breadth of advanced foreign language offerings and math/literature and other social science electives at the high school level. Unfortunately, think tanks like Center for American Progress and Ed Trust have bought this garbage wholesale… likely because it provides them the politically convenient argument the poor urban schools can be fixed without any new money (Just like their view on the teacher quality/evaluation stuff). It also allows them to point the finger at district leaders rather than state officials for the way schools are funded.

      But I digress.

      Thanks for your comments. We had a fun continued legal discussion on this topic over at http://www.edjurist.com.

  9. Black teachers in Chicago with the help of CORE (the new faction of the CTU that recently won control of the union) filed a complaint to the EEOC on the issue of racially discriminatory firings … U.S. Department of Justice will decide whether it will sue the district for discrimination. If not, the teachers can bring a lawsuit of their own.

    Hiring patterns in NYC have been considerably more skewed toward white recruits since mayoral control was imposed; Deputy Mayor Walcott says they are looking to improve “quality.” The rise of TFA is also implicated in this trend; even without bogus value-added evaluations this is a problem that has become far more widespread in recent years. It will be interesting to see if Obama’s Justice Dept. takes an independent view of this from the privateers ensconced at his Ed Dept.
