Pondering Legal Implications of Value-Added Teacher Evaluation

Posted on June 2, 2010



I’m going out on a limb here. I’m a finance guy. Not a lawyer. But I do have a reasonable background in school law thanks to colleagues in the field like Mickey Imber at U. of Kansas and my frequent coauthor Preston Green at Penn State. That said, any screw-ups in my legal analysis below are my own and not attributable to either Preston or Mickey. In any case, I’ve been wondering about the validity of the claim some pundits seem to be making that these new teacher evaluation policies are going to make it easier and less expensive to dismiss teachers.

=====

A handful of states have now adopted legislation which mandates that teacher evaluation be linked to student test data. Specifically, legislation adopted in states like Colorado, Louisiana and Kentucky and legislation vetoed in Florida follow a template of requiring that teacher evaluation for pay increases, for retaining tenure and ultimately for dismissal be based 50% or 51% on student “value-added” or “growth” test scores alone. That is, student test score data could make or break a salary increase decision, but could also make or break a teacher’s ability to retain tenure. Pundits backing these policies often highlight provisions for multi-year data tracking on teachers so that a teacher would not lose tenure status until he/she shows poor student growth for 2 or 3 years running. These provisions are supposed to eliminate the possibility that random error or a “bad crop of students” alone could determine a teacher’s future.

Pundits are taking the position that these new evaluation criteria will make it easier to dismiss teachers and will reduce the costs of dismissing a teacher that result from litigation. Oh, how foolish!

The way I see it, this new crop of state statutes and regulations, which include arbitrary use of questionable data applied in a questionably appropriate way, will most likely lead to a flood of litigation like none that has ever been witnessed.

Why would that be? How can a teacher possibly sue the school district for being fired because he/she was a bad teacher? Simply writing into state statute or department regulations that one’s “property interest” to tenure and continued employment must be primarily tied to student test scores does not by any stretch of the legal imagination guarantee that dismissal based on student test scores will stand up to legal challenges – good and legitimate legal challenges.

There are (at least) two very likely legal challenges that will occur once we start to experience our first rounds of teacher dismissal based on student assessment data.

Due Process Challenges

Removing a teacher’s tenure status is denial of a teacher’s property interest and doing so requires “due process.” That’s not an insurmountable barrier, even under typical teacher contracts that don’t require dismissal based on student test scores. Simply declaring that “a teacher will be fired if he/she shows 2 straight years of bad student test scores (growth or value-added)” and then firing a teacher for as much does not mean that the teacher necessarily was provided due process. Under a policy requiring that 51% of the employment decision be based on student value added test scores, a teacher could be wrongly terminated due to:

a) Temporal instability of the value-added measures

http://www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf

Ooooh…Temporal instability… what’s that supposed to mean? What it means is that teacher value-added ratings, which are averages of individual student gains, tend not to be that stable over time. The same teacher is highly likely to get a totally different value-added rating from one year to the next. The above link points to a policy brief which explains that the year-to-year correlation for a teacher’s value-added rating is only about .2 or .3. Further, most of the change or difference in the teacher’s value-added rating from one year to the next is unexplainable – not by differences in observed student characteristics, peer characteristics or school characteristics. That’s 87.5% (elementary math) to 70% (8th grade math) noise! While some statistical corrections and multi-year measures might help, it’s hard to guarantee or even be reasonably sure that a teacher wouldn’t be dismissed simply as a function of unexplainable low performance for 2 or 3 years in a row. That is, simply due to noise, and not the more troublesome issue of how students are clustered across schools, districts and classrooms.
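To get a rough feel for what a year-to-year correlation of .2 or .3 implies, here is a minimal simulation sketch. Everything in it beyond that correlation is my own assumption for illustration: ratings are modeled as a stable "true" teacher effect plus independent yearly noise, both normally distributed, and a teacher is "flagged" when he/she lands in the bottom 20% two years running. None of this is the method used in the brief linked above.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_teachers=20000, year_corr=0.25, flag_share=0.20, seed=0):
    """Hypothetical model: rating = stable true effect + independent yearly noise.

    With total variance fixed at 1, setting the true-effect variance to
    year_corr makes the expected year-to-year correlation equal year_corr.
    """
    rng = random.Random(seed)
    sd_true = year_corr ** 0.5
    sd_noise = (1.0 - year_corr) ** 0.5
    true = [rng.gauss(0.0, sd_true) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0.0, sd_noise) for t in true]
    year2 = [t + rng.gauss(0.0, sd_noise) for t in true]

    def cutoff(scores):
        # Score at the flag_share quantile (assumed bottom-20% threshold).
        return sorted(scores)[int(flag_share * len(scores))]

    c1, c2, c_true = cutoff(year1), cutoff(year2), cutoff(true)
    # Flagged: measured rating in the bottom group in BOTH years.
    flagged = [i for i in range(n_teachers) if year1[i] < c1 and year2[i] < c2]
    # Misflagged: flagged, but true effect is NOT in the bottom group.
    misflagged = [i for i in flagged if true[i] >= c_true]
    return {
        "year_to_year_corr": pearson(year1, year2),
        "share_flagged": len(flagged) / n_teachers,
        "share_of_flagged_not_truly_low": len(misflagged) / len(flagged),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.3f}")
```

The number to watch is `share_of_flagged_not_truly_low`: the fraction of teachers flagged two years in a row whose underlying effect, in this toy model, is not actually in the bottom group. Changing `year_corr` or `flag_share` shows how sensitive that fraction is to the assumptions.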

b) Non-random assignment of students

The only fair way to compare teachers’ ability to produce student value-added is to randomly assign all students, statewide, to all teachers… and then of course, to have all students live in exactly comparable settings with exactly comparable support structures outside of school, etc., etc., etc. That’s right. We’d have to send all of our teachers and all of our students to a single boarding school location somewhere in the state and make sure, absolutely sure, that we randomly assigned students, the same number of students, to each and every teacher in the system.

Obviously, that’s not going to happen. Students are not randomly sorted and the fact that they are not has serious consequences for comparing teachers’ ability to produce student value-added. See: http://gsppi.berkeley.edu/faculty/jrothstein/published/rothstein_vam2.pdf
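A toy illustration of why sorting matters, under assumptions that are entirely mine: two hypothetical teachers have identical true effects, but one is consistently assigned students whose gains are pulled down by an out-of-school factor the model never sees (`context_shift`). This is not the analysis in the paper linked above, just a sketch of the basic confounding problem.

```python
import random


def average_measured_gain(true_effect, context_shift,
                          n_classes=2000, class_size=25, seed=0):
    """Average student gain credited to a teacher across many classrooms.

    Each student's gain = teacher's true effect + unobserved context shift
    + individual noise. A naive value-added measure sees only the sum.
    """
    rng = random.Random(seed)
    n_students = n_classes * class_size
    total = 0.0
    for _ in range(n_students):
        total += true_effect + context_shift + rng.gauss(0.0, 1.0)
    return total / n_students


def measured_gap(context_shift=-0.3):
    # Both hypothetical teachers have the SAME true effect (0.0).
    teacher_a = average_measured_gain(0.0, 0.0, seed=1)
    teacher_b = average_measured_gain(0.0, context_shift, seed=2)
    return teacher_b - teacher_a


if __name__ == "__main__":
    print(f"measured gap between equally effective teachers: {measured_gap():.3f}")
```

Even averaged over thousands of classrooms, the measured gap stays close to the context shift. More data shrinks the noise but does nothing about the sorting.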

c) Student manipulation of test results

As she travels the nation on her book tour, Diane Ravitch raises another possibility for how a teacher might find him/herself out of a job by no real fault of actual bad teaching. As she puts it, this approach to teacher evaluation puts the teacher’s job directly in the students’ hands. And the students can, if they wish, choose to consciously abuse that responsibility.  That is, the students could actually choose to bomb the state assessments to get a teacher fired, whether it’s a good teacher or a bad one. This would most certainly raise due process concerns.

d) A whole bunch of other uncontrollable stuff

A recent National Academies report noted:

“A student’s scores may be affected by many factors other than a teacher — his or her motivation, for example, or the amount of parental support — and value-added techniques have not yet found a good way to account for these other elements.”

http://www8.nationalacademies.org/onpinews/newsitem.aspx?RecordID=1278

This report generally urged caution regarding overemphasis of student value-added test scores in teacher evaluation – especially in high-stakes decisions. Surely, if I were an expert witness testifying on behalf of a teacher who had been wrongly dismissed, I’d be pointing out that the National Academies said that using the student assessment data in this way is not a good idea.

Title VII of the Civil Rights Act Challenges

The non-random assignment of students leads to the second likely legal claim that will flood the courts as test-based teacher dismissals begin – claims of racially disparate teacher dismissal under Title VII of the Civil Rights Act of 1964. Given that students are not randomly assigned and that poor and minority – specifically black – students are densely clustered in certain schools and districts and that black teachers are much more likely to be working in schools with classrooms of low-income black students, it is highly likely that teacher dismissals will occur in a racially disparate pattern. Black teachers of low-income black students will be several times more likely to be dismissed on the basis of poor value-added test scores. This is especially true where a fixed, rigid statewide requirement is adopted and where a teacher must be de-tenured and/or dismissed if he/she falls below some fixed value-added threshold on state assessments.

So, here’s how this one plays out. For every 1 white teacher dismissed on a value-added basis, 10 or more black teachers are dismissed – relative to the overall proportions of black and white teachers. This gives the black teachers the argument that the policy has a racially disparate effect. No, it doesn’t end there. A policy doesn’t violate Title VII merely because it has a racially disparate effect. That just starts the ball rolling – gets the argument into court.
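To make "relative to the overall proportions" concrete, here is a small sketch that compares dismissal rates rather than raw counts. The workforce and dismissal numbers are invented purely for illustration, and the function names are my own.

```python
def dismissal_rate(dismissed, total):
    """Share of a group's teachers who were dismissed."""
    return dismissed / total


def rate_ratio(dismissed_a, total_a, dismissed_b, total_b):
    """How many times more likely group A is to be dismissed than group B."""
    return dismissal_rate(dismissed_a, total_a) / dismissal_rate(dismissed_b, total_b)


if __name__ == "__main__":
    # Hypothetical state: 1,000 black teachers and 9,000 white teachers.
    # Hypothetical dismissals: 100 black teachers, 90 white teachers.
    ratio = rate_ratio(100, 1000, 90, 9000)
    # Raw counts look similar (100 vs. 90), but the rates are 10% vs. 1%.
    print(f"black teachers dismissed at about {ratio:.0f} times the white rate")
```

The point of working in rates is that nearly equal raw counts can hide a very large disparity when the two groups differ in size.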

The state gets to defend itself – by claiming that producing value-added test scores is a legitimate part of a teacher’s job and then explaining how the use of those scores is, in fact, neutral with respect to race. It just happens to have the disparate effect. Right? But, as the state would argue, that’s a good thing because it ensures that we can put better teachers in front of these poor minority kids, and get rid of the bad ones.

But the problem is that the significant body of research on non-random assignment of students and its effect on value-added scores indicates that it’s not necessarily differences in the actual effectiveness of black versus white teachers, but that the black teachers are concentrated in the poor black schools and that student clustering, not teacher effectiveness, is leading to the disparate rates of teacher dismissal. So they weren’t fired because they were precisely, measurably ineffective; they were fired because they had classrooms of poor minority students year after year? At the very least, it is statistically problematic to distill one effect from the other! As a result, it’s statistically problematic to argue that the teacher should be dismissed! There is at least equal likelihood that the teacher is wrongly dismissed as there is that the teacher is rightly dismissed. I suspect a court might be concerned by this.

Reduction in Force

Note that many of these same concerns apply to all of the recent rhetoric over teacher layoffs and the need to base those layoffs on effectiveness rather than seniority. It all sounds good, until you actually try to go into a school district of any size and identify the 100 “least effective” teachers given the current state of data for teacher evaluation. Simply writing into a reduction in force (RIF) policy a requirement of dismissal based on “effectiveness” does not instantly validate the “effectiveness” measures. And even the best “effectiveness” measures, as discussed above, remain really problematic, giving tenured teachers who are laid off on grounds of ineffectiveness multiple options for legal action.

Additional Concerns

These two legal arguments ignore the fact that school districts and states will have to establish two separate types of contracts for teachers to begin with, since even in the best of statistical cases, only about 1/5 of teachers (those directly responsible for teaching math or reading in grades three through eight) might possibly be evaluated via student test scores (see: http://schoolfinance101.wordpress.com/2009/12/04/pondering-the-usefulness-of-value-added-assessment-of-teachers/).

I’ve written previously about the technical concerns over value-added assessment of teachers and my concern that pundits are seemingly completely ignorant of the statistical issues. I’m also baffled that few others in the current policy discussion seem even remotely aware of just how few teachers might – in the best possible case – be evaluated via student test scores, and the need for separate contracts. But I am perhaps most perplexed that no one seems to be acknowledging the massive legal mess likely to ensue when (or if) these poorly conceived policies are put into action.

I’ll save for another day the discussion of just who will be waiting in line to fill those teaching vacancies created by rigid use of test scores for disproportionately dismissing teachers in poor urban schools. Will they, on average, be better or perhaps worse than those displaced before them? Just who will wait in this line to be unfairly judged?

For a related article on the use of certification exams for credentialing teachers, see:

Green, P.C., Sireci, S.G. (2005) Legal and Psychometric Criteria for Evaluating Teacher Certification Tests.  Educational Measurement: Issues and Practice. Volume 19 Issue 1, Pages 22 – 31
