Some who read this blog might assume that I am totally opposed, in any and all circumstances, to using data in schools to guide decision-making. Despite my frequent public cynicism, I assure you that I believe much of the statistical information we collect on and in schools and school systems can provide useful signals regarding what’s working and what’s not, and may provide more ambiguous signals warranting further exploration – through both qualitative information gathering (observation, etc.) and additional quantitative information gathering.
My personal gripe is that thus far – especially in public policy – we’ve gone about it all wrong. Pundits and politicians seem to have this intense desire to impose certainty where there is little or none, and to impose rigid frameworks with precise goals that are destined to fail (or to make someone other than the politician look as if they’ve failed).
Pundits and politicians also feel the intense desire to over-sample the crap out of our schooling system – taking annual measurements on every child over multiple weeks of the school year, when strategic sampling of selected test items across samples of students and settings might provide more useful information at lower cost and be substantially less invasive (NAEP provides one useful example). To protect the health of our schoolchildren, we don’t make them all walk around all day with rectal thermometers hanging out of…well… you know? Nor do political pollsters attempt to poll 100% of likely voters. Nor should we feel the necessity to have all students take all of the assessments, all of the time, if our goal is to ensure that the system is getting the job done/making progress.
In my view, a central reason for testing and measurement in schools is what I would refer to as system monitoring, where system monitoring is best conducted in the least intrusive and most cost-effective way – such that the monitoring itself does not become a major activity of the system! We just need enough sampling density in our assessments to generate sufficiently precise estimates at each relevant level of the system.
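To make the sampling-density point concrete, here is a minimal simulation sketch. All of the numbers are hypothetical – a 10,000-student district with scale scores drawn from a normal distribution – but the arithmetic is the standard survey-sampling result: a modest random sample recovers the system-level mean with a small standard error, no census required.

```python
import random
import statistics

random.seed(0)

# Hypothetical district: 10,000 students with "true" scale scores.
# (Illustrative numbers only -- mean 500, SD 50.)
population = [random.gauss(500, 50) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Instead of testing everyone, test a random sample and estimate the
# system-level mean. The standard error is roughly s / sqrt(n), so it
# shrinks quickly as the sample grows.
for n in (100, 400, 1600):
    sample = random.sample(population, n)
    est = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    print(f"n={n:5d}  estimate={est:6.1f}  approx. SE={se:4.1f}")

print(f"census mean (testing everyone): {true_mean:6.1f}")
```

Even at n = 1,600 – a sixth of the district – the standard error is on the order of a single scale-score point, which is plenty of precision for monitoring whether the system as a whole is on track.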
I know there are those who would respond that testing everyone every year ensures that no kids fall through the cracks. If we did it my less intrusive way… kids who weren’t given all test questions in math in a given year might fall through some hypothetical math crack somewhere. But it is foolish to assume that NCLB-style every-student-every-year testing regimes actually solve that problem. Further, high-stakes testing with specific cut scores, whether for graduation or grade promotion, violates one of the most basic tenets of statistical measurement of student achievement – that these measures are not perfectly precise. They can’t identify exactly where that crack is, or which kid actually fell through it! One can’t select a cut score and declare that the child one point above that score (who got one more question correct on that given day) is ready (with certainty) for the next grade (or to graduate) while the child one point below is not. In all likelihood these two children are not different at all in their actual “proficiency” in the subject in question. We might be able to say – through thoughtful and rigorous analysis – that on average, students who scored around this level in one year were likely to reach a certain score in a later year, and perhaps even more likely to make it beyond remedial coursework in college. And we might be able to determine whether students attending a particular school or participating in a particular program are more or less likely (yeah… probability again) to succeed in college.
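The cut-score problem can be shown in a few lines of simulation. The cut score and standard error of measurement below are made-up numbers, but the logic holds for any test: a student whose true proficiency sits exactly at the cut passes roughly half the time, and two administrations of the same test disagree about that same student roughly half the time.

```python
import random

random.seed(1)

CUT = 300          # hypothetical cut score, in scale points
SEM = 10           # hypothetical standard error of measurement
TRIALS = 100_000

# A student whose *true* proficiency sits exactly at the cut.
# Each observed score is the true score plus measurement error.
true_score = CUT

passes = sum(
    (true_score + random.gauss(0, SEM)) >= CUT for _ in range(TRIALS)
)
print(f"A student truly at the cut passes about {passes / TRIALS:.0%} of the time.")

# Probability that two administrations of the test land the same
# student on opposite sides of the cut -- the "crack" is an artifact
# of measurement error, not of the student's actual proficiency.
flips = sum(
    ((true_score + random.gauss(0, SEM)) >= CUT)
    != ((true_score + random.gauss(0, SEM)) >= CUT)
    for _ in range(TRIALS)
)
print(f"Two administrations disagree about {flips / TRIALS:.0%} of the time.")
```

A one-point gap between two students is an order of magnitude smaller than the measurement error itself, which is exactly why declaring one of them “ready” and the other not is statistically indefensible.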
Thoughtful analysis and, more importantly, thoughtful USE of testing data in schools requires a healthy respect for what those numbers can and cannot tell us… and a nuanced understanding that the numbers typically include a mix of non-information (noise: unexplainable, non-patterned variation), good information (true signal) and perhaps misinformation (false signal, or bias – variation caused by something other than what we think caused it).
These issues apply generally to our use of student assessment data in schools and also apply specifically to an area I discuss often on this blog – statistical evaluation of teacher influence on tested student outcomes.
I was pleased to see the Shankerblog column by Doug Harris a short while back in which Doug presented a more thoughtful approach to integrating value-added estimates into human resource management in the schooling context. Note that Doug’s argument is not new at all, nor is it really his own unique view. I first heard this argument in a presentation by Steve Glazerman (of Mathematica) at Princeton a few years ago. Steve also used the noisy medical screening comparison to explain the use of known-to-be-noisy information to assist in making more efficient decisions/taking more efficient steps in diagnosis. That is, with appropriate respect for the non-information in the data, we might actually find ways to use that information productively.
Last spring, I submitted an article (still under review) in which I, along with my coauthors Preston Green and Joseph Oluwole, explained:
As we have explained herein, value-added measures have severe limitations when attempting even to answer the narrow question of the extent to which a given teacher influences tested student outcomes. Those limitations are sufficiently severe that it would be foolish to impose rigid, overly precise, high-stakes decision frameworks on these measures. One simply cannot parse point estimates to place teachers into one category versus another, and one cannot assume that any individual teacher’s estimate is necessarily valid (non-biased). Further, we have explained how the student growth percentile measures being adopted by states for use in teacher evaluation are, on their face, invalid for this particular purpose. Overly prescriptive, overly rigid teacher evaluation mandates, in our view, are likely to open the floodgates to new litigation over teacher due process rights, even though much of the policy impetus behind these new systems was supposedly the reduction of legal hassles involved in terminating ineffective teachers.
This is not to suggest that any and all forms of student assessment data should be considered irrelevant to thoughtful management decision-making by school leaders and leadership teams. Rather, that incorrect, inappropriate use of this information is simply wrong – ethically and legally (a lower standard) wrong. We accept the proposition that assessments of student knowledge and skills can provide useful insights both regarding what students know and potentially regarding what they have learned while attending a particular school or class. We are increasingly skeptical regarding the ability of value-added statistical models to parse any specific teacher’s effect on those outcomes. Further, the relative weight placed on any one measure in management decision-making depends on the quality of that measure and likely fluctuates over time and across settings. That is, in some cases, with some teachers and in some years, assessment data may provide leaders and/or peers with more useful insights. In other cases, it may be quite obvious to informed professionals that the signal provided by the data is simply wrong – not a valid representation of the teacher’s effectiveness.
Arguably, a more reasonable and efficient use of these quantifiable metrics in human resource management might be to use them as a knowingly noisy pre-screening tool to identify where problems might exist across hundreds of classrooms in a large district. Value-added estimates might serve as a first step toward planning which classrooms to observe more frequently. Under such a model, when observations are completed, one might decide that the initial signal provided by the value-added estimate was simply wrong. One might also find that it produced useful insights regarding a teacher’s (or group of teachers’) effectiveness at helping students develop certain tested algebra skills.
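The pre-screening idea can be sketched with a toy simulation. The numbers are assumptions for illustration only – 1,000 teachers, with estimate noise 1.5 times the spread of true effects – but the pattern is general: flagging the lowest decile of value-added estimates for closer observation catches more truly struggling teachers than random observation would, while still producing plenty of false positives that the observations must then sort out.

```python
import random

random.seed(2)

N = 1000           # hypothetical district with 1,000 teachers
NOISE = 1.5        # assumed noise in the value-added estimate, relative
                   # to a true-effect spread of 1.0 (illustration only)

# True effectiveness (unobservable) and a noisy value-added estimate.
true_eff = [random.gauss(0, 1) for _ in range(N)]
va_est = [t + random.gauss(0, NOISE) for t in true_eff]

# Use the noisy estimate only as a pre-screen: flag the lowest 10%
# of estimates for more frequent classroom observation.
cutoff = sorted(va_est)[N // 10]
flagged = [i for i in range(N) if va_est[i] < cutoff]

# Suppose "truly struggling" means the bottom 10% of true effectiveness.
true_cut = sorted(true_eff)[N // 10]
truly_low = {i for i in range(N) if true_eff[i] < true_cut}

hits = sum(1 for i in flagged if i in truly_low)
print(f"Flagged {len(flagged)} teachers; {hits} are truly in the bottom decile.")
print(f"Hit rate {hits / len(flagged):.0%} vs. {len(truly_low) / N:.0%} under random observation.")
```

The screen beats observing classrooms at random, but most flagged teachers are not actually in the bottom decile – which is precisely why the estimate can justify a closer look, and never a dismissal on its own.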
School leaders or leadership teams should clearly have the authority to make the case that a teacher is ineffective and that the teacher, even if tenured, should be dismissed on that basis. It may also be the case that the evidence would include data on student outcomes – growth measures, etc. The key, in our view, is that the leaders making the decision – as indicated by their presentation of the evidence – would show that they have used information reasonably to make an informed management decision. Their reasonable interpretation of relevant information would constitute due process, as would their attempts to guide the teacher’s improvement on measures over which the teacher actually had control.
By contrast, due process is violated where administrators/decision makers place blind faith in the quantitative measures, assuming them to be causal and valid (attributable to the teacher) and applying arbitrary and capricious cutoff points to those measures (performance categories leading to dismissal). The problem, as we see it, is that some of these new state statutes require these due process violations, even where the informed, thoughtful professional understands full well that she is being forced to make a wrong decision. They require the use of arbitrary and capricious cutoff scores. They require that decision makers take action based on these measures even against their own informed professional judgment.
My point is that we can have thoughtful, data informed (NOT DATA DRIVEN) management in schools. We can and should! Further, we can likely have thoughtful data informed management (system monitoring) through far less intrusive methods than currently employed – taking advantage of advancements in testing and measurement, sampling design etc. But we can only take these steps if we recognize the limits of data and measurement in our education systems.
Unfortunately, as I see it, current policy efforts enforcing the misuse of assessment data (as illustrated here, here and here) and the misuse of estimates of teacher effectiveness based on those data (as illustrated here) will likely do far more harm than good. And I don’t see things turning the corner any time soon.
Until then, I may just have to stick to my current message of Just say NO!