Friday Thoughts on Data, Assessment & Informed Decision Making in Schools

Some who read this blog might assume that I am totally opposed, in any and all circumstances, to using data in schools to guide decision-making. Despite my frequent public cynicism, I assure you that I believe that much of the statistical information we collect on and in schools and school systems can provide useful signals regarding what’s working and what’s not, and may provide more ambiguous signals warranting further exploration – through both qualitative information gathering (observation, etc.) and additional quantitative information gathering.

My personal gripe is that thus far – especially in public policy – we’ve gone about it all wrong.  Pundits and politicians seem to have this intense desire to impose certainty where there is little or none and impose rigid frameworks with precise goals which are destined to fail (or make someone other than the politician look as if they’ve failed).

Pundits and politicians also feel the intense desire to over-sample the crap out of our schooling system – taking annual measurements on every child over multiple weeks of the school year when strategic sampling of selected testing items across samples of students and settings might provide more useful information at lower cost and be substantially less invasive (NAEP provides one useful example). To protect the health of our schoolchildren, we don’t make them all walk around all day with rectal thermometers hanging out of…well… you know?  Nor do political pollsters attempt to poll 100% of likely voters.  Nor should we feel the necessity to have all students take all of the assessments, all of the time, if our goal is to ensure that the system is getting the job done/making progress.

In my view, a central reason for testing and measurement in schools is what I would refer to as system monitoring,  where system monitoring is best conducted in the least intrusive and most cost-effective way – such that the monitoring itself does not become a major activity of the system!  We just need enough sampling density in our assessments to generate sufficient estimates at each relevant level of the system.
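To make the sampling argument concrete, here is a minimal sketch of how a simple random sample of students can estimate a system-level average, along with a rough standard error. Everything in it is an assumption for illustration: the district of 20,000 students, the score scale (mean 250, SD 35), the sample sizes, and the function name `sample_estimate`. It samples students only; NAEP-style designs also sample test items across students, which this sketch does not attempt.

```python
import math
import random
import statistics


def sample_estimate(scores, sample_size, rng):
    """Estimate the population mean from a simple random sample.

    Returns (estimated_mean, standard_error). The standard error uses a
    finite population correction, so it shrinks to zero when everyone
    is tested.
    """
    sample = rng.sample(scores, sample_size)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    population_size = len(scores)
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return mean, sd / math.sqrt(sample_size) * fpc


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical district: 20,000 students, scores roughly normal (assumed).
population = [rng.gauss(250, 35) for _ in range(20_000)]
true_mean = statistics.fmean(population)

print(f"census mean: {true_mean:.1f}")
for n in (200, 1_000, 5_000):
    estimate, se = sample_estimate(population, n, rng)
    print(f"n={n:>5}: estimate={estimate:.1f}, standard error={se:.2f}")
```

The standard error falls roughly with the square root of the sample size, so most of the precision arrives well before every student has been tested. How much sampling density is "enough" depends on the smallest unit (district, school, subgroup) for which an estimate is needed.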

I know there are those who would respond that testing everyone every year ensures that no kids fall through the cracks. If we did it my less intrusive way… kids who weren’t given all test questions in math in a given year might fall through some hypothetical math crack somewhere. But it is foolish to assume that NCLB-every-student-every-year testing regimes actually solve that problem. Further, high stakes testing with specific cut scores either for graduation or grade promotion violates one of the most basic tenets of statistical measurement of student achievement – that these measures are not perfectly precise. They can’t identify exactly  where that crack is, or which kid actually fell through it! One can’t select a cut score and declare that the child one point above that score (who got one more question correct on that given day) is ready (with certainty) for the next grade (or to graduate) and the child 1 point below is not. In all likelihood these two children are not different at all in their actual “proficiency” in the subject in question. We might be able to say – by thoughtful and rigorous analysis – that on average, students who got around this score in one year, were likely to get a certain score in a later year, and perhaps even more likely to make it beyond remedial course work in college. And we might be able to determine if students attending a particular school or participating in a particular program are more or less likely (yeah… probability again) to succeed in college.
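The cut-score problem can be sketched with a little arithmetic. The example below assumes, purely for illustration, a cut score of 300 and a standard error of measurement (SEM) of 10 points, and it treats measurement error as normally distributed around the observed score. That is a deliberate simplification, and the function name `prob_true_score_at_or_above` is hypothetical; real tests report their own conditional SEMs.

```python
import math


def prob_true_score_at_or_above(observed, cut, sem):
    """Rough probability that a student's true score is at or above the cut.

    Assumes the true score is normally distributed around the observed
    score with standard deviation equal to the SEM (a simplification).
    """
    z = (observed - cut) / sem
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


CUT = 300   # hypothetical cut score
SEM = 10    # hypothetical standard error of measurement

p_just_above = prob_true_score_at_or_above(CUT + 1, CUT, SEM)
p_just_below = prob_true_score_at_or_above(CUT - 1, CUT, SEM)

print(f"scored {CUT + 1}: P(true score >= cut) = {p_just_above:.2f}")
print(f"scored {CUT - 1}: P(true score >= cut) = {p_just_below:.2f}")
```

Under these assumptions the child one point above the cut has about a 54% chance of truly being at or above it, and the child one point below has about a 46% chance. The pass/fail label treats as certain a difference that the measurement cannot actually resolve.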

Thoughtful analysis and more importantly thoughtful USE of testing data in schools requires a healthy respect for what those numbers can and cannot tell us… and nuanced understanding that the numbers typically include a mix of non-information (noise/unexplainable, non-patterned information), good information (true signal) and perhaps misinformation (false signal, or bias, variation caused by something other than what we think it’s caused by).

These issues apply generally to our use of student assessment data in schools and also apply specifically to an area I discuss often on this blog – statistical evaluation of teacher influence on tested student outcomes.

I was pleased to see the Shankerblog column by Doug Harris a short while back in which Doug presented a more thoughtful approach to integrating value-added estimates into human resource management in the schooling context. Note that Doug’s argument is not new at all, nor is it really his own unique view. I first heard this argument in a presentation by Steve Glazerman (of Mathematica) at Princeton a few years ago. Steve also used the noisy medical screening comparison to explain the use of known-to-be-noisy information to assist in making more efficient decisions/taking more efficient steps in diagnosis. That is, with appropriate respect for the non-information in the data, we might actually find ways to use that information productively.

Last spring, I submitted an article (still under review) in which I, along with my coauthors Preston Green and Joseph Oluwole, explained:

As we have explained herein, value-added measures have severe limitations when attempting even to answer the narrow question of the extent to which a given teacher influences tested student outcomes. Those limitations are sufficiently severe such that it would be foolish to impose on these measures, rigid, overly precise high stakes decision frameworks.  One simply cannot parse point estimates to place teachers into one category versus another and one cannot necessarily assume that any one individual teacher’s estimate is necessarily valid (non-biased).  Further, we have explained how student growth percentile measures being adopted by states for use in teacher evaluation are, on their face, invalid for this particular purpose.  Overly prescriptive, overly rigid teacher evaluation mandates, in our view, are likely to open the floodgates to new litigation over teacher due process rights, despite much of the policy impetus behind these new systems supposedly being reduction of legal hassles involved in terminating ineffective teachers.

This is not to suggest that any and all forms of student assessment data should be considered moot in thoughtful management decision making by school leaders and leadership teams. Rather, that incorrect, inappropriate use of this information is simply wrong – ethically and legally (a lower standard) wrong. We accept the proposition that assessments of student knowledge and skills can provide useful insights both regarding what students know and potentially regarding what they have learned while attending a particular school or class. We are increasingly skeptical regarding the ability of value-added statistical models to parse any specific teacher’s effect on those outcomes. Further, the relative weight in management decision-making placed on any one measure depends on the quality of that measure and likely fluctuates over time and across settings. That is, in some cases, with some teachers and in some years, assessment data may provide leaders and/or peers with more useful insights.  In other cases, it may be quite obvious to informed professionals that the signal provided by the data is simply wrong – not a valid representation of the teacher’s effectiveness.

Arguably, a more reasonable and efficient use of these quantifiable metrics in human resource management might be to use them as a knowingly noisy pre-screening tool to identify where problems might exist across hundreds of classrooms in a large district. Value-added estimates might serve as a first step toward planning which classrooms to observe more frequently. Under such a model, when observations are completed, one might decide that the initial signal provided by the value-added estimate was simply wrong. One might also find that it produced useful insights regarding a teacher’s (or group of teachers’) effectiveness at helping students develop certain tested algebra skills.

School leaders or leadership teams should clearly have the authority to make the case that a teacher is ineffective and that the teacher even if tenured should be dismissed on that basis. It may also be the case that the evidence would actually include data on student outcomes – growth, etc. The key, in our view, is that the leaders making the decision – indicated by their presentation of the evidence – would show that they have used information reasonably to make an informed management decision. Their reasonable interpretation of relevant information would constitute due process, as would their attempts to guide the teacher’s improvement on measures over which the teacher actually had control.

By contrast, due process is violated where administrators/decision makers place blind faith in the quantitative measures, assuming them to be causal and valid (attributable to the teacher) and applying arbitrary and capricious cutoff-points to those measures (performance categories leading to dismissal).   The problem, as we see it, is that some of these new state statutes require these due process violations, even where the informed, thoughtful professional understands full well that she is being forced to make a wrong decision. They require the use of arbitrary and capricious cutoff-scores. They require that decision makers take action based on these measures even against their own informed professional judgment.

My point is that we can have thoughtful, data informed (NOT DATA DRIVEN) management in schools. We can and should! Further, we can likely have thoughtful data informed management (system monitoring) through far less intrusive methods than currently employed – taking advantage of advancements in testing and measurement, sampling design etc. But we can only take these steps if we recognize the limits of data and measurement in our education systems.

Unfortunately, as I see it, current policy efforts enforcing the misuse of assessment data (as illustrated here, here and here) and misuse of estimates of teacher effectiveness based on those data (as illustrated here) will likely do far more harm than good.  Unfortunately, I don’t see things turning the corner any time soon.

Until then, I may just have to stick to my current message of Just say NO!



  1. You can now add Utah to the list of states adopting a SGP-based system, as part of our ESEA waiver. We just rolled out the UCAS system that gives a score (out of 600) to every school in the state, half based on growth (SGP/Colorado model), half based on proficiency. See here (

    The more deeply I study this, the more I wish we were able to use the measures as the indicator/flags for further study they seem to have been designed as. Betebenner’s writings on this seem to want to have it both ways, though, which confuses me. If an SGP is “descriptive” and not clearly attributable to a teacher, how can that have *any* part in a score for a teacher — might as well include the outside temperature on the day of the test.

    We’re moving towards more testing as well (expanding annual tests up through 12th grade, doing at least one, maybe two, interim tests earlier in the year). Forward into the fog.

    [Disclaimer – I do work for a Utah school district, as head of a tiny little data group]

  2. Bruce, I wonder about your comments on having a healthy respect for what numbers can and cannot tell us about teachers who are being required to organize their teaching around numerically based SMART goals. (For example, X percent of students will achieve X percent on the next test.) I’m uncomfortable with this because once you use a number it bestows a semblance of objectivity that isn’t there, and I am forced to look at certain practices and behaviors to the exclusion of others that my own professional, subjective judgment tells me need attention. I’ll give you an example: I teach AP Psychology. I value students’ participation in class as well as their abilities to pass the AP exam. I don’t only require my students to pass paper and pencil tests; I also require them to participate in debates, develop and present their own original research, and participate in daily class discussions. As I’m required now by my school district to write SMART goals that focus on gains students can show from pre-tests (which I see as a waste of time) and post-tests, the real learning that students do on a daily basis is getting ignored by my district. I know students need this to learn and I know that I can daily assess them informally to know where they are in their learning. However, the district only sees a limited portion of what they value: pencil and paper tests. I’m becoming more and more frustrated with the limitations that this way of doing business is placing on my professionalism.

    1. Part of what I mean regarding healthy respect for what the numbers mean and do not is that the appearance of “objectivity” is only that – an appearance… as accepted by those who really just don’t get it. So yes, this can be dangerous. The “numbers” in our education system are never “objective.” There are many subjective judgments that go into determining a) what to measure and b) how to measure it, and these judgments lead to subjective collections of data points – realizations on an underlying system. Our subjective choices in measurement and measurement design then may influence how we subjectively use that information to inform decisions, or not. It’s all subjective to an extent, and it’s all relative. But that doesn’t mean it’s useless. The foolishness in education policy largely centers on this issue of “objective=good” and “subjective=bad” and the belief that data – no matter how we collect it or what we collect – is necessarily objective, therefore good. I agree that those wanting to use data knowing its limitations often provide fodder for those intent on misusing data, the minute we actually start collecting data. And that’s why I’m currently inclined to argue for just shutting the whole “big data” thing down – until policymakers/pundits are willing to step out of the way (which isn’t likely to happen)!

  3. Sorry – rushing so didn’t read the whole piece, but wanted to mention that I just participated in a workshop with Pasi Sahlberg and he admitted that data would be useful, but that Finland has – so far – decided that the dangers outweigh the benefits. They seem to have managed without it.
