Introducing the Reform-Inator!

Introducing the Coolest New Gadget of the Year – just in time for last-minute shopping! The Reform-inator!

  1. Can be used to instantly fire and/or de-tenurize teachers. However, in order to use the Reform-inator for these purposes you must line up 100 teachers, including all of the good, bad and average ones. The Reform-inator is a bit touchy… and misfires quite frequently… hitting an average teacher instead of a truly bad one about 35% of the time, and hitting a good teacher instead of a truly bad one about 20% of the time. But what the heck… go for it. Thin the herd. Probabilities are in your favor, if only marginally. And besides, there will be plenty more teachers willing to step up and face the firing line next year.
  2. Can be used to instantly replicate (or, in new reformy terms: scalify, or scalification) only the upper half of charter schools, because we all know that the upper half of charter schools are… well… better than average ones, and well… good charters are good… and bad ones bad (but no need to talk about those, just as there’s no need to talk about the good traditional public schools)… so we really want to replicate and expand only those good charters (primarily by reducing regulation, increasing the number of authorizers, and reducing oversight requirements, even though the track record to date hasn’t really shown that to be easily accomplished).
  3. Can be used to take anything that is presently about 7% smaller than it was in the past, and make it disappear entirely – GONE… ALL GONE… just like all of the money for public schools. It’s not just recessed – temporarily diminished – It’s just gone. Vanished. Time to shut it all down! No more sweetheart deals (especially in those really crazy overspending states like Arizona and Utah)!
  4. Can instantly make value-added estimates of teacher effectiveness the “true” measure of teacher effectiveness, and further, can make value-added estimates of teacher effectiveness a stronger predictor of themselves… which of course, are the true measure of effectiveness (stronger than a weak to moderate correlation, that is). Use the special self-validation trigger for this particular effect. Also works for low self-esteem.
  5. Can be used to locate Superman (‘cuz I sure can’t find him in these scatterplots of NYC charter school performance compared to traditional public schools, or these from Jersey either).
  6. Will eliminate entirely anything that might be labeled Status Quo! Because we all know that if it’s status quo, it’s got to go (or, at the very least, the first reformy rule of logic: “anything is better than the status quo”).
  7. Most importantly, like any good REFORMY tool, it’s got a Trigger!
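If you want to check the gadget's claimed odds yourself, here's a quick Monte Carlo sketch. The 35% and 20% misfire rates come straight from item 1; everything else (names, sample size) is my own framing:

```python
# Simulate the Reform-inator's aim: each shot is aimed at a truly bad
# teacher, but per the spec it hits an average teacher 35% of the time
# and a good teacher 20% of the time, leaving a 45% chance of hitting
# the intended target.
import random

def fire_reforminator(shots, seed=0):
    """Fire `shots` times at truly bad teachers; return who actually got hit."""
    rng = random.Random(seed)
    hits = {"bad": 0, "average": 0, "good": 0}
    for _ in range(shots):
        r = rng.random()
        if r < 0.35:
            hits["average"] += 1   # misfire: average teacher hit
        elif r < 0.55:
            hits["good"] += 1      # misfire: good teacher hit
        else:
            hits["bad"] += 1       # intended target actually hit
    return {k: v / shots for k, v in hits.items()}

rates = fire_reforminator(100_000)
# rates["bad"] hovers around 0.45: the herd gets thinned, but more than
# half the casualties aren't the truly bad teachers.
```

In other words, a majority of the teachers dispatched by the gadget were never its intended targets.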

Other ideas?

Is it the “New Normal” or the “New Stupid?”

I’ll admit from the start that I’m recycling some arguments here (okay… all of the arguments) … but this stuff needs to be reinforced, over and over again. Quite honestly, to me, from a school finance perspective, this is the most important issue that has surfaced in the past year, and potentially the most dangerous and damaging for the future of American public education.

Robert Reich of Berkeley recently wrote of the Attack on American Education.

Specifically, Reich pointed to substantial budget cuts across states as evidence of our de-investment in public schooling. Here are the first three states (by alphabetical order), and the education spending cuts mentioned by Reich in his blog post:

  • Arizona has eliminated preschool for 4,328 children, funding for schools to provide additional support to disadvantaged children from preschool to third grade, aid to charter schools, and funding for books, computers, and other classroom supplies. The state also halved funding for kindergarten, leaving school districts and parents to shoulder the cost of keeping their children in school beyond a half-day schedule.
  • California has reduced K-12 aid to local school districts by billions of dollars and is cutting a variety of programs, including adult literacy instruction and help for high-needs students.
  • Colorado has reduced public school spending in FY 2011 by $260 million, nearly a 5 percent decline from the previous year. The cut amounts to more than $400 per student.

As I have mentioned on numerous previous occasions, even the assumption that these cuts represent “de-investment” (suggesting cutting back on something that had been scaled up over time) is flawed, because it accepts that these states actually invested to begin with. Reich points out that the current attack on public education budgets, in both K-12 and higher education, is seemingly unprecedented across states, and is arguably an attack on promoting an educated society more generally:

Have we gone collectively out of our minds? Our young people — their capacities to think, understand, investigate, and innovate — are America’s future. In the name of fiscal prudence we’re endangering that future.

But even Reich’s arguments fail to point out that in many of these states, the attack on education and de-investment (if there ever was significant investment, or scale-up) has been occurring for decades. In good times and in bad… bad economic times just provide a more convenient excuse. Couple that with all of the new rhetoric about the “New Normal,” and the excuses to slash and burn public school funding are at an all-time high.

Let’s review:

First, here’s where the above three states fit into comparisons of state and local education revenue per pupil. Yes, some of the higher-spending states are cutting back as well, if you read down Reich’s list of education spending cuts, but these three states have a particularly rich history of low spending and education cutbacks (including year-after-year mid-year funding rescissions in Colorado, even in good economic times).

Figure 1

Okay, so who cares if they aren’t spending that much? Maybe it’s because they’ve been taxing themselves to death… like we all have, obviously… we all know that… and that education spending is simply eating away at their economies. It’s just not sustainable!

So, here are direct expenditures on education (K-12 and higher ed) as a percent of aggregate personal income for each state. California has been flat and low for over 30 years, and Colorado and Arizona, which were once relatively high, have decreased their effort consistently for about 30 years, in a race to the bottom.

Figure 2

Total Direct Education Spending as a Percent of Personal Income

Yeah but… yeah but… yeah but… it’s because their total taxes are so darn high. This is just education. Well then:

Figure 3

Yes, even on these measures, California is perhaps somewhat above average, whereas Colorado in recent years has been sitting near the bottom. Arizona jumped up in recent years, but is by no means high compared with other states, nor has it trended out of control over time.

But even then, we know they’ve all gone wild on teacher hiring… bloating that teacher workforce, reducing class sizes and pupil teacher ratios to inefficiently low levels:

Figure 4

Pupil to Teacher Ratios over Time

Okay, well maybe not California, Arizona or Colorado (or Utah… in gray at the top of the figure). California did increase teacher numbers in the late 1990s with class size reduction, but that progress flattened out, and pupil-to-teacher ratios have increased since, for lack of financial support.

But we all know that none of this matters anyway, right?

In fact, REFORMY logic dictates that it’s those states which have been spending like crazy, wasting their effort and paying for way too many teachers that are a real drag on our national test scores AND our economy.

The problem is not states like California, Arizona or reformy standouts like Colorado (or Tennessee or Louisiana), but rather those over-educated, curmudgeonly, high-spending, non-reformy, low pupil-to-teacher ratio states like Vermont, Massachusetts and New Jersey.

They – yes they – with their gold-plated schools are the shame of our nation (and why we can’t be Finland, right?)!  Our national education emergency (if there is one) is certainly not the fault of those states exercising consistent and appropriate fiscal austerity in good times or in bad.


Figure 5

Relationship Between State & Local Revenue per Pupil (for high poverty districts) & NAEP Mean Scale Scores

On average, states like Arizona and California, which have high-need student populations but have thrown their public schools under the bus, are a significant drag on our national performance.

And this is due to lack of effort as much as it is lack of capacity. Higher-effort states also tend to be the higher-spending states, which also tend to have the higher outcomes. And those states, when taken as a separate group, compare quite favorably on international performance comparisons.

Figure 6

Relationship between Fiscal Effort and Level of Financial Resources

Finally, these differences in outcomes, effort and pupil-to-teacher ratios are not all about differences in poverty. Again, I’ve already pointed out that these states have high pupil-to-teacher ratios and low spending not because they are poor, but rather because they don’t put up the effort.

And now we are boldly (and belligerently) encouraging them to “do more with less,” by which we actually mean “do even less with less”?

To clarify how poverty rates fit within this picture, Figure 7 provides adjusted state poverty estimates (see citation below figure) and pupil-to-teacher ratios. At their respective poverty levels, each of these states has higher – if not much higher than average – pupil-to-teacher ratios. They also have much lower than average per-pupil spending.

Figure 7

State Cost Adjusted Poverty Estimates and Pupil to Teacher Ratios

Renwick, Trudi. Alternative Geographic Adjustments of U.S. Poverty Thresholds: Impact on State Poverty Rates. U.S. Census Bureau, August 2009

Further, while these states have higher pupil-to-teacher ratios than other states with similar poverty rates, they also have very low outcomes even compared to other states with similar corrected poverty rates. Colorado remains somewhat in the middle of the pack on outcomes, having a lower-poverty population than either Arizona or California and having only more recently slashed and burned its public education system. Colorado’s pupil-to-teacher ratios have also remained closer to those of other states, and much lower than California’s or Arizona’s.

Figure 8

State Cost Adjusted Poverty Estimates and NAEP Mean Outcomes


How does this all fit into the long-run picture of investment in public schooling? Yes, we’ve had the most significant economic downturn in several decades. State budgets took a hit, and the data on that budget hit show, among other things, that the most recent quarterly estimates of state revenue are still about 7% off their peak in 2008. That’s 7% – not 100%, not 20% (even more important is the variation across states). It’s a hole. But it’s not ALL GONE (and only a complete fool would argue as much)! Note that there have been in the past few decades at least two other significant economic slowdowns/downturns that affected state revenues and education spending – from about 1989 to 1992, with lagged effects in some regions, and from 2001 to 2002 (the post-9/11 shock). In some states, education spending rebounded in the wake of these downturns, but in others, state legislatures continued to constrain if not outright slash and burn state education budgets (while expanding tax cuts) throughout the economic good times that followed each downturn (1996ish to 2001 and 2002 to 2008).

What’s different now? Why are we sitting at the edge of a much more dangerous policy agenda? Well, the recent economic downturn was greater. But again, recent data show the beginnings of a rebound. What is most different is that we are now faced with this completely absurd argument of The New Normal – as a national agenda to scale back education spending EVEN IN STATES WHERE IT HAD ALREADY BEEN SCALED BACK FOR DECADES. But who knew? Didn’t every state just spend out of its freakin’ mind for… oh… the past hundred years or so?

The New Normal argument – that we must cut back our bloated education budgets and increase class sizes and pupil-to-teacher ratios back to reasonable levels – is, at best, based on the shallowest understanding of (hyper-aggregated & overstated) national “trends” in education spending and pupil-to-teacher ratios, coupled with complete obliviousness to the variations in effort, spending and pupil-to-teacher ratios that exist across states – and, for that matter, to the demographic trends in some states (Vermont) which make it appear as if education spending has spiraled out of control. That is, if we assume that those pitching-tweeting-blogging The New Normal have even the first clue about trends in education spending, state school finance systems, and the quality of public schooling across states to begin with. Personally, I’m not sure they do. In fact, I’m increasingly convinced they don’t.

A few comments on the Gates/Kane value-added study


(My apologies in advance for an excessively technical, research geeky post, but I felt it necessary in this case)

Take home points

1) As I read it, the new Gates/Kane value-added findings are NOT, by any stretch of the imagination, an endorsement of using value-added measures of teacher effectiveness to rate individual teachers as effective or not, or to make high-stakes employment decisions. In this regard, the Gates/Kane findings are consistent with previous findings regarding the stability, precision and accuracy of ratings of individual teachers.

2) Even in the best of cases, the measures used in value-added models remain insufficiently precise or accurate to account for the differences in children served by different teachers in different classrooms (see discussion of the poverty measure in point #2 of the first section below).

3) Too many of these studies, including this one, adopt the logic that value-added outcomes can be treated both as a measure of effectiveness to be investigated (the independent variable) and as the true measure of effectiveness (the dependent measure). That is, this study, like others, evaluates the usefulness of both value-added measures and other measures of teacher quality by their ability to predict future (or different-group) value-added measures. Certainly, the deck is stacked in favor of value-added measures under such a model. See “value-added is the best predictor of itself” below.

4) Value-added measures can be useful for exploring variations in student achievement gains across classroom settings and teachers, but I would argue that they remain of very limited use for identifying, more precisely or accurately, the quality of individual teachers. Among other things, the most useful findings in the new Gates/Kane study apply to very few teachers in the system (see final point below).

Detailed discussion

Much has been made of the preliminary findings of the Gates Foundation study on teacher effectiveness. Jason Felch of the LA Times has characterized the study as an outright endorsement of the use of value-added measures as the primary basis for determining teacher effectiveness. Mike Johnston, the Colorado State Senator behind that state’s new teacher tenure law – which requires that 50% of teacher evaluation be based on student growth (with tenure, and removal of tenure, based on the evaluation scheme) – also seemed thrilled that the Gates study found that value-added scores in one year predict value-added scores in another, seemingly assuming this finding unproblematically endorses his policies (via Twitter: “SenJohnston Mike Johnston New Gates foundation report on effective teaching: value added on state test strongest predictor of future performance”).


The study supports no such conclusions. Rather, the new Gates study tells us that we can use value-added analysis to learn about variations in student learning (or at least in test score growth) across classrooms and schools, and that we can assume that some of this variation is related to variations in teacher quality. But there remains substantial uncertainty in our capacity to estimate whether any one teacher is a good teacher or a bad one.

Perhaps the most important and interesting aspects of the study are its current and proposed explorations of the relationship between value-added measures and other measures, including student perceptions, principal perceptions and external evaluator ratings.

Gates Report vs. LA Times Analysis

In short, data quality and modeling matter, but you can only do so much.

For starters, let’s compare some features of the Gates study value-added models to the LA Times models. These are some important differences to look for when you see value-added models being applied to study student performance differences across classrooms – especially where the goal is to assign outcome effects to teachers.

  1. The LA Times model, like many others, uses annual achievement data (as far as I can tell) to determine teacher effectiveness, whereas the Gates study at least explores the seasonality of learning – or more specifically, how much achievement change occurs over the summer (which is certainly outside of teachers’ control AND differs across students by their socioeconomic status). One of the more interesting findings of the Gates study is that from 4th grade on: “The norm sample results imply that students improve their reading comprehension scores just as much (or more) between April and October as between October and April in the following grade. Scores may be rising as kids mature and get more practice outside of school.” This means that if there exist substantial differences in summer learning by students’ family income level and/or other factors, as has been found in other studies, then using annual data could significantly and inappropriately disadvantage teachers who are assigned students whose reading skills lagged over the summer. The existing blunt indicator of low-income status is unlikely to be sufficiently precise to correct for summer learning differences.
  2. The LA Times model did include such blunt measures of poverty status and language proficiency, as well as disability status (a single indicator), but later found shares of gifted children to be associated with differences in teacher ratings, along with student race. The Gates study includes similarly crude indicators of socioeconomic status, but it does include in its value-added model whether individual children are classified as gifted. It also includes student race and the average characteristics of students in each classroom (a peer group effect). This is a much richer and more appropriate model, but still likely insufficient to fully account for the non-random distribution of students. That is, the Gates study models at least attempt to correct for the influence of peers in the classroom in addition to the individual characteristics of students, but even this may be insufficient. One particular concern of mine is the use of a single dichotomous measure of child poverty – whether the child qualifies for free or reduced-price lunch – and the share of children in each class who do. The reality is that in many urban public schooling settings like those involved in the Gates study, several elementary/middle schools have over 80% of children qualifying for free or reduced-price lunch, but this apparent similarity is no guarantee of similar poverty conditions among the children in one school or classroom compared to another. One classroom might be filled 80% with children whose family income is at or below the income threshold for poverty (100%), whereas another classroom might be filled 80% with children whose income is 85% higher (at the 185% threshold for “reduced-price” lunch). This is a big difference that is not captured with this crude measure.
  3. The LA Times analysis uses a single set of achievement measures. Other studies, like the work of Sean Corcoran (see below) using data from Houston, TX, have shown us the relatively weak relationship between value-added ratings of teachers produced by one test and value-added ratings of teachers produced by another test. Thankfully, the Gates foundation analysis takes steps to explore this question further, but I would argue that it overstates the relationship found between tests, or states that relationship in a way that might be misinterpreted by pundits seeking to advance the use of value-added for high-stakes decisions (more later).
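To see why the free/reduced-lunch point (#2 above) matters, here is a toy sketch, with all numbers invented for illustration, showing two classrooms with identical 80% FRL rates but very different underlying poverty. Incomes are expressed as a percent of the federal poverty threshold; FRL eligibility runs up to 185% of that threshold:

```python
# Two classrooms, each 20 FRL kids + 5 non-FRL kids. A binary FRL flag
# cannot distinguish families at the poverty line (100%) from families
# at the FRL eligibility cutoff (185%).

def classroom(n_frl, frl_income_pct, n_other=5, other_income_pct=300):
    """Build a class: n_frl kids at frl_income_pct of poverty, rest above."""
    return [frl_income_pct] * n_frl + [other_income_pct] * n_other

deep_poverty = classroom(20, 100)   # 80% FRL, families at the poverty line
near_cutoff  = classroom(20, 185)   # 80% FRL, families at the FRL cutoff

def frl_rate(cls):
    return sum(1 for pct in cls if pct <= 185) / len(cls)

def mean_income_pct(cls):
    return sum(cls) / len(cls)

# The binary indicator sees two identical classrooms...
assert frl_rate(deep_poverty) == frl_rate(near_cutoff) == 0.8
# ...while mean family income differs by 68 points of the poverty threshold.
gap = mean_income_pct(near_cutoff) - mean_income_pct(deep_poverty)
```

Any value-added model that controls for poverty only through the FRL flag treats these two classrooms as equivalent.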

Learning about Variance vs. Rating Individual Teachers with Precision and Accuracy

If we are talking about using the value-added method to classify individual teachers as effective or ineffective and to use this information as the basis for dismissing teachers or for compensation, then we should be very concerned with the precision and accuracy of the measures as they apply to each individual teacher. In this context, one can characterize precision and accuracy as follows.

  • Precision – That there exists little error in our estimate that a teacher is responsible for producing good or bad student value-added on the test instrument used.  That is, we have little chance of classifying a good teacher as bad, an average teacher as bad, or vice versa.
  • Accuracy – That the test instrument and our use of it to measure teacher effectiveness is really measuring “true” effectiveness of the teacher – or truly how good that teacher is at doing all of the things we expect that teacher to do.

If, instead of classifying individual teachers as good or bad (and firing them, or shaming them in the newspaper or on milk cartons), we are actually interested in learning about variations in “effectiveness” across many teachers and many sections of students over many years, and whether student perceptions, supervisor evaluations, classroom conditions and teaching practices are associated with differences in effectiveness, we are less concerned about precise and accurate classification of individuals and more concerned about the relationships between measures, across many individuals (measured with error).  That is, do groups of teachers who do more of “X” seem to produce better value-added gains? Do groups of teachers prepared in this way seem to produce better outcomes? We are not concerned about whether a given teacher is accurately “scored.” Instead, we are concerned about general trends and averages.

The Gates study, like most previous studies, finds what I would call relatively weak correlations between the value-added score an individual teacher receives for one section of students in math or reading compared to another, and from one year to the next. The Gates research report noted:

“When the between-section or between-year correlation in teacher value-added is below .5, the implication is that more than half of the observed variation is due to transitory effects rather than stable differences between teachers. That is the case for all of the measures of value-added we calculated.”

Below is a table of those correlations – taken from their Table #5.

Unfortunately, summaries of the Gates study seem to obsess over how relatively high the correlation is from year to year for teachers rated by student performance on the state math test (.404), and largely ignore how much lower many of the other correlations are. Why is the correlation for the ELA test under .20, and what does that say about the high-stakes usefulness of the approach? Like other studies evaluating the stability of value-added ratings, the correlations seem to run between .20 and .40, with some falling below .20. That’s not a very high correlation – which suggests not a very high degree of precision in figuring out which individual teacher is good and which is bad. BUT THAT’S NOT THE POINT EITHER!
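To make these stability numbers concrete, here is a small simulation, using my own variance-components sketch rather than the study's model, of what a year-to-year correlation around .4 implies for re-identifying "top quartile" teachers:

```python
# Model: observed value-added = stable teacher effect + transitory noise,
# with variances chosen (my assumption) so the between-year correlation
# is 0.4 — roughly the state math test figure.
import random

rng = random.Random(1)
N = 20_000
stable_sd, noise_sd = 0.4 ** 0.5, 0.6 ** 0.5   # corr = 0.4 / (0.4 + 0.6)

stable = [rng.gauss(0, stable_sd) for _ in range(N)]
year1 = [t + rng.gauss(0, noise_sd) for t in stable]
year2 = [t + rng.gauss(0, noise_sd) for t in stable]

def quartile_cut(xs):
    """Return the 75th-percentile value of xs."""
    return sorted(xs)[int(0.75 * len(xs))]

cut1, cut2 = quartile_cut(year1), quartile_cut(year2)
top1 = [i for i in range(N) if year1[i] > cut1]
stayed = sum(1 for i in top1 if year2[i] > cut2) / len(top1)
# With a between-year correlation around 0.4, fewer than half of the
# "top quartile" teachers in year 1 land in the top quartile again in year 2.
```

The same sketch run in reverse tells the story for the "bottom quartile" a district might try to dismiss.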

Now, the Gates study rightly points out that lower correlations do not mean the information is entirely unimportant. The study focuses on what it calls “persistent” or “stable” effects, arguing that if there’s a ton of variation across classrooms and teachers, being able to explain even a portion of that variation is important – a portion of a lot is still something. A small slice of a huge pie may still provide some sustenance. The report notes:

“Assuming that the distribution of teacher effects is “bell-shaped” (that is, a normal distribution), this means that if one could accurately identify the subset of teachers with value-added in the top quartile, they would raise achievement for the average student in their class by .18 standard deviations relative to those assigned to the median teacher. Similarly, the worst quarter of teachers would lower achievement by .18 standard deviations. So the difference in average student achievement between having a top or bottom quartile teacher would be .36 standard deviations.” (p.19)

The language here is really, really, important, because it speaks to a theoretical and/or hypothetical difference between high and low performing teachers drawn from a very large analysis of teacher effects (across many teachers, classrooms, and multiple years). THIS DOES NOT SPEAK TO THE POSSIBILITY THAT WE CAN PRECISELY AND ACCURATELY IDENTIFY WHETHER ANY SINGLE TEACHER FALLS IN THE TOP OR BOTTOM GROUP! It’s a finding that makes sense when understood correctly but one that is ripe for misuse and misunderstanding.
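For what it's worth, the quoted .18 and .36 figures can be reproduced from the report's normality assumption alone. The following uses only standard-normal facts plus the .18 figure; the implied teacher-effect SD is my own back-calculation:

```python
# For a normal distribution, the mean of the top quartile sits
# phi(z75)/0.25 ≈ 1.27 standard deviations above the median, so a .18
# effect for the average top-quartile teacher implies a teacher-effect
# SD of about .14 (in student-level SD units).
import math

z75 = 0.6745                                   # 75th-percentile z-score
phi = math.exp(-z75**2 / 2) / math.sqrt(2 * math.pi)
top_quartile_mean = phi / 0.25                 # ≈ 1.27 SDs above the median

implied_teacher_sd = 0.18 / top_quartile_mean  # ≈ 0.14
top_vs_bottom_gap = 2 * 0.18                   # symmetric: 0.36 SD
```

Note that 1.27 SDs is an average over the whole top quartile of a hypothetical, accurately identified distribution, not a promise about any one teacher.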

Yes, in probabilistic terms, this does suggest that if we implement mass layoffs in a system as large as NYC and base those layoffs on value-added measures, we have a pretty good chance of increasing value-added in later years – assuming our layoff policy does not change other conditions (class size, average quality of those in the system – replacement quality). But any improvements can be expected to be far, far less than the .18 figure used in the passage above. Even assuming no measurement error – that the district is laying off the “right” teachers (a silly assumption) – the newly hired teachers can be expected to fall, at best, across the same normal curve. But I’ve discussed my taste for this approach to collateral damage in previous posts. In short, I believe it’s unnecessary and not that likely to play out as we might assume (see discussion of reform engineers at bottom).

A Few More Technical Notes

Persistent or Stable Effects: The Gates report focuses on what it terms “persistent” effects of teachers on student value-added – assuming that these persistent effects represent the consistent (over time, or across sections) influence of a specific teacher on his/her students’ achievement gains. The report focuses on such “persistent” effects for a few reasons. First, the report uses this discussion to, I would argue, overplay the persistent influence teachers have on student outcomes – as in the quote above, which is later used in the report to explain the share of the black-white achievement gap that could be closed by highly effective teachers. The assertion is that even if teacher effects explain a small portion of variations in student achievement gains, if variations in those gains are huge, then explaining a portion is important. Nonetheless, the persistent effects remain a relatively small portion (rising to a “modest” portion in some cases) – which dramatically reduces the precision with which we can identify the effectiveness of any one teacher (taking as given that the tests are the true measure of effectiveness – the validity concern).

AND, I would argue that it is a stretch to assume that the persistent effects within teachers are entirely a function of teacher effectiveness. The persistent effect of teachers may also include the persistent characteristics of students assigned to that teacher – that the teacher, year after year, and across sections is more likely to be assigned the more difficult students (or the more expert students). Persistent pattern yes. Persistent teacher effect? Perhaps partially (How much? Who knows?).

Like other studies, the identification of persistent effects from year to year, or across sections, in the new Gates study merely reinforces that with more sections and/or more years of data (more students passing through) for any given teacher, we can gain a more stable value-added estimate and a more precise indication of the value-added associated with the individual teacher. Again, the persistent effect may be a measure of the persistence of something other than the teacher’s actual effectiveness (teacher X always has the most disruptive kids, larger classes, or the noisiest/hottest/coldest – generally worst – classroom). The Gates study does not (BECAUSE IT WASN’T MEANT TO) assess how the error rate of identifying a teacher as “good” or “bad” changes with each additional year of data, but given that the other findings are so consistent with other studies, I would suspect the error rate to be similar as well.

Differences Between Tests: The Gates study provides some useful comparisons of value-added ratings of teachers on one test, compared with ratings of the same teachers on another test – a) for kids in the same section in the same year, and b) for kids in different sections of classes with the same teacher.

Note that in a similar analysis, Corcoran, Jennings and Beveridge found:

“among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test. Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.”

Corcoran, Sean P., Jennifer L. Jennings, and Andrew A. Beveridge. 2010. “Teacher Effectiveness on High- and Low-Stakes Tests.” Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

That is, analysis of teacher value-added ratings on two separate tests called into question the extent to which individual teachers might accurately be classified as effective using a single testing instrument. That is, if we assume both tests measure how effective a teacher is at teaching “math,” or a specific subject within “math,” then both tests should tell us the same thing about each teacher – which ones are truly effective math teachers and which ones are not. Corcoran’s findings raise serious questions about accuracy in this regard.

The Gates study argues that comparing teacher value-added across two math tests – where one is more conceptual – allows them to validate that doing well on one test (the state test), so long as the results are correlated with the other, more conceptual test, did not compromise conceptual learning. That seems reasonable enough, to the extent that the testing instruments are being appropriately described (and to the extent they are valid instruments). In terms of value-added ratings, the Gates study, like the Corcoran study, finds only a modest relationship between ratings of teachers based on one test and ratings based on the other:

“the correlation between a teacher’s value-added on the state test and their value-added on the Balanced Assessment in Math was .377 in the same section and .161 between sections.”

But the Gates study also explores the relationships between “persistent” components across tests – which must be done across sections taking the test in the same year (until subsequent years become available). They find:

“we estimate the correlation between the persistent component of teacher impacts on the state test and on BAM is moderately large, .54.”

“The correlation in the stable teacher component of ELA value-added and the Stanford 9 OE was lower, .37.”

I’m uncomfortable with the phrasing here that says – “persistent component of teacher impacts” – in part because there exist a number of other persistent conditions or factors that may be embedded in the persistent effect, as I discuss above. Setting that aside, however, what the authors are exploring is whether the correlated component – the portions of student performance on any given test that are assumed to represent teacher effectiveness – is similar between tests.

In any case, however, these correlations, like the others in the Gates analysis, are telling us how highly associated – or not – the assumed persistent component is across tests, across many teachers teaching many sections of the same class. This allows the authors to assert that across all of these teachers and the various sections they teach, there is a “moderately” large relationship between student performance on the two different tests, supporting the authors’ argument that one test somewhat validates the other. But again, this analysis, like the others in the report, does not suggest by any stretch of the imagination that either one test or the other will allow us to precisely identify the good teacher versus the bad one. There is still a significant amount of reshuffling going on in teacher ratings from one test to the next, even with the same students in the same class sections in the same year. And, of course, good teaching is not synonymous with raising a student’s test scores.

This analysis does suggest that we might – by using several tests – get a more accurate picture of student performance and how it varies across teachers, and it at least suggests that across multiple tests – if the persistent component is correlated – just as across multiple years – we might get a more stable picture of which teachers are doing better or worse. Precise enough for high-stakes decisions? (And besides, how much more testing can we – or they – handle?) I’m still not confident that’s the case.
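To see why a “moderately large” correlation between persistent components can coexist with substantial reshuffling of individual teacher ratings, here is a small simulation sketch. The .54 correlation between persistent components comes from the figure quoted above; the reliability value and the simple additive-noise model are assumptions chosen so that the observed ratings correlate at roughly the reported .38.

```python
import random
import statistics

random.seed(1)

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

n = 5000            # hypothetical teachers (an assumption, not Gates data)
r_persist = 0.54    # correlation of persistent components, from the quote above
reliability = 0.70  # assumed reliability of each observed rating

# Persistent component on each test; the BAM component correlates .54 with
# the state-test component by construction.
persist_state = [random.gauss(0, 1) for _ in range(n)]
persist_bam = [r_persist * p + (1 - r_persist ** 2) ** 0.5 * random.gauss(0, 1)
               for p in persist_state]

# Observed ratings = persistent component + test-specific noise.
noise_sd = ((1 - reliability) / reliability) ** 0.5
rate_state = [p + random.gauss(0, noise_sd) for p in persist_state]
rate_bam = [p + random.gauss(0, noise_sd) for p in persist_bam]

print(round(corr(rate_state, rate_bam), 2))        # observed ratings: roughly .38
print(round(corr(persist_state, persist_bam), 2))  # persistent parts: roughly .54

# Classification churn: what share of "top quintile" teachers on the state
# test also land in the top quintile on the other test?
k = n // 5
top_state = set(sorted(range(n), key=lambda i: rate_state[i])[-k:])
top_bam = set(sorted(range(n), key=lambda i: rate_bam[i])[-k:])
print(round(len(top_state & top_bam) / k, 2))      # well under 1.0
```

Even with the persistent components correlated at .54, well under half of the “top” teachers on one test stay on top when rated by the other, which is the reshuffling problem described above.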

Value-added is the best predictor of itself

This seems to be one of the findings that gets the most media play (and was the basis of Senator Johnston’s proud tweets). Of course value-added is a better predictor of future value-added (on the same test and with the same model) than other factors are of future value-added – even if value-added is only a weak predictor of future (or different section) value-added. Amazingly, however, many of the student survey responses on factors related to things like “Challenge” seem almost as related to value-added as value-added is to itself. That is a surprising finding, and I’m not sure yet what to make of it. [Note that the correlation between student ratings and VAM was for the same class and year, whereas VAM predicting VAM is a) across sections and b) across years.]

Again, the main problem with this VAM-predicts-VAM argument is that it assumes value-added ratings in the subsequent year to be THE valid measure of the desired outcome. But that’s the part we just don’t yet know. Perhaps the student perceptions are actually a more valid representation of good teaching than the value-added measure? Perhaps we should flip the question around? It does seem reasonable enough to assume that we want to see students improve their knowledge and skills in measurable ways on high quality assessments. Whether our current batch of assessments, as we are currently using them and as they are used in this analysis, accomplishes that goal remains questionable.
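To illustrate why “VAM is the best predictor of itself” can be true without telling us much about validity, here is a hedged sketch. It assumes, purely for illustration, that value-added picks up a stable test-specific distortion in addition to true teaching quality, while a student survey tracks only true quality. All variances are made up; none of this comes from the Gates data.

```python
import random
import statistics

random.seed(4)

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

n = 5000
# T: true teaching quality. S: a stable test-specific distortion that VAM
# picks up every year (e.g., curriculum alignment, class composition).
T = [random.gauss(0, 1) for _ in range(n)]
S = [random.gauss(0, 0.5 ** 0.5) for _ in range(n)]

vam_y1 = [t + s + random.gauss(0, 1) for t, s in zip(T, S)]
vam_y2 = [t + s + random.gauss(0, 1) for t, s in zip(T, S)]
survey = [t + random.gauss(0, 1) for t in T]  # tracks true quality only

print(round(corr(survey, T), 2))       # survey vs. true quality (higher)
print(round(corr(vam_y1, T), 2))       # VAM vs. true quality (lower)
print(round(corr(vam_y1, vam_y2), 2))  # VAM predicting itself (highest)
print(round(corr(survey, vam_y2), 2))  # survey predicting VAM (lowest)
```

In this made-up world, VAM predicts next year’s VAM better than the survey does, even though the survey is the better measure of true teaching, which is exactly the “flip the question around” worry stated above.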

What is perhaps most useful about the Gates study and future research questions is that it begins to explore with greater depth and breadth the other factors that are – and are not – associated with student achievement gains.

Findings apply to a relatively small share of teachers

I have noted in other blog posts on this topic that in the best of cases (or perhaps worst if we actually followed through with it), we might apply value added ratings to somewhat less than 20% of teachers – those directly responsible and solely responsible for teaching reading or math to insulated clusters of children in grades 3 to 8 – well… 4-8, actually … since many VA models use annual data and the testing starts with grade 3. Even for the elementary school teachers who could be rated, the content of the ratings would exclude a great deal of what they teach. Note that most of the interesting findings in the new Gates study are those which allow us to evaluate the correlations of teachers across different sections of the same course in addition to subsequent years. These comparisons can only be made at the middle school level (and/or upper elementary, if taught by section). Further, many of the language arts correlations were very low, limiting the more interesting discussions to math alone. That is, we need to keep in mind that in this particular study, most of the interesting findings apply to no more than 5% to 10% of teachers – those involved in teaching math in the upper elementary and middle grades – specifically those teaching multiple sections of the same math content each year.

Still searching for that pot of gold

The rhetoric about our decades-long drunken spending spree just won’t stop, nor will the rhetoric that the money is all gone. All of it. Nothin’ left. We spent it all. We taxed ourselves to the limit and those damn teachers unions and public schools just took it all and left us with the bill. It’s gone! all gone!

Here are some recent quotes/comments from pundits who’ve done little analytically but to offer a few absurd back of the napkin explanations for why they believe that a) we’ve been on a drunken spending spree and b) it’s all gone!

Andy Rotherham in Time:

the golden age of school spending is likely coming to an end.

There’s so much more in this article, including statements about how it’s plainly obvious that for each worker added to a private firm, there is an immediate incremental return in production output (each additional worker adds $x worth of output to any private firm) whereas in education we continue to add workers and see nothing in return. Both parts of this assumption are… well… just nutty.

So, Rotherham has given us the argument that our “golden age” of school spending is coming to an end. And Mike Petrilli, in a twitter-battle with Diane Ravitch has laid down the Petrillian Truth (roll with that one Mike…it’s got a nice ring) that “The Money is Gone!”

MichaelPetrilli: That’s a great line, Diane, but it doesn’t solve the problem. The money is gone. We have to help schools cut smart.

That’s right. It’s all gone. It’s freakin’ gone. Cut, cut, cut. Cut it all. Zero out public education. It doesn’t matter what state you live in, what part of the country, your state has taxed you to the limit and has spent it all on the edu-bureaucracy. Every state… the whole nation has simply been pouring money into schools and they have to stop because the money is gone.

Okay, really, how much is gone? And has any of it come back yet? Is it really all gone forever? Is 20% gone, 50%, or perhaps even 70%? Must we reset the system to an average cost that is, say, 20% below where it was in 2008? 10%?

You know, there are actually legitimate researchers and organizations out there tracking the condition of state and local revenues. And while these have been some tough times, their findings are somewhat less apocalyptic than the comments of Rotherham and Petrilli above – who don’t actually look at state budget data when making these claims. Here are the findings from the most recent quarterly report from the Rockefeller Institute:

The Rockefeller Institute’s compilation of data from 48 early reporting states shows collections from major tax sources increased by 3.9 percent in nominal terms compared to the third quarter of 2009, but was 7.0 percent below the same period two years ago. Gains were widespread, with 42 states showing an increase in revenues compared to a year earlier. After adjusting for inflation, tax revenues increased by 2.6 percent in the third quarter of 2010 compared to the same quarter of 2009. States’ personal income taxes represented a $2.5 billion gain and sales taxes a $2.0 billion gain for the period.

Yes, revenues are down. State revenues are still rolling in about 7% below where they were in 2008, but in most states they have begun to rebound toward that level. We took a hit. States took a hit. Some took a bigger hit than others; some are rebounding more quickly and others more slowly.
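A bit of back-of-the-envelope arithmetic on the figures above: if real revenues sit about 7% below their 2008 level and keep growing at the 2.6% real rate reported for the third quarter of 2010 (a big assumption, offered only as a sketch), how long until they recover?

```python
import math

shortfall = 0.07     # revenues about 7% below the 2008 level (Rockefeller figures)
real_growth = 0.026  # assumed continued real growth at the Q3 2010 rate

# Years of compound growth needed: (1 + g)^t = 1 / (1 - shortfall)
years = math.log(1 / (1 - shortfall)) / math.log(1 + real_growth)
print(round(years, 1))  # roughly 2.8 years at that pace
```

Hardly “all gone forever” – under these illustrative assumptions, roughly three more years of growth at the reported pace closes the gap.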

But, I must also reiterate that not every state really put their heart into public schools or the combination of their elementary and secondary and higher education systems to begin with. Many have already been systematically reducing their spending effort for years.

A few national graphs first. Here’s total state and local government expenditure as a share of personal income over time. Yes, on average, it has climbed slightly over 30 years. And it has oscillated in between, with government expenditure (state and local) declining as a share of personal income during those periods when personal income grew quickly.

Elementary, secondary and higher education do make up a sizable share of this spending – albeit not clearly a drunken spree. Here’s education direct expenditures as a share of state and local general expenditures over the same time period.

So, the reality is that education spending first declined as a share of general spending and has since leveled off. So actually, it may be some of that other stuff that’s creating pressure on the system, a point duly acknowledged by Rotherham. But, the current argument seems to be that public schools are discretionary – negotiable – and all of that other stuff is not. Either way, even the total growth in the previous figure is not that disconcerting.  A whole other discussion for a later point in time is the issue of how many states have kicked non-current expenditures (pension obligations and other debt) down the road for someone else to deal with.

Most importantly, however, here are the differences in direct education spending as a share of personal income across states. When it comes to public K-12 and higher education systems, states vary widely. Some have provided high levels of support for schools, allocated that support fairly and maintained appropriate levels of effort to finance their education systems. Others have thrown their education systems under the bus. They don’t need some data-proof ideologue to tell them that the money is gone and now’s the time to cut.

This figure, like the ones in my previous “bubble” post, shows the variation in “effort” across states – measured somewhat differently, but with the same conclusion. That’s the thing – I keep taking different angles on these data and they keep telling me similar stories – that many states have systematically reduced their “effort” to finance public education systems over time, and yes, some have increased effort. And there’s an interesting story behind each trend. Again, Vermont has systematically scaled up education spending relative to personal income over time. New Jersey has increased over time as well, but has risen only to a slightly below-average position. By contrast, Colorado and Arizona both provide LESS DIRECT SPENDING ON EDUCATION AS A SHARE OF PERSONAL INCOME IN 2008 THAN THEY DID IN 1977! And they are not the only ones. Perhaps those states need a correction in the other direction?

It will indeed be interesting to see how these “effort” measures shift as income takes a temporary hit – and a bigger one than it has taken in the past. Most of the differences in the level of “effort” in the above figure are a function of income. States with higher personal income are able to raise what they need in education spending with a much smaller share of income. Even New Jersey, which is a relatively high spending state, exerts relatively low effort. Other lower effort states include Connecticut and Massachusetts.
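For clarity, “effort” here is simply direct education spending divided by aggregate personal income. A toy calculation (with made-up per-capita numbers, not actual state data) shows how a higher-income state can outspend a lower-income one while still registering lower effort:

```python
def effort(educ_spending, personal_income):
    """Effort = direct education spending as a share of personal income."""
    return educ_spending / personal_income

# Hypothetical per-capita figures; purely illustrative.
state_a = {"spending": 2400, "income": 55000}  # high-income state
state_b = {"spending": 2200, "income": 38000}  # lower-income state

print(round(effort(state_a["spending"], state_a["income"]), 4))  # 0.0436
print(round(effort(state_b["spending"], state_b["income"]), 4))  # 0.0579
```

State A spends more per capita yet shows lower effort than State B, which is why high-income, high-spending states like New Jersey can still land near the bottom on effort measures.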

But, back to the point – these national aggregate claims that we’re tapped out – all of us, every state – are entirely inappropriate and irresponsible. Let’s take a harder, more precise look at what’s really going on. Let’s focus our attention on useful quarterly reports like those from the Rockefeller Institute on the condition of state revenue, and let’s provide appropriately differentiated instruction to states based on the widely varied conditions they face and the widely varied levels of effort they’ve applied thus far toward improving their education systems. The current rhetoric is unhelpful, and sadly, I think that’s the point!

The problem? Cheerleading and Ceramics, of course!

David Reber with the Topeka Examiner had a great post a while back (April, 2010) addressing the deceptive logic that we should be outraged by supposed exorbitant spending on things like cheerleading and ceramics, and not worry so much about the little things, like disparities between wealthy and poor school districts. I finally saw this post today, from a tweet, and realized I had not yet blogged on this topic.

This logic/argument comes from the “research” of Marguerite Roza, who, well, has a track record of making such absurd arguments in an effort to place blame on poor urban districts and take attention away from disparities between poor urban districts and their more affluent suburban neighbors.

This new argument is really just more of the same ol’ flimsy logic from this crew. For the past several years, Roza and colleagues have attempted to argue that states have largely done their part to fix inequities in funding between school districts, and that now, the burden falls on local public school districts to clean up their act. Here’s an excerpt from one of my recent articles on this topic:

On other occasions, Roza and Hill have argued that persistent between-district disparities may exist but are relatively unimportant. Following a state high court decision in New York mandating increased funding to New York City schools, Roza and Hill (2005) opined: “So, the real problem is not that New York City spends some $4,000 less per pupil than Westchester County, but that some schools in New York [City] spend $10,000 more per pupil than others in the same city.” That is, the state has fixed its end of the system enough.

This statement by Roza and Hill is even more problematic when one dissects it more carefully. What they are saying is that the average of per pupil spending in suburban districts is only $4,000 greater than spending per pupil in New York City but that the difference between maximum and minimum spending across schools in New York City is about $10,000 per pupil. Note the rather misleading apples-and-oranges issue. They are comparing the average in one case to the extremes in another.

In fact, among downstate suburban[1] New York State districts, the range of between-district differences in 2005 was an astounding $50,000 per pupil (between the small, wealthy Bridgehampton district at $69,772 and Franklin Square at $13,979). In that same year, New York City as a district spent $16,616 per pupil, while nine downstate suburban districts spent more than $26,616 (that is, more than $10,000 beyond the average for New York City). Pocantico Hills and Greenburgh, both in Westchester County (the comparison County used by Roza and Hill), spent over $30,000 per pupil in 2005.[2] These numbers dwarf even the purported $10,000 range within New York City (a range that we agree is presumptively problematic); our conclusion based on this cursory analysis is that the bigger problem likely remains the between-district disparity in funding.

My article (with Kevin Welner) goes on to show how states have far from resolved between-district disparities, and that New York State in particular has among the most substantial persistent disparities between wealthy and poor school districts. For more information on persistent between-district disparities that really do exist, see: Is School Funding Fair?.

I have a forthcoming paper this spring where I begin to untangle the new argument about poor urban districts really having plenty of money but simply wasting it on cheerleading and ceramics. Here’s a draft of a section of the introduction to that paper:

A handful of authors, primarily in non-peer-reviewed and think tank reports, posit that poor urban school districts have more than enough money to achieve adequate student outcomes and simply need to reallocate what they have toward improving achievement in tested subject areas. These authors, including Marguerite Roza and colleagues of the Center for Reinventing Public Education, encourage public outrage that any school district not presently meeting state outcome standards would dare to allocate resources to courses like ceramics or activities like cheerleading. To support their argument, the authors provide anecdotes of per pupil expense on cheerleading being far greater than per pupil expense on core academic subjects like math or English.

Imagine a high school that spends $328 per student for math courses and $1,348 per cheerleader for cheerleading activities. Or a school where the average per-student cost of offering ceramics was $1,608; cosmetology, $1,997; and such core subjects as science, $739.[1]

These shocking anecdotes, however, are unhelpful for truly understanding resource allocation differences and reallocation options. For example, the major reason why cheerleading or ceramics expenses per pupil are so high is the relatively small group size, compared to class sizes in English or math. In total, the funds allocated to either cheerleading or ceramics are unlikely to have much if any effect if redistributed to reading or math.
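The arithmetic is worth spelling out. Using the per-pupil figures from the anecdote quoted above, and assuming (hypothetically) 20 cheerleaders versus 500 math students, the “shocking” cheerleading budget barely moves per-pupil math spending when redistributed:

```python
# Per-pupil figures are from the quoted anecdote; the participant counts
# (20 cheerleaders, 500 math students) are assumptions for this sketch.
cheer_per_pupil, cheerleaders = 1348, 20
math_per_pupil, math_students = 328, 500

cheer_total = cheer_per_pupil * cheerleaders  # total cheerleading cost: $26,960
math_total = math_per_pupil * math_students   # total math cost: $164,000

# Redistribute the entire cheerleading budget into math courses:
boost = cheer_total / math_students
print(round(boost, 2))  # about $54 more per math student
```

A $1,348-per-cheerleader program is a tiny pot of money overall; spread across the math enrollment it adds roughly $54 per student, against a $328 base.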

Further, the requirement that poor urban (or other) districts currently falling below state outcome standards must re-allocate any and all resources from co-curricular and extracurricular activities toward improving achievement on tested outcomes may increase inequities in the depth and breadth of curricular offerings between higher and lower poverty schools – inequities that may be already quite substantial. That is, it may already be the case that higher poverty districts and those facing greater resource constraints are reallocating resources toward core, tested areas of curriculum and away from more advanced course offerings which extend beyond the tested curriculum and enriched opportunities including both elective courses and extracurricular activities.  Some evidence on this point already exists.

The perspective that low performing districts merely need to reallocate what they already have is particularly appealing in the current fiscal context, where state budgets and aid allocations to local public school districts are being slashed. Accepting Roza’s logic, states under court mandates or in the shadows of recent rulings regarding educational adequacy, but facing tight budgets may simply argue that high poverty and/or low performing districts should shift all available resources into the teaching of core, tested subjects. Lower poverty districts with ample resources that exceed minimum outcome standards face no such reallocation obligations, leading to substantial differences in depth and breadth of curriculum. Arguably a system that is both adequate and fair would protect the availability of deep and broad curriculum while simultaneously attempting to improve narrowly measured outcomes.

More later as this research progresses.

[1] “Downstate Suburban” refers to areas such as Westchester County and Long Island and is an official regional classification in the New York State Education Department Fiscal Analysis and Research Unit Annual Financial Reports data.

[2] Interestingly, however, Bridgehampton and New York City have relatively similar “costs” due to Bridgehampton’s small size and New York City’s high student needs (see Duncombe and Yinger, 2009). The figures offered in this paragraph are based on Total Expenditures per Pupil from State Fiscal Profiles 2005. Results are similar when comparing current operating expenditures per pupil.

Potential abuses of the Parent Trigger???

This article in the LA Times has been getting a lot of buzz today.

The article discusses the use of what is called a “parent trigger” policy.  Here’s the synopsis:

On Tuesday, they intend to present a petition signed by 61% of McKinley parents that would require the Compton Unified School District to bring in a charter company to run the school. Charter schools are independently operated public schools.

“I know it’s never been done before, but I want to step up because I’m a parent who cares about my children and their education,” Murphy said Monday. She and other parents were meeting with organizers from Parent Revolution, a nonprofit that lobbied successfully last year for the so-called parent-trigger law.

So, what you’ve got is 61% of parents in a community pushing for a school to be converted to a charter school and potentially pushing for that school to be a specific type of charter school. This presents all sorts of interesting – and twisted possibilities.

I wrote about a week ago on how some charter schools, like North Star Academy in Newark have established themselves as the equivalent of elite magnet schools – potentially engaging in activities such as pushing out lower performing kids over time.

So, my question for the day is whether these “parent trigger” policies might allow a simple majority of parents – or some defined majority share – to force a reorganization of their neighborhood school into a charter – that would subsequently weed out those other “less desirable kids?”

That is, does this new policy of simple majority (mob) rule allow parents in a specific community to redefine their neighborhood school so that the school no longer serves lower performing kids, or kids whose parents are less able or, for that matter, less interested in engaging in the level of parent involvement that might be required by a specific charter operator? In short, can the majority of parents effectively kick out a minority of parents they don’t like – including parents of kids with disabilities or non-English speaking parents?

Sure, you say – charters can’t discriminate in this way because they must rely on lotteries for admissions and must take children with disabilities and those unable to speak English. They would have to accept those kids in the neighborhood. Yes, by law this might be true. But experience with many charters proves otherwise. Many do rely on attrition to boost scores and somehow avoid serving kids with disabilities and non-English speaking kids. But surely a converted neighborhood school wouldn’t do the same?

Taking this a step further, envision a neighborhood split along language, ethnic or even religious lines. Can the parents of the majority group force their neighborhood school to be reconstituted as a cultural, language or for that matter religion (argued as culture) specific school that is effectively hostile to the minority?

Hey education law friends – help me out with the possibilities here?

The Circular Logic of Quality-Based Layoff Arguments

Many pundits are responding enthusiastically to the new LA Times article on quality-based layoffs – or how dismissing teachers based on value-added scores rather than on seniority would have saved LAUSD many of its better teachers, rather than simply saving its older ones.

Some are pointing out that this new LA Times report represents the “right” way to use value-added, as compared with the “wrong” way the LA Times used the information earlier this year.

Recently, I explained the problematic circular logic being used to support these “quality-based layoff” arguments. Obviously, if we dismiss teachers based on “true” quality measures, rather than experience which is, of course, not correlated with “true” quality measures, then we save the jobs of good teachers and get rid of bad ones. Simple enough? Not so. Here’s my explanation, once again.

This argument draws on an interesting thought piece and simulation posted online (Teacher Layoffs: An Empirical Illustration of Seniority vs. Measures of Effectiveness), which was later summarized in a (less thoughtful) recent Brookings report.

That paper demonstrated that if one dismisses teachers based on VAM, future predicted student gains are higher than if one dismisses teachers based on experience (or seniority). The authors point out that less experienced teachers are scattered across the full range of effectiveness – based on VAM – and therefore, dismissing teachers on the basis of experience leads to dismissal of both good and bad teachers – as measured by VAM. By contrast, teachers with low value-added are invariably – low value-added – BY DEFINITION. Therefore, dismissing on the basis of low value-added leaves more high value-added teachers in the system – including more teachers who show high value-added in later years (current value added is more correlated with future value added than is experience).

It is assumed in this simulation that VAM (based on a specific set of assessments and model specification) produces the true measure of teacher quality both as basis for current teacher dismissals and as basis for evaluating the effectiveness of choosing to dismiss based on VAM versus dismissing based on experience.

The authors similarly dismiss principal evaluations of teachers as ineffective because they too are less correlated with value-added measures than value-added measures with themselves.

Might I argue the opposite? – Value-added measures are flawed because they only weakly predict which teachers we know – by observation – are good and which ones we know are bad? A specious argument – but no more specious than its inverse.

The circular logic here is, well, problematic. Of course if we measure the effectiveness of the policy decision in terms of VAM, making the policy decision based on VAM (using the same model and assessments) will produce the more highly correlated outcome – correlated with VAM, that is.

However, it is quite likely that if we simply use different assessment data or different VAM model specification to evaluate the results of the alternative dismissal policies that we might find neither VAM-based dismissal nor experienced based dismissal better or worse than the other.

For example, Corcoran and Jennings conducted an analysis of the same teachers on two different tests in Houston, Texas, finding:

…among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test. Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.

  • Corcoran, Sean P., Jennifer L. Jennings, and Andrew A. Beveridge. 2010. “Teacher Effectiveness on High- and Low-Stakes Tests.” Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

So, what would happen if we did a simulation of “quality based” layoffs versus experience-based layoffs using the Houston data, where the quality-based layoffs were based on a VAM model using the Texas Assessments (TAKS), but then we evaluate the effectiveness of the layoff alternatives using a value-added model of Stanford achievement test data? Arguably the odds would still be stacked in favor of VAM predicting VAM – even if different VAM measures (and perhaps different model specifications). But, I suspect the results would be much less compelling than the original simulation.

The results under this alternative approach may, however, be reduced entirely to noise – meaning that the VAM-based layoffs would be the equivalent of random firings, drawn from a hat and poorly if at all correlated with the outcome measure estimated by a different VAM – as opposed to experience-based firings. Neither would be a much better predictor of future value-added. But for all their flaws, I’d take the experience-based dismissal policy over the roll-of-the-dice, randomized firing policy any day.
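Here is a rough sketch of that thought experiment. It assumes, illustratively, that TAKS-based and Stanford-based value-added correlate only modestly (r ≈ 0.3, loosely consistent with the category crossover Corcoran and colleagues report) and that experience is unrelated to either. Laying off the bottom 10% by TAKS VAM then yields only a small edge over seniority-based layoffs when judged by the other test:

```python
import random
import statistics

random.seed(3)

n = 5000       # hypothetical teachers
r_tests = 0.3  # assumed modest correlation between TAKS- and Stanford-based VAM

taks_vam = [random.gauss(0, 1) for _ in range(n)]
stan_vam = [r_tests * t + (1 - r_tests ** 2) ** 0.5 * random.gauss(0, 1)
            for t in taks_vam]
experience = [random.randint(1, 30) for _ in range(n)]  # assumed unrelated to VAM

k = n // 10  # lay off 10% of teachers under each rule
by_taks = sorted(range(n), key=lambda i: taks_vam[i])
by_exp = sorted(range(n), key=lambda i: experience[i])
keep_taks = by_taks[k:]  # dismiss the lowest TAKS value-added
keep_exp = by_exp[k:]    # dismiss the least experienced

# Evaluate both policies on the OTHER test's value-added:
mean_stan_taks = statistics.fmean(stan_vam[i] for i in keep_taks)
mean_stan_exp = statistics.fmean(stan_vam[i] for i in keep_exp)
print(round(mean_stan_taks, 3))  # small positive edge for VAM-based layoffs
print(round(mean_stan_exp, 3))   # near zero for experience-based layoffs
```

Under these made-up parameters the VAM-based policy still wins, but by a few hundredths of a standard deviation on the other test – far less compelling than when the policy is scored against the very same measure used to make the cuts.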

In the case of the LA Times analysis, the situation is particularly disturbing if we look back on some of the findings in their own technical report.

I explained in a previous post that the LA Times value-added model had potentially significant bias in its estimates of teacher quality. For example, in my earlier post, I explain that:

Buddin finds that black teachers have lower value-added scores for both ELA and MATH. Further, these are some of the largest negative effects in the second level analysis – especially for MATH. The interpretation here (for parent readers of the LA Times web site) is that having a black teacher for math is worse than having a novice teacher. In fact, it’s the worst possible thing! Having a black teacher for ELA is comparable to having a novice teacher.

Buddin also finds that having more black students in your class is negatively associated with teachers’ value-added scores, but writes off the effect as small. Teachers of black students in LA are simply worse? There is NO discussion of the potentially significant overlap between black teachers, novice teachers and serving black students, concentrated in black schools (as addressed by Hanushek and Rivkin in link above).

By contrast, Buddin finds that having an Asian teacher is much, much better for MATH. In fact, Asian teachers are as much better (than white teachers) for math as black teachers are worse! Parents – go find yourself an Asian math teacher in LA? Also, having more Asian students in your class is associated with higher teacher ratings for Math. That is, you’re a better math teacher if you’ve got more Asian students, and you’re a really good math teacher if you’re Asian and have more Asian students?????

One of the more intriguing arguments in the new LA Times article is that under the seniority based layoff policy:

Schools in some of the city’s poorest areas were disproportionately hurt by the layoffs. Nearly one in 10 teachers in South Los Angeles schools was laid off, nearly twice the rate in other areas. Sixteen schools lost at least a fourth of their teachers, all but one of them in South or Central Los Angeles.

That is, new teachers who were laid off based on seniority preferences were concentrated in high need schools. But so too were teachers with low value-added ratings?

While arguing that “far fewer” teachers would be laid off in high need schools under a quality-based layoff policy, the LA Times does not, however, offer up how many teachers would have been dismissed from these schools had its biased value-added measures been used instead. Recall from the original LA Times analysis:

97% of children in the lowest performing schools are poor, and 55% in higher performing schools are poor.

Combine this finding with the findings above regarding the relationship between race and value-added ratings and it is difficult to conceive how VAM based layoffs of teachers in LA would not also fall disparately on high poverty and high minority schools. The disparate effect may be partially offset by statistical noise, but that simply means that some teachers in lower poverty schools will be dismissed on the basis of random statistical error, instead of race-correlated statistical bias (which leads to a higher rate of dismissals in higher poverty, higher minority schools).

Further, the seniority based layoff policy leads to more teachers being dismissed in high poverty schools because the district placed more novice teachers in high poverty schools, whereas the value-added based layoff policy would likely lead to more teachers being dismissed from high poverty, high minority schools, experienced or not, because they were placed in high poverty, high minority schools.

So, even though we might make a rational case that seniority based layoffs are not the best possible option, because they may not be highly correlated with true (not “true”) teaching quality, I fail to see how the current proposed alternatives are much if any better.  They only appear to be better when we measure them against themselves as the “true” measure of success.

The Curious Duplicity of NCTQ

NCTQ fashions itself as a leading think tank on promoting teacher quality in K-12 education. NCTQ adopts a relatively extreme position that teacher quality is it – the one and only thing that matters! Teacher quality is THE determining factor of school quality.

I, too, believe that teacher quality is very important. I also agree with NCTQ that content knowledge, especially at the middle and secondary levels, is particularly important and that simply being listed as “qualified” to teach specific content is no guarantee.

As part of their effort to improve teacher quality, NCTQ has been going around doing “studies” and applying ratings to the quality of teacher preparation institutions. Now, I noted in my previous post that NCTQ and others may actually be missing the boat on who is actually preparing teachers. But let’s set that aside for a moment. One would think that if NCTQ is so interested in teacher quality as the primary determinant of school quality and student success, and in teacher expertise as an important part of that equation at higher grade levels, then any analysis of the quality of undergraduate or graduate programs that train teachers would have to place significant emphasis on faculty quality and expertise. Right? It would make little sense to simply review which textbooks are used, what the course descriptions say, or what the curricular sequence happens to be. Right?

Out of a multitude of indicators on teacher preparation institutions, NCTQ includes only 1 – yes 1 – regarding faculty quality, which is described as follows:

In our evaluation of programs, we examined teaching responsibilities for all faculty members, as indicated by course assignments in course schedules, excluding all clinical coursework. We looked for two specific examples of inappropriate assignments: 1) an instructor teaching across the areas of foundations of education, methods and educational psychology; and/or 2) an instructor who teaches both reading and mathematics methods courses. Other inappropriate assignments may well be made but were not included in our review.

Yep, that’s it. All that they address is whether a faculty member appears to teach across two areas that no faculty member, in their view, could be sufficiently prepared to teach. The rest is based largely on textbooks chosen, syllabi and course descriptions, regardless of faculty expertise. Clearly this was a matter of data convenience. It’s hard to figure out whether individual faculty members truly possess expertise in their fields, short of evaluating their individual academic backgrounds, research and writing on the topic.

But it is absurd for an organization that believes teacher quality in K-12 education to be paramount, and content expertise critical, to ignore faculty expertise outright in its evaluations of teacher preparation institutions.

Here’s their FAQ on the long-term project of evaluating teacher preparation programs:

Related reading (actual research):

Wolf-Wendel, L., Baker, B.D., Twombly, S., Tollefson, N., & Mahlios, M. (2006). Who’s Teaching the Teachers? Evidence from the National Survey of Postsecondary Faculty and Survey of Earned Doctorates. American Journal of Education, 112(2), 273–300.

Ed Schools

Ed schools seem to make an easy target in public policy debates over the quality of American public schooling and the American teacher workforce.

In many recent lopsided “ed school as the root of all evil” presentations, “Ed Schools,” are treated as some easily defined, static entity over time. In the book of reformyness (chapter 7, verse 2), “Ed Schools” necessarily consist of some static set of traditional higher education institutions – 4 year teachers colleges including regional state colleges and flagship universities – where a bunch of crusty old education professors spew meaningless theory at wide-eyed undergrads (who graduated at the bottom of their high school class) seeking that golden ticket to a job for life – with summers off.

In order to craft a clearly understandable (albeit entirely false) dichotomy of policy alternatives, pundits then present teachers who have obtained alternative certification as a group of individuals, nearly all of whom necessarily attended highly selective colleges and majored in something really, really rigorous and then received their certification through some more expeditious and clearly much more practical and useful fast-tracked option.

This was certainly the theme of a discussion (hashtag #edschools) at the Thomas B. Fordham Institute, actively tweeted the other day by Mike Petrilli and a few others. What I found most interesting was that no one really challenged the assumptions that “ed schools” are some easily definable group of traditional higher education institutions, that this has been unchanged over decades, and that teacher training is some consistent, exclusive domain of traditional public higher education institutions – specifically as an undergraduate degree granting enterprise. That there are, and have always been, oh… about a thousand or so ed schools… that well… keep on doing the same damn thing over and over again (for the past 50 years, one participant tweeted)… and well… no one ever shuts down the bad Ed Schools… and that’s why we’re in such bad shape! It’s really that simple.

Because this characterization is simply assumed to be true, the obvious way to crack this broken and declining system is to expand alternative certification and allow more non-traditional, for-profit and entrepreneurial organizations – especially non-university organizations – to grant teaching credentials. Heck, let’s let them actually grant degrees. Who needs brick-and-mortar colleges anyway? Given the assumed static nature of the declining and antiquated system of “Ed Schools” that has brought us to our knees, this is the only answer!!!!!

One of my favorite tweets from the event was from Mike Petrilli, relaying a comment by Kate Walsh:

Walsh: There are 1410 Ed schools in the country. NCTQ spent 5 years determining that number.

You know what, Kate? By the time you were done figuring that out (however you did), the number had already changed. Also, FYI, there are actually data sources out there that might have been helpful for tabulating the existing degree granting programs and the numbers of degrees conferred by those programs.

So, let’s take a look at some of the data on degrees conferred across all education fields in 1990, 2000 and 2010.

Let’s start with a quick look at the total degrees conferred in “education” as defined by degree classification codes (CIP Codes), across all institutions granting such degrees nationally. The interesting twist here is that bachelor’s degree production of education degrees has been relatively constant over time for about 20 years and perhaps longer. Doctoral degree production increased from 1990 to 2000, but stagnated after that. On the other hand, Master’s degree production has skyrocketed.

Now, one might try to argue that this is really about all of those currently practicing teachers who are just accumulating worthless master’s degrees to get that salary bump. I will write more on this topic at a later point, but that’s not likely the dominant scenario. Yes, many of the master’s degrees are obtained to broaden fields of certification in order to give current teachers more options – either assignment options in their current districts, or other job opportunities. AND, many of the master’s degrees these days are initial credentials granted to individuals who did not receive their teaching credential as an undergraduate. Many initial teaching credentials are granted at the master’s, not bachelor’s, level. A substantial amount of teacher training goes on at the master’s, not the undergraduate, level. No matter the case, the master’s degrees – of which there are so many, and so many more being granted than bachelor’s degrees – are the interesting story here.

Is it really that the same old traditional higher education institutions with crusty old, out of date professors, are now just spewing out masters degrees? Or is something else at work here?

Well, here are the top 25 MA producers in education back in 1990. Even at that time, the largest master’s degree granting institutions were not the top universities – or even the top teachers colleges. But some of those schools were at least in the mix. Teachers College of Columbia University, Ohio State, Michigan State and Harvard all appear in the top 25 in 1990.

Here are the top 25 master’s producers in 2000. Here, the tide begins to shift a bit. Schools like NOVA Southeastern with their online programs, and National-Louis grow even bigger than they had been a decade earlier. Teachers College retains a top 25 spot, as does Ohio State, and University of Minnesota makes the list. Harvard is gone.

By 2009, “Ed Schools” are a substantially different mix. Not only that, but look at the volume of degree production. Back in 1990, Ed Schools at respectable major universities were putting out about 600 master’s degrees in education related fields per year. They held on to similar rates in 2000 and still in 2009. But by 2009, Walden University and U. of Phoenix were each cranking out 4,500+ master’s degrees per year. Grand Canyon U. comes in next in line. These are the entrepreneurial upstarts that are the product of minimized regulation of teaching credentials.

If there truly has been a decline in the quality of the teacher workforce, and if pundits truly believe that this supposed decline is related somehow to “Ed Schools,” then it might behoove those same pundits to explore the dramatic changes that have, in fact, already occurred in the “Ed School” marketplace.

If there has been a dramatic decline in teacher preparation, and in specialized training, it may be worth taking a look at those institutions that have emerged to dominate the production of education degrees and credentials in recent years. After all, Walden and Phoenix each produce 5 to 10 times the master’s degree credentials in education of major public universities. And, production of education master’s degrees is now nearly double the level of production of education bachelor’s degrees. And many of these entrepreneurial start-ups specifically frame their master’s programs as an option for individuals with a bachelor’s degree in “something else” to obtain a teaching credential.

Is even more deregulation and entrepreneurial teacher preparation what we really need? Can one really blame the traditional higher education institutions, whose share of production has declined steadily for decades, for declining teacher quality? Only if you ignore these trends, which I expect these pundits will continue to do.


Truly Uncommon in Newark…

A while back I wrote a post explaining why I felt that while Robert Treat Academy Charter School in Newark is a fine school, it’s hardly a replicable model for large scale reform in Newark, or elsewhere. I have continued over time to write about the extent to which Newark charter schools in particular have engaged in a relatively extreme pattern of cream skimming. The same is true in Jersey City and Hoboken, but not so in Trenton. But Trenton also offers fewer examples of those high-flying charters that we are supposed to view as models for the future of NJ education. When I wrote my earlier post on Treat, I somehow completely bypassed North Star Academy, which I would now argue is even less scalable than Robert Treat. That’s not to say that North Star Academy is not a highly successful school for the students it serves… or at least for those who actually stay there over time. Rather, North Star is yet another example of why the “best” New Jersey charter schools provide a very limited path forward for New Jersey urban school reform. Let’s take a look:

So, here’s where North Star fits in my 8th grade performance comparisons of beating the odds, based on the statistical model I explain in previous posts:

In this figure (above), we see that North Star certainly beats the odds at 8th grade. We can also already see that North Star has a much lower % free lunch than nearly any other school in Newark, limiting scalability right off the bat. There just aren’t enough non-poor kids in Newark to create many more schools with demography like North Star’s. Not to mention the complete lack of children with disabilities or limited English language proficiency.

Here’s North Star on the map, in context. Smaller, lighter circles are lower % free lunch schools. Most of the charters in this map are… well… smaller, lighter circles (charters are identified with a red asterisk). Not all, however, are as non-representative as North Star.

Now, here’s the part that sets North Star and a few others apart – at first in a seemingly good way…

If we take the 2009 assessments for each grade level, one interesting finding is that the charter schools serving lower grade levels in Newark are generally doing less well than the NPS average (red line). But those schools that start at grade 5 seem to be picking up a population that right away performs comparably to or better than the NPS average. See, for example, TEAM and Greater Newark (comparable to NPS in their first grade served – 5th) and, of course, North Star, whose students perform well above NPS in their first year – likely not fully a North Star effect, but at least partly a selection effect (lottery or not, it’s a different population than the one served by the district). More strikingly, with each increase in grade level, proficiency rates climb dramatically toward 100% by 8th grade. Either they are simply doing an amazing job of bringing these kids to standards over a 3 year period… or… well… something else.

The figure above looks at 6th, 7th, and 8th graders in the same year. That is, they aren’t the same kids over time doing better and better. But even if we looked at 6th graders in one year, 7th graders the next year and 8th graders the following year, we wouldn’t necessarily be looking at the same kids. In fact, one really easy way to make cohort test scores rise is to systematically shed – push out – those students who perform less well each year. Sadly, NJDOE does not provide the individual student data necessary for such tracking. But there are a few other ways to explore this possibility.
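The mechanics of shedding low performers are easy to demonstrate with made-up numbers (a hypothetical sketch, not a claim about any actual school): hold every student’s score fixed across grades, remove the bottom scorers each year, and the cohort proficiency rate climbs even though no individual student ever improved.

```python
# Hypothetical cohort of 120 students with fixed scores across grades 6-8.
# If attrition falls on the lowest scorers, cohort proficiency rises
# even though no individual student's score changed.
def proficiency(scores, cutoff=200):
    """Share of students at or above the (invented) proficiency cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)

cohort = list(range(170, 290))  # 120 fixed scores, 170..289
print(round(proficiency(cohort), 2))  # grade 6: 0.75

# Shed the bottom 10% of remaining students each of the next two years.
for grade in (7, 8):
    cohort = sorted(cohort)[len(cohort) // 10:]
    print(grade, round(proficiency(cohort), 2))  # 7: 0.83, then 8: 0.92
```

With no growth at all, “proficiency” rises from 75% to 92% purely through selective attrition.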

First, here are the cohort “attrition rates” based on 3 sequential cohorts for Newark Charter schools:

In this figure, we can see that for the 2009 8th graders, North Star began with 122 5th graders and ended with 101 in 8th. The subsequent cohort also began with 122, and ended with 104. These are sizable attrition rates. Robert Treat, on the other hand, maintains cohorts of about 50 students – non-representative cohorts indeed – but without the same degree of attrition as North Star. Now, a school could maintain cohort size even with attrition if it were to fill vacant slots with newly lotteried-in students. This, however, is risky to the performance status of the school, if performance status is the main selling point.
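Using the cohort counts above, the implied attrition rates are simple to compute: one minus ending enrollment over starting enrollment.

```python
def attrition_rate(start, end):
    """Share of the starting cohort no longer present at the end."""
    return 1 - end / start

# North Star cohort counts reported above (5th grade entry, 8th grade exit)
print(round(attrition_rate(122, 101), 3))  # 2009 8th-grade cohort: 0.172
print(round(attrition_rate(122, 104), 3))  # subsequent cohort: 0.148
```

That is, roughly 17% and 15% of each entering cohort is gone by 8th grade.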

Here’s what the cohort attrition looks like when tracked with the state assessment data.

Here, I take two 8th grade cohorts and trace them backwards. I focus on general test takers only, and use the ASK Math assessment data in this case. A quick note about those data: scores across all schools tend to drop in 7th grade due to cut-score placement (not because kids get dumber in 7th grade and wise up again in 8th). The top section of the table looks at the failure rates and numbers of test takers for the 6th grade in 2005-06, 7th in 2006-07 and 8th in 2007-08. Over this time period, North Star drops 38% of its general test takers and cuts the already low failure rate from nearly 12% to 0%. Greater Newark also drops over 30% of the test takers in the cohort, and reaps significant reductions in failures (partially proficient) in the process.

The bottom half of the table shows the next cohort in sequence. For this cohort, North Star sheds 21% of test takers between grades 6 and 8, and cuts failure rates nearly in half – starting low to begin with (low in the previous grade level, 5th grade, the entry year for the school). Gray and Greater Newark also shed significant numbers of students, and Greater Newark in particular sees significant reductions in its share of non- (uh… partially) proficient students.

My point here is not that these are bad schools, or that they are necessarily engaging in any particular immoral or unethical activity. But rather, that a significant portion of the apparent success of schools like North Star is a) attributable to the demographically different population they serve to begin with and b) attributable to the patterns of student attrition that occur within cohorts over time.

Again, the parent perspective and the public policy perspective are entirely different. From a parent’s (or child’s) perspective, one is relatively unconcerned whether the positive school effect is a function of selectivity of peer group and attrition, so long as there is a positive effect. But from a public policy perspective, the model is only useful if the majority of positive effects are due not to peer group selectivity and attrition, but to the efficacy and transferability of the educational models, programs and strategies. Given the uncommon student populations served by many Newark charters and the even more uncommon attrition patterns among some… not to mention the grossly insufficient data… we simply have no way of knowing whether these schools can provide insights for scalable reforms.

As they presently operate, however, many of the standout schools – with North Star as a shining example – do not represent scalable reforms.