I initially embraced value-added methods of teacher evaluation, figuring that they could revolutionize education in the same way that sabermetricians revolutionized professional baseball. Over time, however, I realized that this analogy was somewhat flawed. There are lots of ways to analyze data, and the owners of baseball teams have a real motivation — they want to win ball games and sell tickets — to use data appropriately to ensure their best chance of success. I’m not so sure that the “owners” of public education — the politicians and ultimately the voters — share this motivation.
An excellent editorial contrasting the use of statistics in baseball and in education appeared in Education Week: http://www.edweek.org/tm/articles/2014/08/27/fp_eger_valueadded.html?cmp=ENL-TU-NEWS1 I appreciate the tack that this editorial takes: the author is not philosophically opposed to sabermetric-style analysis of education but argues forcefully that, pragmatically, we're not there yet.
Both the Gates Foundation and the Education Department have been advocates of using value-added models to gauge teacher performance, but my sense is that they are increasingly nervous about the accuracy and fairness of the new methodology, especially as schools transition to the Common Core State Standards.
There are definitely grounds for apprehension. Oddly enough, many of the reasons that the similarly structured WAR (Wins Above Replacement) statistic works in baseball are precisely the reasons teachers should be skeptical of value-added models.
WAR works because baseball is standardized. All major league baseball players play on the same field, against the same competition, with the same rules, and with a sizable sample (162 games). Meanwhile, public schools aren’t playing a codified game. They’re playing Calvinball—the only permanent rule seems to be that you can’t play it the same way twice. Within the same school, some teachers have SmartBoards while others use blackboards; some have spacious classrooms, while others are in overcrowded closets; some buy their own supplies while others are given all they need. The differences across schools and districts are even larger.
The American Statistical Association released a brief report on value-added assessment that was devastating to its advocates. The ASA set out several caveats on the use of value-added measurement (VAM) that should give education reformers pause. Some quotes:
VAMs are complicated statistical models, and they require high levels of statistical expertise. Sound statistical practices need to be used when developing and interpreting them, especially when they are part of a high-stakes accountability system. These practices include evaluating model assumptions, checking how well the model fits the data, investigating sensitivity of estimates to aspects of the model, reporting measures of estimated precision such as confidence intervals or standard errors, and assessing the usefulness of the models for answering the desired questions about teacher effectiveness and how to improve the educational system.
VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.
VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.
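A toy simulation can make the last two caveats concrete. The sketch below is purely illustrative — every number in it (effect sizes, class sizes, the two model specifications) is an assumption of mine, not drawn from the ASA report or any real study. It generates student scores in which the true teacher effect is deliberately small, then estimates each teacher's "value added" two different ways: a simple gain score and a regression adjustment for prior achievement. The point is to show that (a) teachers can plausibly account for only a few percent of score variance, and (b) teacher rankings shift depending on which model you happen to pick.

```python
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 50
students_per = 25
n = n_teachers * students_per
teacher_ids = np.repeat(np.arange(n_teachers), students_per)

# Illustrative assumption: teacher effects (sd 0.3) are small next to
# student-level factors, in the spirit of the ASA's 1%-14% figure.
teacher_effect = rng.normal(0, 0.3, n_teachers)
prior = rng.normal(0, 1, n)   # prior-year score
ses = rng.normal(0, 1, n)     # unmeasured student-level factor
noise = rng.normal(0, 1, n)

current = 0.7 * prior + 0.4 * ses + teacher_effect[teacher_ids] + noise

# Analytic variance decomposition: teachers' share of total score variance.
total_var = 0.7**2 + 0.4**2 + 0.3**2 + 1.0
teacher_share = 0.3**2 / total_var
print(f"teacher share of variance: {teacher_share:.1%}")  # about 5%

# Model A: simple gain score (current minus prior), averaged by teacher.
gain = current - prior
vam_a = np.array([gain[teacher_ids == t].mean() for t in range(n_teachers)])

# Model B: regression-adjusted — residual of current on prior, averaged.
slope, intercept = np.polyfit(prior, current, 1)
resid = current - (slope * prior + intercept)
vam_b = np.array([resid[teacher_ids == t].mean() for t in range(n_teachers)])

# Compare the two rankings: highly correlated, yet not identical —
# a teacher's standing depends on the modeling choice.
rank_a = vam_a.argsort().argsort()
rank_b = vam_b.argsort().argsort()
print(f"teachers whose rank shifts: {(rank_a != rank_b).sum()} of {n_teachers}")
```

The two estimators agree broadly, because both are trying to recover the same teacher effects; but the residual noise and the different handling of prior achievement are enough to reshuffle individual teachers. In a high-stakes system, that reshuffling is the difference between a bonus and a dismissal hearing — which is exactly the ASA's point about sensitivity analysis.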