Nonetheless, there is broad agreement among statisticians, psychometricians, and economists that student test scores alone are not sufficiently reliable and valid indicators of teacher effectiveness to be used in high-stakes personnel decisions, even when the most sophisticated statistical applications such as value-added modeling are employed.
VAM estimates have proven to be unstable across statistical models, years, and
classes that teachers teach. One study found that across five large urban districts, among teachers who were ranked in the top 20% of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40%. Another found that teachers’ effectiveness ratings in one year could only predict from 4% to 16% of the variation in such ratings in the following year. Thus, a teacher who appears to be very ineffective in one year might have a dramatically different result the following year.
The same dramatic fluctuations were found for teachers ranked at the bottom in the first year of analysis. This runs counter to most people’s notions that the true quality of a teacher is likely to change very little over time and raises questions about whether what is measured is largely
a “teacher effect” or the effect of a wide variety of other factors.
A study designed to test this question used VAM methods to assign effects to teachers after controlling for other factors, but applied the model backwards to see if credible results were obtained. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that VAM
results are based on factors other than teachers’ actual effectiveness....
http://epi.3cdn.net/b9667271ee6c154195_t9m6iij8k.pdf