Punchback: Answering Critics

The Overselling of Growth Modeling

by Kevin G. Welner

The successor to No Child Left Behind remains to be shaped, but one change seems certain: School success will depend on whether students’ test scores increase, not merely on whether scores exceed an adequate yearly progress threshold.

Growth modeling approaches appear to allow for this policy shift, and this would likely be an improvement over the AYP approach in the current NCLB. Yet like many new technologies, growth modeling is being oversold.

NCLB largely refuses to take into account students’ starting points, so a school with students who begin with proficient test scores is likely to do well on subsequent tests. Such a school has a good shot at making AYP even if the school doesn’t do a great job teaching during the year.

In contrast, a school with students who arrive in September with poor scores on previous tests may provide excellent instruction and may even raise students’ scores considerably yet still have few students reach proficiency.

Potential Benefits
Growth modeling changes the question from “Was Mary’s score proficient?” to “Did Mary’s score increase?” It tries to quantify students’ change in performance and, in many cases, attribute those changes to particular teachers and schools.
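The shift in question can be sketched in a few lines of code. This is only an illustration: the scale, the cut score of 300 and the student's scores are all hypothetical, and real state systems use far more elaborate statistical machinery.

```python
# Minimal sketch contrasting a status (proficiency-threshold) check
# with a gain-score growth check. All numbers are hypothetical.

PROFICIENCY_CUT = 300  # hypothetical proficiency cut score on the state scale

def is_proficient(score: int) -> bool:
    """Status question: 'Was Mary's score proficient?'"""
    return score >= PROFICIENCY_CUT

def showed_growth(prior_score: int, current_score: int) -> bool:
    """Growth question: 'Did Mary's score increase?'"""
    return current_score > prior_score

# Mary starts far below the cut but improves considerably over the year.
prior, current = 220, 265
print(is_proficient(current))         # status model: False
print(showed_growth(prior, current))  # growth model: True
```

Under the status question Mary's school gets no credit for her 45-point gain; under the growth question it does.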

The approach is rapidly gaining adherents. Growth models already are being used in some states and school districts for accountability or alternative teacher pay programs. Prominent proposals for NCLB renewal also include growth models to measure school success. The Aspen Institute proposal for NCLB, for example, includes a “highly qualified effective teacher” provision grounded in a longitudinal form of growth modeling called value-added modeling, in which students’ gains are determined by comparing their latest test scores to those from the previous year or years.

In addition, under an NCLB pilot program announced in November 2005 by the U.S. Department of Education, states are using growth-to-standard models, whereby a school receives credit for nonproficient students who are “projected to be proficient” within the next three years.
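One simple way to picture a growth-to-standard check is a linear projection of a student's latest one-year gain over the next three years. This is a hypothetical sketch, not the pilot program's actual rules, which vary by state; the cut score and scores below are invented.

```python
# Hypothetical growth-to-standard sketch: project the student's most
# recent annual gain forward and ask whether the projected score
# reaches the proficiency cut within the allowed number of years.

PROFICIENCY_CUT = 300  # hypothetical cut score

def projected_proficient(prior_score: float, current_score: float,
                         years: int = 3) -> bool:
    annual_gain = current_score - prior_score
    projected = current_score + annual_gain * years
    return projected >= PROFICIENCY_CUT

# A nonproficient student gaining 15 points a year:
print(projected_proficient(250, 265))  # 265 + 15*3 = 310 -> True
# A nonproficient student gaining only 2 points a year:
print(projected_proficient(260, 262))  # 262 + 2*3 = 268 -> False
```

Under such a rule, a school receives credit for the first student even though neither is currently proficient, which is exactly the flexibility the pilot program was designed to offer.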

Advocates for growth models point to multiple strengths and positive policy consequences. When each student serves as his or her own control, prior disadvantages affecting that student’s scores are largely factored out. A system based on each student’s improvement can reduce the incentive to focus on the “bubble kids” (those near the proficiency cut score) and the incentive for teachers to transfer to high-scoring schools in wealthier neighborhoods.

An effective growth model also could replace the arbitrary patchwork of state proficiency definitions. Perhaps most importantly, a growth system is simply more defensible because schools have more control over improvement than over a student’s absolute score.

Five Limitations
These models generally yield useful information. The problem is not with the technology; all technologies have limitations. The problem is with the overselling. Your new car may be a great way to travel to the nearest lake, but you shouldn’t then try to use it as a boat.

Yet if we are not forthright about limitations, we risk adopting policies that rely too much on the technology. Consider the following five limitations:

• Beware of cohort-comparison approaches, which are not true growth models. They include no longitudinal measure of individual student growth and therefore do not use each student as his or her own control. They cannot provide a true measure of individual growth.

• Growth expectations can be just as unrealistic as the current AYP expectations, as is evident from the demands set forth in the 2005 NCLB pilot program, requiring growth levels that move students to proficiency within three years. Such approaches retain several problems mentioned above: incentives for teachers to focus on a subset of students and to transfer to high-scoring schools, as well as the arbitrary patchwork of state proficiency definitions.

• Mobility of students, multiple teachers per student each year and untested subjects all introduce further confusion into the model, and there is no perfect way to adjust for them.

• Any growth model must be based on assumptions about the ongoing effects of a given teacher in subsequent years and about the ability of a prior year’s score to fully adjust for student, family and community resources as well as school and classroom resources. That is, such models tend to assume, probably incorrectly, that these contextual factors do not affect a student’s rate of progress over the measured year.

• The switch from a proficiency-threshold system to a growth model would not address core concerns about test-based accountability, such as narrowed curriculum, teaching to the test, measurement error and reliance on one type of assessment rather than multiple indicators.

To note these limitations is not to condemn growth models or even to argue these models would not be preferable. Policymakers, however, should keep such limitations in mind, never treating the technology as offering a truly objective or precise measure of school or teacher performance.

Growth models are simply not accurate enough to support their use as the sole or even the primary basis on which to make high-stakes decisions about teachers or schools.

Kevin Welner is associate professor of education policy and director of the Education and the Public Interest Center at the University of Colorado in Boulder, Colo. E-mail: welner@colorado.edu