Student Growth and Value-added Measures of Effective Teaching, Jonah Rockoff, Columbia Business School...

TRANSCRIPT

  • Slide 1

Student Growth and Value-added Measures of Effective Teaching
Jonah Rockoff, Columbia Business School, May 2013

  • Slide 2

What is Effective Teaching?

Can be an inputs-based concept: observed actions (e.g., classroom observation).
Can be an outcomes-based concept: student performance, parental satisfaction, etc.
Student growth / value added measure performance based on student outcomes.
Agnostic about teaching practice.

  • Slide 3

The Essence of the Problem

Which teacher is more effective?
[Figure: proficiency chart. Source: EngageNY]

  • Slide 4

The Essence of the Problem (Part 2)

Now which teacher is more effective?
Ms. Smith: avg growth = 47 points
Ms. Jones: avg growth = 50 points
[Figure: proficiency chart. Source: EngageNY (adapted)]

  • Slide 5

Basics of Growth Analysis

SG-VA analysis is a comparison of actual student growth to predicted growth.
Predicted growth is a benchmark or standard that varies across different students.
Once we have a benchmark/prediction for each child, the rest is just arithmetic:
1. Take actual growth and subtract benchmark
2. Average across students for each teacher
Setting the benchmark is the big challenge.
What is a fair goal/standard for each child?

  • Slide 6

Similar Students as Benchmark

[Figure: ELA scale score in 2011 and 2012; Student A has a prior score of 450. Source: EngageNY (adapted)]
If we compare student A's current score to other students who had the same prior score (450), we can measure her growth relative to similar students.
We can describe her growth as a student growth percentile (SGP).
If she performed better in the current year than 45% of similar students, then student A would be given SGP = 45.

  • Slide 7

Student Growth vs.
Value Added

You say tomato, I say tomahto.
Value added is really a catch-all phrase for measures of performance based on growth.
Student Growth Percentile (SGP) is slightly more specific, but can still be many things.
Key element is use of percentiles.
In practice, SGP as defined by NYSED uses student-level controls, not class- or school-level.

  • Slide 8

The Importance of History

The best predictor of student achievement growth is past achievement.
To set reasonably accurate benchmarks, just knowing demographics is not good enough.
This is why value-added research has focused on grades and subjects with annual testing.
Because of issues in test scaling, benchmarks are usually estimated using very flexible methods.

  • Slide 9

The Use of Additional Controls

Information besides prior achievement is useful in setting predictions for growth:
Does the student have a learning disability?
Is the student living in poverty?
Is the student an English Language Learner?
What is the composition of the classroom or school (e.g., average peer characteristics)?
Important to consider political and philosophical issues, not just statistical ones.
Should ethnicity be used to set benchmarks?
[Slide labels: NYSED SG, NYSED VA]

  • Slide 10

Addressing Reliability

Even if counterfactual expectations are perfect, achievement measures are imperfect (partial coverage of topics, allows guessing).
This means that some teachers will get (un)lucky.
Amount due to noise/luck depends on the amount of data available on the teacher.
Compare 1 class, 10 kids vs.
10 classes, 100 kids.
Researchers deal with this by adjusting estimates, by shrinking them towards mean performance.
Idea: if we know little about a teacher, our best guess is he/she is neither very high nor very low performing.
NYSED: use confidence intervals to determine HEDI, set minimum limits on amount of data needed for SG.

  • Slide 11

Basic Findings from VA Research

Substantial variation in VA across teachers.
Difference between teachers at 75th and 25th percentiles is ~1/5th of the racial test score gap.
A bit more variation in math than ELA.
Much of the variation is within schools.
VA estimates appear to contain real power to predict teacher effectiveness as measured by student achievement.
Enough stability across years to be useful.
Year-to-year reliability ranges from 0.3 to 0.5.

  • Slide 12

How Predictive is Value Added?

[Figure: teacher impacts on math in third year, separated by VA ranking after two years]
Source: Gordon, R., Kane, T., Staiger, D. (2006). Identifying effective teachers using performance on the job. Brookings Institution Hamilton Project Paper.

  • Slide 13

Testing the Validity of Value-Added

  • Slide 14

Earlier Test on Random Assignment

  • Slide 15

Raising Scores Is Not The End Goal

In a recent study, we use two techniques to address this question:
Test #1: do students with high VA teachers also perform better in subsequent years?
Test #2: do students with high VA teachers have improved outcomes as adults?
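The arithmetic described on Slides 5, 6, and 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by NYSED or in the research cited here: the function names, the sample records, the simplified percentile rule, and the variance values are all hypothetical, and the shrinkage weight uses one common empirical-Bayes form that the slides do not spell out.

```python
from collections import defaultdict
from statistics import mean


def teacher_growth_scores(records):
    """Slide 5: subtract each student's benchmark, then average by teacher.

    records: iterable of (teacher, actual_growth, predicted_growth).
    Returns {teacher: (mean_residual, number_of_students)}.
    """
    residuals = defaultdict(list)
    for teacher, actual, predicted in records:
        residuals[teacher].append(actual - predicted)
    return {t: (mean(r), len(r)) for t, r in residuals.items()}


def growth_percentile(current_score, similar_scores):
    """Slide 6 (simplified): percent of similar students scoring lower.

    similar_scores: current-year scores of students with the same prior score.
    """
    below = sum(s < current_score for s in similar_scores)
    return round(100 * below / len(similar_scores))


def shrink(raw_mean, n_students, signal_var, noise_var, grand_mean=0.0):
    """Slide 10: pull a noisy estimate toward mean performance.

    Assumed form: weight = signal_var / (signal_var + noise_var / n).
    Less data means a smaller weight, so the estimate moves closer
    to grand_mean.
    """
    weight = signal_var / (signal_var + noise_var / n_students)
    return grand_mean + weight * (raw_mean - grand_mean)


if __name__ == "__main__":
    # Hypothetical (teacher, actual growth, predicted growth) records.
    records = [("Smith", 47, 45), ("Smith", 50, 52), ("Jones", 50, 40)]
    print(teacher_growth_scores(records))

    # Same raw estimate, different amounts of data (10 kids vs. 100 kids).
    # The variances 4 and 36 are made-up values for illustration.
    for n in (10, 100):
        print(n, shrink(10.0, n, signal_var=4.0, noise_var=36.0))
```

With the same raw estimate, the teacher observed with 10 students is pulled much closer to the mean than the teacher observed with 100, which is the "1 class vs. 10 classes" comparison on Slide 10.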
  • Slide 16

Lasting Gains from High VA Teachers

[Figure: impact of current teacher VA on test score, by year (x-axis: years -4 to 4; y-axis: 0 to 1)]

  • Slide 17

Lasting Gains from High VA Teachers

[Figure: percent in college at age 20 vs. teacher value-added (x-axis: -0.2 to 0.2; y-axis: 37% to 38.5%)]

  • Slide 18

Lasting Gains from High VA Teachers

[Figure: percent of females with teenage births vs. teacher value-added (x-axis: -0.2 to 0.2; y-axis: 7.6% to 8.1%)]

  • Slide 19

Lasting Gains from High VA Teachers

[Figure: earnings at age 28 vs. teacher value-added (x-axis: -0.2 to 0.2; y-axis: $20,400 to $21,200)]

  • Slide 20

Impacts on Earnings Over Time, by School College-Attendance Rates

[Figure: impact of 1 SD of VA on earnings by age (x-axis: ages 20 to 28; y-axis: -1% to 1.5%), shown separately for high and low college attendance]

  • Slide 21

How Big Are These Effects?

What is the impact of having a top 5% VA teacher on the present value of lifetime earnings for a class of average size (28 students), relative to having an average VA teacher?
Calculation requires a lot of assumptions:
Constant percentage impact on earnings
Life-cycle earnings follow the cross-sectional life-cycle path as measured in 2010
2% wage growth, 5% discount rate to age 12
Undiscounted gains are roughly 5 times larger
Result: $266,000

  • Slide 22

Caveats and Limitations

First issue is conceptual: even with correct expectations and perfect tests, value-added measures are, by nature, context specific.
Teachers with high VA in their current jobs may not be as effective in another context.
VA is not measurable in many grades/subjects, although that may be changing.
VA must be a relative performance metric: half the teachers will always be below average.
VA is essentially summative, not formative.

  • Slide 23

Value Added is Like a Batting Avg.

Stability in batting average ~ 0.4

  • Slide 24

Is VA No Longer Controversial?
Randi Weingarten in 2008: "There is no way that any of this current data could actually, fairly, honestly or with any integrity be used to isolate the contributions of an individual teacher."
Randi Weingarten in 2013: "The MET findings reinforce the importance of evaluating teachers based on a balance of multiple measures of teaching effectiveness, in contrast to the limitations of focusing on student test scores, value-added scores or any other single measure."

  • Slide 25

What About Those Other Measures?

Measures like classroom observation scores can have useful features that SG-VA lacks:
Rely on specific and easily understood rubrics, rather than complicated formulae
Provide wide coverage of many grades and subjects that SG-VA misses
Can have absolute performance measures, not just relative comparisons
Can be timely and used for development
May capture other dimensions of teaching unrelated to test score growth but valuable

  • Slide 26

Current Observation-Based Evaluations Do Not Differentiate Among Teachers

[Figure: share of teachers rated Satisfactory (or equivalent) vs. Unsatisfactory (or equivalent)]
Source: Weisberg, D., Sexton, S., Mulhern, J. & Keeling, D. (2009). The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness. New York: The New Teacher Project.

  • Slide 27

Is Teaching Like Comedy & Baseball?

  • Slide 28

Gates MET Project

  • Slide 29

Chicago Effective Teaching Pilot

  • Slide 30

A Role for Input-Based Measures

There is consistent evidence that subjective evaluations of teaching practice are strongly related to gains in student achievement.
Several rigorous validated observation rubrics exist (e.g., CLASS, Danielson FFT).
Why not just identify effective teachers using classroom observations?
Evaluations based solely on class observations will be neither 100% accurate nor 100% stable.
Do a few lessons = a whole year of teaching?
Bias and preferential treatment are also possible.

  • Slide 31

Multiple Measures as Complements

Value added and classroom observations are both useful but incomplete measures.
Same can be said of other measures of teacher effectiveness (e.g., student work, lesson plans).
Thus, a more reliable way to form an evaluation is to base it on multiple measures.
Evaluation based on best practice and research.
If a teacher performs well or poorly on several measures, chances are much greater that they will continue to do so in the future.

  • Slide 32

In Conclusion

Value added analysis is a useful tool for measuring the important contributions that teachers make to student growth.
Like classroom observation, it is incomplete, but a valuable asset in the evaluation process.

  • Slide 33

Thank You!
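The point on Slide 31 that an evaluation built on several measures is more reliable than one built on a single measure can be illustrated with the Spearman-Brown formula. The slides do not name this formula; it is used here as a stand-in, and it assumes that every measure is equally reliable, that their errors are independent, and that they all capture the same underlying effectiveness. Real measures such as value added and classroom observation will not satisfy these assumptions exactly, so the numbers are only indicative. The function name and the starting reliability of 0.4 (within the 0.3 to 0.5 range quoted on Slide 11) are choices made for this sketch.

```python
def composite_reliability(single_reliability, n_measures):
    """Spearman-Brown reliability of an average of n comparable measures.

    Assumes equal reliability and independent errors across measures,
    which is a strong simplification for teacher evaluation data.
    """
    r = single_reliability
    return n_measures * r / (1 + (n_measures - 1) * r)


if __name__ == "__main__":
    # Starting reliability of 0.4 is an illustrative choice.
    for k in (1, 2, 3):
        print(k, round(composite_reliability(0.4, k), 3))
```

Under these assumptions, each added measure raises the reliability of the composite, though with diminishing returns.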