Teacher Evaluation: Issues of Validity and Reliability

ASSESSMENT SRIG BIENNIAL MEETING

MARCH 30, 2012

NAfME NATIONAL CONFERENCE

3:45PM-5:45PM

GRAND B

TIMOTHY S. BROPHY, CHAIR

KELLY PARKES, INCOMING CHAIR

3:45pm. Greeting and Welcome; Election results. Timothy S. Brophy, Chair

3:55pm. Program begins: Teacher Evaluations – Issues of Validity and Reliability

Timothy S. Brophy and Richard Colwell. Teacher Evaluation: Issues of Validity and Reliability.

4:20pm Dru Davison, Memphis City Schools. The Tennessee Fine Arts Pilot: A Multiple Measures Portfolio System (Perform, Create, Respond, Connect) with Blind Peer Review. Electronic presentation.

4:40pm Keitha Lucas Hamann, U. Minnesota-Twin Cities, and Doug Orzolek, University of St. Thomas. Teacher Performance Assessment in Minnesota: Challenges for Music Educators.

5:05pm Breakout groups – Strategies for Measuring Student Growth in Music

5:30pm Leaders report

5:40pm Announcements of upcoming events. Closing remarks by Kelly Parkes, Incoming Chair

TODAY’S PROGRAM

TEACHER EVALUATIONS: ISSUES OF VALIDITY AND RELIABILITY

TIMOTHY S. BROPHY, UNIVERSITY OF FLORIDA

RICHARD COLWELL, PROFESSOR EMERITUS, UNIVERSITY OF ILLINOIS

NAfME CONFERENCE ASSESSMENT SRIG MEETING

MARCH 30, 2012

The Context for The Reform of Teacher Evaluation

The Problem: Determining Music Teacher Effectiveness

Validity and Reliability Issues

Challenges to the SRIG

SESSION OVERVIEW

Achieving Equity in Teacher Distribution

The State will take actions to improve teacher effectiveness and comply with section 1111(b)(8)(C) of the ESEA (20 U.S.C. 6311(b)(8)(C)) in order to address inequities in the distribution of highly qualified teachers between high- and low-poverty schools, and to ensure that low-income and minority children are not taught at higher rates than other children by inexperienced, unqualified, or out-of-field teachers. (H.R.1, p. 169)

THE POLITICAL CONTEXT: THE AMERICAN RECOVERY AND REINVESTMENT ACT (2009)

RTTT Phase 2 defines teacher evaluation:

States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (p. 19499)

THE POLITICAL CONTEXT: RACE TO THE TOP PHASE 2 - CFDA NUMBER: 84.395A (2010)

Student achievement means: (b) For non-tested grades and subjects: alternative measures of student learning and performance such as student scores on pre-tests and end-of-course tests; student performance on English language proficiency assessments; and other measures of student achievement that are rigorous and comparable across classrooms. (p. 19500)

Student growth means the change in student achievement for an individual student between two or more points in time. A State may also include other measures that are rigorous and comparable across classrooms. (p. 19500)

Source: Federal Register/Vol. 75, No. 71/Wednesday, April 14, 2010/Notices

THE POLITICAL CONTEXT: RACE TO THE TOP PHASE 2
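The RTTT definition of student growth (change in achievement for an individual student between two or more points in time) reduces to simple arithmetic once two comparable scores exist. The following is a minimal sketch, assuming both scores sit on the same scale; the function name, student labels, and scores are hypothetical and are not drawn from the Federal Register notice.

```python
def student_growth(scores):
    """Change in achievement between the first and last measurement points.

    `scores` is a chronologically ordered list of one student's scores,
    assumed (for illustration) to be on a common scale.
    """
    if len(scores) < 2:
        raise ValueError("growth needs at least two points in time")
    return scores[-1] - scores[0]


# Hypothetical pre-test and end-of-course scores for three students.
pre_post = {
    "student_a": [42, 61],
    "student_b": [55, 58],
    "student_c": [70, 66],
}
growth = {name: student_growth(s) for name, s in pre_post.items()}
```

The arithmetic is trivial; the hard part raised in this session is whether the two scores are valid, reliable, and comparable across classrooms.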

35-50% student achievement

50-65% observations or other methods

Teacher evaluation and “effectiveness” determination

THE NEW “EVALUATION EQUATION”
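One way to read the “evaluation equation” is as a weighted composite of two component scores. The sketch below assumes a simple weighted average with both components already on a common 0-100 scale; that combining rule, the function name, and the sample scores are assumptions for illustration only, since the slide gives the weight ranges but not the formula any state actually uses.

```python
def composite_rating(achievement_score, other_measures_score, achievement_weight=0.5):
    """Weighted average of student achievement and observation/other measures.

    Assumes both inputs are on the same 0-100 scale. The achievement weight
    must fall in the 35-50% range; the remainder (50-65%) goes to
    observations or other methods.
    """
    if not 0.35 <= achievement_weight <= 0.50:
        raise ValueError("achievement weight must be between 0.35 and 0.50")
    other_weight = 1.0 - achievement_weight
    return achievement_weight * achievement_score + other_weight * other_measures_score


# Hypothetical teacher: 80 on student achievement, 60 on observations.
equal_split = composite_rating(80, 60, achievement_weight=0.50)
minimum_achievement = composite_rating(80, 60, achievement_weight=0.35)
```

The same two component scores yield different composites depending on the weight chosen, which is one reason the validity of each component matters.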

RTTT defines effective teachers in very specific terms.

We need to be able to know what it means for music teachers to be:

“Effective” – when students achieve at “acceptable rates” – at least one grade level in an academic year

“Highly effective” – when her/his students achieve at “high rates” – for example, 1.5 grade levels in an academic year

A BIG QUESTION: What is a “year’s growth” in music education? How do we find out?

MUSIC TEACHER EFFECTIVENESS

THE “ELEPHANT IN THE LIVING ROOM” - GROWTH IN MUSIC

What do we need to measure “one grade level” of growth in music?

Rigorous, standards-based grade-level music curriculum on all standards

Clear, consistent grade-level expectations

Valid, reliable assessments

Comparability across schools, districts, and states

Student music learning = student achievement in RTTT

Assessment must be done well or not at all

NAEP is one reference for validity and reliability

NAfME continues to advocate for the arts as a core subject. Question: if music is a core subject, how do we define it? What is assessed?

The 2008 NAEP analysis omitted validity, reliability, item analysis, regressions, factor analysis and other test characteristics

NAEP analysis was concerned with demographic and SES-related characteristics – race, gender, free and reduced lunch, community and school type, etc.

PART 1 OF THE EQUATION: VALID AND RELIABLE ASSESSMENTS OF STUDENT MUSIC LEARNING

Classroom Observation

Principal Evaluation

Instructional Artifact

Portfolio

Teacher Self-Report

Student Survey

Value-Added Model

PART 2 OF THE EQUATION: “OTHER MEASURES” STRENGTHS AND CAUTIONS

Source: Goe, Holdheide, & Miller (2011). A practical guide to designing comprehensive teacher evaluation systems. National Comprehensive Center for Teacher Quality: Washington, DC.

To what extent do changes in a student’s performance reflect actual changes in his or her understanding of the underlying content?

When student test scores are used to estimate teaching effectiveness, what is the extent to which those estimates accurately represent the teacher’s contribution to student learning?

What evidence do we have regarding various threats to the validity of inferences for a particular use of a measure?

How do we attribute student performance to individual teachers when the assessments are intended to cover material from multiple courses?

Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.

GENERAL VALIDITY ISSUES - USING STUDENT LEARNING MEASURES IN TEACHER EVALUATIONS

Observations and evaluative tools MUST be implemented by trained personnel who are content experts in music education

“Other measures” used MUST be valid for music teachers and account for the variables unique to music education

Student music achievement MUST be measured using valid, reliable instruments

Student achievement data used for music teacher evaluation MUST be from music assessments, not an arbitrary attribution of the effect of the music teacher on scores for the “usual tested subjects” of math, reading, science, and writing

VALIDITY ISSUES FOR MUSIC TEACHER EVALUATION

Common approach: internal consistency reliability, which expresses the extent to which items on the test measure the same underlying construct

Measures of internal consistency reliability do not take into account interrater reliability in the scoring of any open-response items that tests may include, and they also do not measure the reliability of the value-added estimates themselves. Interrater reliability is an important consideration in the case of items that are assessed by human scorers because one wants to minimize the extent to which an individual’s score on the assessment is dependent on the idiosyncrasies of the rater who happens to score it. Reliability of value-added estimates is an important consideration because, due to random classroom- and student-level error, value-added estimates are known to be unstable from year to year.

Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.

GENERAL RELIABILITY ISSUES
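The two kinds of reliability contrasted on this slide can each be estimated in a few lines. The sketch below uses Cronbach's alpha for internal consistency and Cohen's kappa for agreement between two raters; the source names neither statistic, so both are illustrative choices, and the function names and sample data are hypothetical.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(rows):
    """Internal consistency of a score matrix (one row per student, one column per item)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two students")
    item_variance_sum = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of responses")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal category use.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("both raters used a single category; kappa is undefined")
    return (observed - expected) / (1 - expected)


# Hypothetical right/wrong scores: five students, four selected-response items.
responses = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 1]]
alpha = cronbach_alpha(responses)

# Hypothetical rubric levels (1-4) given by two scorers to eight performances.
scorer_1 = [1, 2, 3, 4, 2, 3, 1, 4]
scorer_2 = [1, 2, 3, 3, 2, 3, 2, 4]
kappa = cohens_kappa(scorer_1, scorer_2)
```

A test can show acceptable alpha on its selected-response items while the scoring of its performance items still depends heavily on who does the rating, which is why the two figures are reported separately.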

Clearly defining “open-ended” responses in music – prepared performance, on-demand performance, composition, improvisation, arrangement, etc.

Expert rubric development and training of scorers

Norming/calibration of rubrics used for open-ended responses

Thorough item analysis for all item types

RELIABILITY NEEDS FOR MUSIC TEACHER EVALUATION: STUDENT MUSIC ACHIEVEMENT
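As an illustration of what a basic item analysis reports, the sketch below computes two classical statistics for a dichotomous item: difficulty (proportion correct) and discrimination (correlation between the item and the rest of the test). These are standard classical test theory indices chosen here for illustration; the function names and response data are hypothetical.

```python
from math import sqrt
from statistics import mean


def item_difficulty(item_scores):
    """Proportion of students answering a right/wrong (1/0) item correctly."""
    return mean(item_scores)


def item_discrimination(item_scores, rest_scores):
    """Pearson correlation between an item and the total score on the other items."""
    mean_item, mean_rest = mean(item_scores), mean(rest_scores)
    covariance = mean(
        [(x - mean_item) * (r - mean_rest) for x, r in zip(item_scores, rest_scores)]
    )
    var_item = mean([(x - mean_item) ** 2 for x in item_scores])
    var_rest = mean([(r - mean_rest) ** 2 for r in rest_scores])
    if var_item == 0 or var_rest == 0:
        raise ValueError("scores do not vary; discrimination is undefined")
    return covariance / sqrt(var_item * var_rest)


# Hypothetical right/wrong scores: five students, four items.
responses = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 1]]
first_item = [row[0] for row in responses]
# Rest score excludes the item itself so the item is not correlated with itself.
rest_of_test = [sum(row) - row[0] for row in responses]

difficulty = item_difficulty(first_item)
discrimination = item_discrimination(first_item, rest_of_test)
```

For rubric-scored performance items, the same idea applies with the mean rubric score in place of proportion correct.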

Readily available analysis techniques allow us to obtain sophisticated item analysis data for music items

Item Response Theory (IRT) models should become the standard analysis approach

3-parameter models for dichotomous items, which measure difficulty and discrimination while controlling for guessing

Polytomous generalized rating scale models extend IRT to the analysis of rubric-based assessments (e.g., Samejima’s graded response model)

Easy-to-use software programs such as XCalibre4™ make these complex calculations accessible

Frank Baker’s classic book, The Basics of Item Response Theory, is now a free ERIC document

DEVELOPING ASSESSMENT RELIABILITY AND VALIDITY: ITEM ANALYSIS
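The two model families named on this slide can be written down compactly. The sketch below gives the response functions only, in their standard textbook forms: the three-parameter logistic model for right/wrong items and Samejima's graded response model for rubric levels. It does not estimate parameters from data, which is what calibration software does; the function names and parameter values are hypothetical.

```python
import math


def three_pl(theta, a, b, c):
    """3PL probability of a correct answer.

    theta: student ability; a: discrimination; b: difficulty;
    c: guessing parameter (lower asymptote).
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


def graded_response_probs(theta, a, thresholds):
    """Graded response model: probability of each rubric level.

    `thresholds` are the ordered boundaries between adjacent rubric levels,
    so m thresholds describe m + 1 levels.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Probability of scoring at or above each level; bounded by 1 and 0.
    at_or_above = [1.0]
    at_or_above += [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]
    at_or_above.append(0.0)
    return [at_or_above[k] - at_or_above[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical four-option multiple-choice item (guessing near 0.25).
p_correct = three_pl(theta=0.0, a=1.0, b=0.0, c=0.25)

# Hypothetical four-level performance rubric with three thresholds.
level_probs = graded_response_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
```

In the 3PL function, a student whose ability equals the item difficulty has a probability of success halfway between the guessing floor and 1.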

Prince et al. (2009), The Other 69 Percent:

“Identifying highly effective teachers of subjects that are not tested with standardized achievement tests – such as teachers of art, music, physical education, vocational education, and foreign languages – requires a different approach.” (p. 5)

“It is easy to believe that we can assess whether students read well or solve math problems well or understand social studies or science, but it is much more difficult to imagine how to assess whether students properly understand a subject such as art. Until we can agree on what constitutes effective teacher performance, it will be difficult to measure it and reward it.” (p. 6)

TEACHER EFFECTIVENESS: A CALL FOR ACTION IN MUSIC EDUCATION

What is an effective music teacher?

What is a highly effective music teacher?

How do we measure music teacher effectiveness?

How do we evaluate music teacher effectiveness?

MUSIC TEACHER EFFECTIVENESS: QUESTIONS FOR OUR PROFESSION

FIRST AND FOREMOST: We must lead the profession to develop technically sound, valid, reliable assessments of student music learning in every state that are thoroughly analyzed for validity, reliability, differential item functioning (DIF), and item characteristics

A process or model of assessment development for states and districts

In cooperation with SMTE, collect and evaluate the validity and reliability of music teacher evaluation systems in NAfME states

Design and implement studies to develop empirically supported criteria for music teacher evaluation, use these to develop music teacher evaluation models, and assess their validity and reliability

CHALLENGE TO THE SRIG: EVALUATION RESEARCH NEEDS

THE “EVALUATION DILEMMA”

“Solutions to the evaluation dilemma are as complex as the issue itself. The evaluation of music teachers remains an area in need of relevant research, and the development of an appropriate evaluation and observation instrument must be urgently addressed. It is now the responsibility of the united music teaching profession, in tandem with active music education researchers, to address this challenge.”

Source: Brophy (1993). Evaluation of music educators: Toward defining an appropriate instrument.

THANK YOU
