Teacher Evaluation: Issues of Validity and Reliability
ASSESSMENT SRIG BIENNIAL MEETING
MARCH 30, 2012, NAfME NATIONAL CONFERENCE, 3:45PM-5:45PM, GRAND B
TIMOTHY S. BROPHY, CHAIR
KELLY PARKES, INCOMING CHAIR
3:45pm Greeting and Welcome; Election results. Timothy S. Brophy, Chair
3:55pm Program begins: Teacher Evaluations – Issues of Validity and Reliability. Timothy S. Brophy and Richard Colwell
4:20pm Dru Davison, Memphis City Schools. The Tennessee Fine Arts Pilot: A Multiple Measures Portfolio System (Perform, Create, Respond, Connect) with Blind Peer Review. Electronic presentation.
4:40pm Keitha Lucas Hamann, U. Minnesota-Twin Cities, and Doug Orzolek, University of St. Thomas. Teacher Performance Assessment in Minnesota: Challenges for Music Educators.
5:05pm Breakout groups – Strategies for Measuring Student Growth in Music
5:30pm Leaders report
5:40pm Announcements of upcoming events. Closing remarks by Kelly Parkes, Incoming Chair
TODAY’S PROGRAM
TEACHER EVALUATIONS: ISSUES OF VALIDITY AND RELIABILITY
TIMOTHY S. BROPHY, UNIVERSITY OF FLORIDA
RICHARD COLWELL, PROFESSOR EMERITUS, UNIVERSITY OF ILLINOIS
NAfME CONFERENCE ASSESSMENT SRIG MEETING
MARCH 30, 2012
The Context for The Reform of Teacher Evaluation
The Problem: Determining Music Teacher Effectiveness
Validity and Reliability Issues
Challenges to the SRIG
SESSION OVERVIEW
Achieving Equity in Teacher Distribution
The State will take actions to improve teacher effectiveness and comply with section 1111(b)(8)(C) of the ESEA (20 U.S.C. 6311(b)(8)(C)) in order to address inequities in the distribution of highly qualified teachers between high- and low-poverty schools, and to ensure that low-income and minority children are not taught at higher rates than other children by inexperienced, unqualified, or out-of-field teachers. (H.R.1, p. 169)
THE POLITICAL CONTEXT: THE AMERICAN RECOVERY AND REINVESTMENT ACT (2009)
RTTT Phase 2 defines teacher evaluation:
States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (p. 19499)
THE POLITICAL CONTEXT: RACE TO THE TOP PHASE 2 - CFDA NUMBER: 84.395A (2010)
Student achievement means:
(b) For non-tested grades and subjects: alternative measures of student learning and performance such as student scores on pre-tests and end-of-course tests; student performance on English language proficiency assessments; and other measures of student achievement that are rigorous and comparable across classrooms. (p. 19500)
Student growth means the change in student achievement for an individual student between two or more points in time. A State may also include other measures that are rigorous and comparable across classrooms. (p. 19500)
Source: Federal Register/Vol. 75, No. 71/Wednesday, April 14, 2010/Notices
THE POLITICAL CONTEXT: RACE TO THE TOP PHASE 2
35-50% student achievement
50-65% observations or other methods
Teacher evaluation and “effectiveness” determination
THE NEW “EVALUATION EQUATION”
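The weighting on this slide can be illustrated with a small sketch. The slide gives only the two percentage ranges; the 0-100 scale, the linear combination, and the function name `composite_score` are assumptions made here for illustration, not part of any state's actual formula.

```python
def composite_score(achievement, other_measures, achievement_weight=0.40):
    """Combine two component scores into one evaluation score.

    Assumptions (not from the slide): both components are on a 0-100
    scale and are combined as a simple weighted average.
    """
    # The slide's range: student achievement counts for 35-50%.
    if not 0.35 <= achievement_weight <= 0.50:
        raise ValueError("achievement_weight must be between 0.35 and 0.50")
    for name, value in (("achievement", achievement),
                        ("other_measures", other_measures)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    # Observations or other methods take the remaining 50-65%.
    other_weight = 1.0 - achievement_weight
    return achievement_weight * achievement + other_weight * other_measures


if __name__ == "__main__":
    # Same teacher, two different weightings within the allowed range.
    print(composite_score(70, 90, achievement_weight=0.35))
    print(composite_score(70, 90, achievement_weight=0.50))
```

Running the two calls side by side shows how much the final score can move with the choice of weight alone, before any question about the validity of either component is raised.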
RTTT defines effective teachers in very specific terms.
We need to be able to know what it means for music teachers to be:
“Effective” – when students achieve at “acceptable rates” – at least one grade level in an academic year
“Highly effective” – when her/his students achieve at “high rates” – for example, 1.5 grade levels in an academic year
A BIG QUESTION:
What is a “year’s growth” in music education? How do we find out?
MUSIC TEACHER EFFECTIVENESS
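To make the definitions concrete, here is a minimal sketch of how the thresholds above would be applied if music achievement could be expressed in grade-level units. That scale is exactly what the profession does not yet have, so the grade-equivalent scores, the use of a class mean, and the label "below effective" are all hypothetical.

```python
from statistics import mean


def student_growth(pre_score, post_score):
    """Growth as the change in achievement between two points in time."""
    return post_score - pre_score


def classify_teacher(score_pairs, effective=1.0, highly_effective=1.5):
    """Label a teacher from (pre, post) grade-equivalent score pairs.

    Thresholds follow the slide: at least one grade level per year is
    "effective"; 1.5 grade levels is the example for "highly effective".
    Averaging across students is an assumption of this sketch.
    """
    average_growth = mean(student_growth(pre, post) for pre, post in score_pairs)
    if average_growth >= highly_effective:
        return "highly effective"
    if average_growth >= effective:
        return "effective"
    return "below effective"  # hypothetical label, not defined on the slide


if __name__ == "__main__":
    fall_and_spring = [(3.0, 4.0), (3.5, 4.5), (2.5, 3.5)]
    print(classify_teacher(fall_and_spring))
```

The arithmetic is trivial; the difficulty lies in producing music scores for which "one grade level" has a defensible meaning.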
THE “ELEPHANT IN THE LIVING ROOM” - GROWTH IN MUSIC
What do we need to measure “one grade level” of growth in music?
Rigorous, standards-based grade level music curriculum on all standards
Clear, consistent grade-level expectations
Valid, reliable assessments
Comparability across schools, districts, and states
Student music learning = student achievement in RTTT
Assessment must be done well or not at all
NAEP is one reference for validity and reliability
NAfME continues to advocate for the arts as a core subject. Question: if music is a core subject, how do we define it? What is assessed?
The 2008 NAEP analysis omitted validity, reliability, item analysis, regressions, factor analysis and other test characteristics
NAEP analysis was concerned with demographic and SES-related characteristics – race, gender, free and reduced lunch, community and school type, etc.
PART 1 OF THE EQUATION: VALID AND RELIABLE ASSESSMENTS OF STUDENT MUSIC LEARNING
Classroom Observation
Principal Evaluation
Instructional Artifact
Portfolio
Teacher Self-Report
Student Survey
Value-Added Model
PART 2 OF THE EQUATION: “OTHER MEASURES” STRENGTHS AND CAUTIONS
Source: Goe, Holdheide, & Miller (2011). A practical guide to designing comprehensive teacher evaluation systems. National Comprehensive Center for Teacher Quality: Washington, DC.
To what extent do changes in a student’s performance reflect actual changes in his or her understanding of the underlying content?
When student test scores are used to estimate teaching effectiveness, what is the extent to which those estimates accurately represent the teacher’s contribution to student learning?
What evidence do we have regarding various threats to the validity of inferences for a particular use of a measure?
How do we attribute student performance to individual teachers when the assessments are intended to cover material from multiple courses?
Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.
GENERAL VALIDITY ISSUES - USING STUDENT LEARNING MEASURES IN TEACHER EVALUATIONS
Observations and Evaluative tools MUST be implemented by trained personnel who are content experts in music education
“Other measures” used MUST be valid for music teachers and account for the variables unique to music education
Student music achievement MUST be measured using valid, reliable instruments
Student achievement data used for music teacher evaluation MUST be from music assessments, not an arbitrary attribution of the effect of the music teacher on scores for the “usual tested subjects” of math, reading, science, and writing
VALIDITY ISSUES FOR MUSIC TEACHER EVALUATION
Common approach: internal consistency reliability, which expresses the extent to which items on the test measure the same underlying construct
Measures of internal consistency reliability do not take into account interrater reliability in the scoring of any open-response items that tests may include, and they also do not measure the reliability of the value-added estimates themselves.
Interrater reliability is an important consideration in the case of items that are assessed by human scorers because one wants to minimize the extent to which an individual’s score on the assessment is dependent on the idiosyncrasies of the rater who happens to score it.
Reliability of value-added estimates is an important consideration because, due to random classroom- and student-level error, value-added estimates are known to be unstable from year to year.
Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.
GENERAL RELIABILITY ISSUES
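The two kinds of reliability named on this slide can be computed with short formulas. The sketch below uses Cronbach's alpha as one common index of internal consistency and Cohen's kappa as one common index of agreement between two raters; the slide names neither statistic, so both choices, along with the function names and sample data, are assumptions for illustration.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(score_rows):
    """Internal consistency for a students-by-items score table.

    score_rows: one list per student, each holding that student's
    score on every item (all rows the same length, at least 2 items).
    """
    item_count = len(score_rows[0])
    item_columns = list(zip(*score_rows))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in score_rows])
    return (item_count / (item_count - 1)) * (1 - sum_item_variances / total_variance)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category scores."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater assigned categories independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical data: 5 students, 3 test items scored 0-4.
    test_scores = [[4, 3, 4], [2, 2, 3], [1, 1, 0], [3, 3, 3], [0, 1, 1]]
    print(cronbach_alpha(test_scores))

    # Hypothetical rubric levels given by two scorers to 8 performances.
    scorer_1 = [3, 2, 4, 1, 3, 2, 4, 1]
    scorer_2 = [3, 2, 3, 1, 3, 2, 4, 2]
    print(cohen_kappa(scorer_1, scorer_2))
```

Because rubric levels are ordered, a weighted variant of kappa or an intraclass correlation may suit performance scoring better; plain kappa treats a one-level disagreement the same as a three-level one.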
Clearly defining “open-ended” responses in music – prepared performance, on-demand performance, composition, improvisation, arrangement, etc.
Expert rubric development and training of scorers
Norming/calibration of rubrics used for open-ended responses
Thorough item analysis for all item types
RELIABILITY NEEDS FOR MUSIC TEACHER EVALUATION: STUDENT MUSIC ACHIEVEMENT
Readily available analysis techniques allow us to obtain sophisticated item analysis data for music items
Item Response Theory (IRT) models should become the standard analysis approach
3-parameter models for dichotomous items, which measure difficulty and discrimination while controlling for guessing
Polytomous models extend IRT to the analysis of rubric-based assessments (e.g., Samejima’s graded response model)
Easy-to-use software programs such as Xcalibre 4™ make these complex calculations accessible
Frank Baker’s classic book, The Basics of Item Response Theory, is now a free ERIC document
DEVELOPING ASSESSMENT RELIABILITY AND VALIDITY: ITEM ANALYSIS
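The two model families named on this slide can be written out in a few lines. The sketch below gives the standard 3-parameter logistic (3PL) response function and the category probabilities of Samejima's graded response model; the function names and parameter values are illustrative assumptions, and estimating the parameters from real response data is a separate task handled by IRT software.

```python
import math


def p_correct_3pl(theta, discrimination, difficulty, guessing):
    """3PL probability that a student of ability theta answers correctly."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    # The guessing parameter sets a floor: even very low ability students
    # answer correctly with at least this probability.
    return guessing + (1.0 - guessing) * logistic


def graded_response_probs(theta, discrimination, thresholds):
    """Graded response model: probability of each rubric level.

    thresholds must be in increasing order; m thresholds give m + 1
    ordered score levels (for example, 3 thresholds for a 4-level rubric).
    """
    # Probability of scoring at or above each level boundary.
    at_or_above = [1.0 / (1.0 + math.exp(-discrimination * (theta - b)))
                   for b in thresholds]
    cumulative = [1.0] + at_or_above + [0.0]
    # Each level's probability is the drop between adjacent boundaries.
    return [cumulative[j] - cumulative[j + 1] for j in range(len(cumulative) - 1)]


if __name__ == "__main__":
    # Hypothetical four-option multiple-choice item.
    print(p_correct_3pl(theta=0.0, discrimination=1.2, difficulty=0.0, guessing=0.25))
    # Hypothetical 4-level performance rubric.
    print(graded_response_probs(theta=0.5, discrimination=1.5, thresholds=[-1.0, 0.0, 1.0]))
```

When ability equals item difficulty, the 3PL probability sits halfway between the guessing floor and 1, which is a quick check on any fitted item.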
Prince et al. (2009), The Other 69 Percent:
“Identifying highly effective teachers of subjects that are not tested with standardized achievement tests – such as teachers of art, music, physical education, vocational education, and foreign languages – requires a different approach.” (p. 5)
“It is easy to believe that we can assess whether students read well or solve math problems well or understand social studies or science, but it is much more difficult to imagine how to assess whether students properly understand a subject such as art. Until we can agree on what constitutes effective teacher performance, it will be difficult to measure it and reward it.” (p. 6)
TEACHER EFFECTIVENESS: A CALL FOR ACTION IN MUSIC EDUCATION
What is an effective music teacher?
What is a highly effective music teacher?
How do we measure music teacher effectiveness?
How do we evaluate music teacher effectiveness?
MUSIC TEACHER EFFECTIVENESS QUESTIONS FOR OUR PROFESSION
FIRST AND FOREMOST: We must lead the profession to develop technically sound, valid, reliable assessments of student music learning in every state that are thoroughly analyzed for validity, reliability, differential item functioning (DIF), and item characteristics
A process or model of assessment development for states and districts
In cooperation with SMTE, collect and evaluate the validity and reliability of music teacher evaluation systems in NAfME states
Design and implement studies to develop empirically supported criteria for music teacher evaluation, use these to develop music teacher evaluation models, and assess their validity and reliability
CHALLENGE TO THE SRIG: EVALUATION RESEARCH NEEDS
THE “EVALUATION DILEMMA”
“Solutions to the evaluation dilemma are as complex as the issue itself. The evaluation of music teachers remains an area in need of relevant research, and the development of an appropriate evaluation and observation instrument must be urgently addressed. It is now the responsibility of the united music teaching profession, in tandem with active music education researchers, to address this challenge.”
Source: Brophy (1993). Evaluation of music educators: Toward defining an appropriate instrument.
THANK YOU