Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical

Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical. Jose Felipe Martinez, University of California, Los Angeles, Graduate School of Education. New Mexico Teacher Evaluation Advisory Council (NMTEACH), New Mexico Public Education Department. UCLA Graduate School of Education & Information Studies.

Upload: ornice

Post on 15-Jan-2016


TRANSCRIPT

Page 1: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical

1/27 University of California, Los Angeles

Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical

Jose Felipe Martinez

University of California, Los Angeles, Graduate School of Education

New Mexico Teacher Evaluation Advisory Council (NMTEACH)

New Mexico Public Education Department

UCLA Graduate School of Education & Information Studies

Page 2: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Overview

• Teacher Evaluation: The Policy Context

• Teacher Evaluation

• Conceptual/Methodological Issues: Why, What, How

• Constructs and methods

• Teacher Evaluation with Multiple Measures

• Multiple Measures and Validity

• Models for combining indicators

• Validation Frameworks and Sources of Evidence

• Consequences, additional issues


Page 3: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Teacher Evaluation:

The Policy Context

Page 4: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Teacher Evaluation: A New Silver Bullet?

• Teacher evaluation systems undergoing reform

• Tied to perceptions of performance in national or international evaluations

• Reverse Lake Wobegon: all below average (Feuer, 2012)

• …assumptions about the role of “good/bad” teachers in explaining/improving the results and

• …about our ability to identify these teachers

• Related to perceptions of teaching profession

• …quality of existing teacher evaluation systems

Page 5: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Many Prominent Examples

• United States

• Los Angeles, New York, Chicago (2012)

• Denver (2010)

• Tennessee (1992, 2012)

• Toledo, Cincinnati (1990s)

• Worldwide

• Singapore (2006)

• Chile (2003)

• Mexico (1993, 2009)

• Australia (2013)

Page 6: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Teacher Evaluation:

Conceptual/Methodological Issues

Page 7: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Why Evaluate?

• Motivations, inferences, and uses

• Identify struggling teachers to help them improve

• Identify recurrently struggling teachers for sanction

• Provide incentives to the best teachers

• Inform school practice/district policies on Teacher Preparation and Professional Development

• Identify and scale effective teacher practice

• Or typically a combination... (e.g. NMTEACH)

Page 8: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Teacher Evaluation

Conceptual/Methodological Issues:

Why, What, How

Page 9: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


What to Evaluate?

• Teacher competence (Reynolds, 1999):

• Knowledge: Subject, Pedagogical

• Skill: Ability, applied knowledge

• Disposition: Attitudes, Perceptions, Beliefs

• Practice: Classroom processes (e.g. instruction, assessment, management)

• And…

• Seniority, Credentials

• School citizenship, contributions to community…

• “Effectiveness”: Ability to raise student test scores

Page 10: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


What to Evaluate? All of the above?

• “We fully understand that standardized tests don't capture all of the subtle qualities of successful teaching. That's why we call for multiple measures in evaluating teachers. In an ideal world, that data should also drive instruction and drive useful professional development.”

Arne Duncan

U.S. Secretary of Education

Page 11: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


How to Evaluate? (Reynolds, 1999)

Teacher Constructs (What?) | Measures (How?)
Knowledge (subject, pedagogical); Skills (ability, applied knowledge) | Multiple choice tests; performance assessments; vignettes
Practice, classroom performance (instruction, assessment, management) | Surveys, logs; classroom observations, video; artifacts, portfolios
Disposition (beliefs, attitudes) | Survey, interview
Citizenship (contributions to community) | Surveys, interview, self-assessment
Effectiveness (contribution to student achievement) | Student test score gains; “value added”

Page 12: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Which is Best? Which should we use?

• No method is inherently preferable

• Each illuminates a different aspect of Teacher [insert euphemism here].

• Different kinds of information from different sources

• Pros and cons in reliability, validity, credibility…

• Here I will briefly discuss:

• Value Added Models

• Observations

• Surveys

• Portfolios

Page 13: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Value Added Models

• Culture changing towards using student achievement to evaluate teachers

• Simple Logic:

• Students do better (grow more) in some classrooms (Weisberg et al., 2009; Kane et al., 2011)

• Student learning should be a (the?) key criterion to evaluate teacher quality

• Seemingly Simple Method:

• With longitudinal data… compare teachers on the progress of their students, not their achievement

• Estimate teachers' unique contributions to student academic growth, net of factors outside teacher control
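
The "seemingly simple method" can be illustrated with a toy calculation. This is a minimal sketch, not any operational model such as TVAAS: it assumes a single prior-year score is the only control, fits one least-squares line across all students, and treats each teacher's mean residual as the "value-added" estimate. The function name `value_added`, the teacher labels, and the scores are all hypothetical.

```python
from collections import defaultdict


def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Regresses current scores on prior scores across all students
    (one-predictor least squares) and returns {teacher: mean residual}.
    """
    n = len(records)
    mean_x = sum(r[1] for r in records) / n
    mean_y = sum(r[2] for r in records) / n
    sxx = sum((r[1] - mean_x) ** 2 for r in records)
    sxy = sum((r[1] - mean_x) * (r[2] - mean_y) for r in records)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    return {t: sum(v) / len(v) for t, v in residuals.items()}


# Hypothetical data: both teachers receive students with the same prior scores.
records = [
    ("A", 40, 50), ("A", 60, 70), ("A", 80, 90),
    ("B", 40, 40), ("B", 60, 60), ("B", 80, 80),
]
estimates = value_added(records)
```

In this toy data Teacher A's students end up above the common prediction line and Teacher B's below it, so A's estimate is positive and B's is negative. Real value-added models add many more controls and adjust for estimation error, which is where the issues on the next slide arise.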

Page 14: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Value Added Models

• A family of statistical models

• e.g., TVAAS, growth percentiles, (variable) persistence

• Estimates correlated across models; the achievement measures used matter more (Lockwood et al., 2007)

• A variety of issues:

• Partial view of student learning (Baker et al., 2010)

• Unstable estimates (Schochet & Chiang, 2010)

• Descriptive, not causal (Rubin, Stuart, & Zanutto, 2004), nor explanatory/diagnostic (Goe, 2011)

• Available only for some teachers (30-40% in the US)

• “…VAM estimates best used in combination with other indicators” (Braun et al., 2010)

Page 15: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Classroom Observations

• Widely used to assess quality teaching practice

• Explanatory + formative counterpart to VAM

• Identify areas in need of improvement; inform PD

• Expensive if standardized (training, time)

• Error from complex rubrics, human judgment

• Bias/subjectivity in construct definition/emphasis

• Lower reliability than traditional instruments (live or video)

• Weak correlations with other indicators including student achievement (Kane et al. 2010)

Page 16: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Classroom Observation: Constructs

Page 17: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Classroom Observation: Reliability

(Source: Bill and Melinda Gates Foundation, 2011)

Page 18: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Classroom Observation: Reliability

(Source: Bill and Melinda Gates Foundation, 2011)

Page 19: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Teacher Surveys

• Common method for collecting data on teacher (classroom) practice on a large scale

• Good coverage; low cost; low burden for teachers

• Adequate reliability

• Questionable validity

• Error from inconsistency in interpretation of questions

• …and social desirability

• e.g. Emphasis on higher order thinking

• Weak correlations with other indicators including student achievement (Kane et al. 2010)

Page 20: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Student Surveys

• Increasingly popular for teacher evaluation

• Coverage; cost; perceived validity

• Adequate reliability aggregated by classroom

• Correlated with student achievement as much or more than teacher surveys (Kane et al., 2010)

• Additional information at the student level

• Variance reflects differentiated teacher practice with different students (Martínez, 2012; Muthen, 1995)

• Correlated with achievement also within classrooms
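
The point about reliability "aggregated by classroom" can be made concrete with the Spearman-Brown formula, which projects the reliability of a mean of k parallel ratings from the reliability of a single rating. This is only a sketch under a strong assumption that the slides do not state: that the students in a classroom behave as parallel, independent raters of the same teacher. The single-student reliability of 0.20 and the class size of 25 are made-up values.

```python
def spearman_brown(single_reliability, k):
    """Projected reliability of the mean of k parallel ratings."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical: one student's report is noisy, but a class of 25 is averaged.
single_student = 0.20
class_mean_reliability = spearman_brown(single_student, 25)
```

Under these assumptions a very noisy individual report still yields a classroom mean with reliability above 0.85, which is one way to read "adequate reliability aggregated by classroom". The within-classroom variance noted above (differentiated practice with different students) is exactly what the parallel-raters assumption ignores.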

Page 21: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Student Surveys: Remaining Issues

• Memory errors, inconsistency in interpretation

• Particularly with younger children

• Concerns for high-stakes teacher evaluation

• Social desirability, pressure, other validity issues

• Cost issues

• Unit of measurement, construct invariance

• “My teacher asks me to read books”

• vs. “Our teacher asks us to read books”

Page 22: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Student Surveys: Correlation with Achievement

Page 23: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Teacher Portfolios

What’s in a Teacher Portfolio?

Classroom Artifacts (lesson plans, assignments, samples of student work, etc.)

Teacher Reflections (on practice reflected in artifacts)

Student/Teacher Survey/Log (classroom practice, attitudes, perceptions)

vs. Surveys: (+) richer, better validity, PD value; (-) higher cost, rater/rubric error, burden on teachers

vs. Observations: debate taking form

• Compile evidence of teacher practice over a period of time

Page 24: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Portfolios vs. Observations

• 1. Cost to collect & score?

• Similar or lower than observations

• 2. Score reliability?

• Similar to observations/video (see MET study)

• May need to re-examine ideas of “acceptable reliability”

• Better coverage, validity for some aspects of practice

• Interesting possibilities with newer technologies

• 3. More burdensome for teachers?

• Yes, much more so (20-30+ hour effort)

• But, with burden comes Professional Development

• So far used mostly for “National Certification”

• Growing interest?: edTPA, PACT

• May be feasible as integral to an evaluation/PD cycle

Page 25: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Teacher Evaluation and

Multiple Measures

(Validity)

Page 26: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Validity

• How do we know we are doing a good job of evaluating teachers?

• Are our inferences and decisions valid?

“An integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.”

Messick (1989)

Page 27: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


• “In educational settings, a decision or characterization that will have major impact [on a student] should not be made on the basis of a single score. Other relevant information should be taken into account if it will enhance the overall validity of the decision.”

Standards for Educational and Psychological Testing, Standard 13.7 (AERA, APA, & NCME, 1999)

Page 28: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


What to Evaluate? All of the above

• New Mexico’s teacher evaluation system should utilize a matrix in which multiple components of a teacher’s evaluation combine to determine a teacher’s overall effectiveness rating.

• Effectiveness levels should only be assigned after careful consideration of multiple measures, including student achievement data, observations, and other proven measures [emphasis added]

New Mexico Effective Teaching Task Force

Page 29: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Multiple measures: Logic and Assumptions

1. Accuracy: Teachers classified into finer, more stable categories (De Pascale, 2012; Steele et al., 2010)

2. Validity: More complete picture of performance (Goe, 2011); less incentive for test preparation (Steele et al., 2010)

3. Feedback: Information to help teachers adjust and improve instruction and classroom strategies (Duncan, 2011)

4. Relevance: Greater confidence in results of evaluation among the public and stakeholders (Glazerman et al., 2011)

• General Assumption:

• Combining multiple measures leads to better informed (more valid) decisions about teachers and teaching

Page 30: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Combining Multiple measures: Conceptual Issues

• When and where do these assumptions hold? In what situations? Depends on several factors

• Assumptions about nature of constructs involved

• Intended inferences and uses

• What is meant exactly by combining (Brookhart, 2009)

• Not self-explanatory. A variety of models is available

• Substantial literature in psychology, personnel evaluation, and student assessment.

• Only starting to be applied to Teacher Evaluation

Page 31: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Models for Combining Multiple Measures

Model | Description
Conjunctive | Must meet criteria (pass) for all measures
Disjunctive | Must meet criteria (pass) for k measures
Compensatory | Based on composite measures; a high level in one measure compensates for low levels in others
Hybrid | e.g., compensatory-conjunctive, sequential

(Mehrens, 1989; Chester, 2003)
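
The three basic models in the table can be written as small decision functions. This is an illustrative sketch only: it reads "pass for k measures" as "pass at least k", and the scores, the pass mark of 3, the weights, and the cut score are hypothetical values, not taken from any actual evaluation system.

```python
def conjunctive(passes):
    """Pass only if every measure is passed."""
    return all(passes)


def disjunctive(passes, k=1):
    """Pass if at least k measures are passed (assumed reading of 'k measures')."""
    return sum(passes) >= k


def compensatory(scores, weights, cut_score):
    """Pass if the weighted composite reaches the cut score."""
    composite = sum(w * s for w, s in zip(weights, scores))
    return composite >= cut_score


# Hypothetical teacher: three measures on a 1-5 scale, pass mark 3 on each.
scores = [4.5, 2.5, 3.0]
passes = [s >= 3 for s in scores]
weights = [0.5, 0.25, 0.25]  # assumed policy weights that sum to 1

results = {
    "conjunctive": conjunctive(passes),
    "disjunctive_k2": disjunctive(passes, k=2),
    "compensatory": compensatory(scores, weights, cut_score=3.0),
}
```

The same teacher fails under the conjunctive rule (one measure is below the pass mark) but passes under the other two, which shows that the choice of model, and not only the quality of the measures, drives the decision.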

Page 32: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Combination Model 0: Do not Combine!

• May consider not combining the indicators!

• Summary indices not essential to formative or summative evaluation

• Key measures may be collected, maintained, and reported separately

• All used to illuminate a side of the picture (improve teaching, communication, citizenship, achievement?)

• And used jointly as needed where summative judgments are sought (Mehrens, 1989; Brookhart, 2009)

• Making combined use of multiple indicators ≠ combining multiple indicators

Page 33: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Combination Model 1: Conjunctive, Disjunctive

[Diagram: Portfolio, Classroom Observation, Other Indicators, Student Survey, Teacher Test, Student Achievement]

Page 34: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Decision Rules and Reliability

• Error in multiple measures may cancel out or compound

• Assume Teacher A's true scores on T1 and T2 are both passes

• Because of unreliability, the probability of a passing observed score is estimated at 0.80 and 0.90, respectively

• Probability of passing observed scores on both tests (Conjunctive Model), assuming independent errors: 0.80 × 0.90 = 0.72

• Probability of a passing observed score on either test (Disjunctive Model), assuming independent errors: 1 − (0.20 × 0.10) = 0.98

(see e.g. Cronbach, Linn, Brennan, & Haertel, 1997; Douglas and Mislevy, 2010)
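
The arithmetic on this slide generalizes to any number of measures. The sketch below assumes, as the slide's multiplication implicitly does, that measurement errors are independent across measures; the function names are made up for illustration, and the 0.80 and 0.90 inputs are the slide's own.

```python
from math import prod


def p_pass_conjunctive(p_each):
    """P(observed pass on every measure), assuming independent errors."""
    return prod(p_each)


def p_pass_disjunctive(p_each):
    """P(observed pass on at least one measure), assuming independent errors."""
    return 1 - prod(1 - p for p in p_each)


p = [0.80, 0.90]
both = p_pass_conjunctive(p)    # approximately 0.72
either = p_pass_disjunctive(p)  # approximately 0.98
```

Adding a third measure with the same kind of error pushes the two rules further apart: the conjunctive probability can only fall and the disjunctive probability can only rise, which is the sense in which error "compounds" or "cancels out".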

Page 35: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Decision Rules and Reliability

• Simplistic scenario. Complex rules often used in practice according to policy context and goals

• E.g., teachers must pass Measure 1 or 2, AND not rank lowest in Measure 3 (e.g., New Haven)

• Choice of decision rule more important for accuracy and validity than the reliability of the component measures chosen (Chester, 2003)

• Importantly: Models are not “objective”; each involves judgment

• Why satisfy k criteria, not k-1? Why those criteria?
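
The example rule on this slide ("pass Measure 1 or 2, AND not rank lowest in Measure 3") can be written out directly. This is a hypothetical sketch, not New Haven's actual scoring logic: it assumes Measure 3 is a rating on an ordered scale and reads "rank lowest" as "falls in the bottom rating category".

```python
def hybrid_rule(pass_m1, pass_m2, m3_rating, lowest_rating=1):
    """Pass Measure 1 or Measure 2, AND avoid the lowest Measure 3 rating.

    lowest_rating is an assumed bottom category of the Measure 3 scale.
    """
    return (pass_m1 or pass_m2) and m3_rating > lowest_rating


# Hypothetical cases
case_ok = hybrid_rule(True, False, m3_rating=3)       # passes M1, M3 acceptable
case_low_m3 = hybrid_rule(True, True, m3_rating=1)    # fails on M3 alone
case_no_pass = hybrid_rule(False, False, m3_rating=5) # strong M3 cannot compensate
```

Each design decision in the function (which measures may substitute for each other, which one acts as a veto, where the veto threshold sits) is a judgment call of the kind the slide describes.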

Page 36: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Hybrid System: e.g., New Haven

• Synthesizes three component measures (each on a 5-point scale):

• Teacher instructional practice

• Teacher professional values

• Student learning outcomes

Page 37: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Combination Model 2 (Compensatory): Principal Components / Factor Analysis

[Diagram: Portfolio, Student/Parent Survey, Classroom Observation, Teacher Survey, Other Measures, Student Achievement; Global Construct]

Page 38: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Combination Model 3 (Compensatory): Optimal Weight (Achievement as Criterion)

[Diagram: Artifacts/Portfolio, Student/Parent Survey, Classroom Observation, Teacher Survey, Other Measures; Teacher Construct; Student Achievement]

Page 39: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Combination Model 3 (Compensatory): Optimal Weight (Achievement as Criterion)

[Diagram: Artifacts/Portfolio, Student/Parent Survey, Classroom Observation, Teacher Survey, Other Measures, each with a weight β; Teacher Construct; Student Achievement]

Page 40: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


MM Combination Model 4 (Compensatory): PC/FA, Student Achievement as Indicator

[Diagram: Artifacts/Portfolio, Student/Parent Survey, Classroom Observation, Teacher Survey, Other Measures, Student Achievement; Teacher Construct]

Page 41: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


MM Combination Model 5 (Compensatory): SEM/Canonical Correlates

[Diagram: Artifacts/Portfolio, Student/Parent Survey, Classroom Observation, Teacher Survey, Other Measures; Teacher Construct; Student Measure #1, Student Measure #2, Other (e.g., non-cognitive); Student Outcomes]

Page 42: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


MM Combination Model 6 (Darlington, 1970): Unmeasured Criterion, Theoretical Weights

[Diagram: Artifacts/Portfolio, Student/Parent Survey, Classroom Observation, Teacher Survey, Other Measures, Student Achievement; Unmeasured Teacher Construct]

Page 43: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Empirical vs. Theoretical Weighting

• Model 6 is the most likely scenario in practice

• Policy assumptions/values (consensual) inform the system, alongside technical considerations

• It really is the only feasible scenario

• Empirical weights cannot be derived

• Ultimate criterion measure is NOT available

• Note Model 3 assumes such a measure is available

• But does not give “correct” weight for criterion

• Exposure to validity shrinkage (weights change over time)

Page 44: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Multiple Measures and Validity

• Models may lead to different inferences.

• Little guidance available; so…

• LOCAL VALIDITY STUDIES NEEDED (lots of them)

• As with single measures, need to set up testable validation hypotheses (Kane, 2006)

• Whatever the construct: Teacher [euphemism]

• 1. Describe intended inferences, uses, AND CONSEQUENCES

• 2. Collect empirical evidence to support

• 2012, 2013 MET reports will be influential. May force field to broaden our lens and revise assumptions and expectations

• No getting around conducting local validation studies

Page 45: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


What KINDS of EVIDENCE?

• All of them: Validity is a unitary notion

• Theoretical support

• Consistency and accuracy (Reliability)

• Correlations, Internal structure

• Predictive power

• Consequences of use

• Validity becomes a rather empty academic topic if the consequences are not considered

• Or if they differ markedly from expectation

Page 46: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


What consequences?

• Intended and Unintended Effects

• On teaching practice

• On different student outcomes

• On recruitment and retention

• On Motivation, Competition, Fraud

• On Perceptions of validity, fairness, utility

• On dynamic of relationships with parents and community

• Etc.

Page 47: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Final Remarks. Teacher Evaluation: Why are we doing this again?

• Some good reasons

• Make student achievement a priority

• Monitor & assess teacher performance

• Develop a culture of accountability

• and of reflection and improvement

• Inform PD to improve teacher performance

• However

• Multiple fallible indicators do not automatically yield better, less fallible inferences. But they always yield more complex ones

• Using indicators in combination involves technical but also conceptual and policy assumptions

Page 48: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Final Remarks. Teacher Evaluation: Why are we doing this again?

• Because “the stakes are high, and the future of our children is at stake” (insert public official name here, circa 2012) we should proceed carefully and deliberately.

• Good measures take time to develop.

• Solid systems based on these measures take longer to test and implement.

• The consequences of implementing these systems are unknown and will take longer to assess.

• Experience suggests moving too fast to implement may shortchange the system

Page 49: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Final Remarks. Teacher Evaluation: Why are we doing this again?

• The most important goal, in my view, is not only to avoid unfair decisions and negative unintended consequences (though the potential for both should give us pause)

• The greatest risk is missing an opportunity to enact sound teacher evaluation policy with great potential to positively impact educational practice and outcomes

Page 50: Issues in Teacher Evaluation and Validity: Conceptual, Methodological, and Practical


Thank you

[email protected]