Teacher evaluation and goal setting – Connecticut

Post on 31-May-2015


DESCRIPTION

NWEA's Connecticut presentation on teacher evaluation and goal setting.

TRANSCRIPT

John Cronin, Ph.D., Director

The Kingsbury Center @ NWEA

Implementing the Connecticut teacher evaluation system

Presenter - John Cronin, Ph.D.

Contacting us:
Rebecca Moore: 503-548-5129
E-mail: rebecca.moore@nwea.org

Helping teachers set reasonable and rigorous goals

What you’ll learn

• The purposes of teacher evaluation. There can be different purposes for different educators.

• The value of differentiating educator evaluations and the risks associated with not differentiating.

• The value of multiple measurements, and the importance of defining a purpose for each measurement.

• The information needed to determine whether a goal is attainable.

• The difference between “aspirational” and “evaluative” goals and the value of each.

• Thoughts on strategies for addressing the difficult to measure.

What’s the purpose?

• The Connecticut system requires a collaborative process.

• For most educators the purpose of evaluation is formative.

• For a small minority of educators the purpose is summative, and goals may involve demonstrating basic competence.

• Leaders should be transparent about the purpose of the process for each educator.

• Perfect consistency isn’t necessarily a requirement.

Differences between principal and teacher evaluation

Principals

• Inherit a pre-existing staff.

• Have limited control over staffing conditions.

• Work with this intact group from year to year.

Thus principals should be evaluated on their ability to improve growth, or maintain high levels of growth over time, rather than on their students’ growth within a single school year.

Differences between principal and teacher evaluation

Teachers

• They have new groups of students each year.

• They generally work with those students for one school year only.

• New teachers generally become more effective in their first three years.

The difference between formative and summative evaluation

• Formative evaluation is intended to give educators useful feedback to help them improve their job performance. For most educators formative evaluation should be the focus of the process.

• Summative evaluation is a judgment of educator performance that informs future decisions about employment including granting of tenure, performance pay, protection from layoff.

Purposes of summative evaluation

• An accurate and defensible judgment of an educator’s job performance.

• Ratings of performance that provide meaningful differentiation across educators.

• Goals of evaluation:
– Support professional improvement
– Retain your top educators
– Dismiss ineffective educators

If evaluators do not differentiate their ratings, then all differentiation comes from the test.

If performance ratings aren’t consistent with school growth, that will probably be public information.

Results of Tennessee Teacher Evaluation Pilot

[Chart: percent of teachers (0%–60%) at each rating level, 1–5, comparing value-added results with observation results]

Results of Georgia Teacher Evaluation Pilot

[Chart: distribution of evaluator ratings – Ineffective, Minimally Effective, Effective, Highly Effective]

Connecticut expectations around teacher observation

• First and second year teachers:
– Required: 3 formal observations
– Recommended: 3 formal observations and 3 informal observations

• Below standard and developing:
– Required: 3 formal evaluations
– Recommended: 3 formal evaluations and 5 informal evaluations

• Proficient and exemplary:
– Recommended: 3 formal evaluations, 1 in class
– Required: 3 formal evaluations, 1 in class

Bill and Melinda Gates Foundation (2013, January). Ensuring Fair and Reliable Measures of Effective Teaching: Culminating Findings from the MET Project’s Three-Year Study

Reliability coefficient (relative to state test value-added gain) and proportion of test variance explained, by evaluation model:

• Model 1 (state test 81%, student surveys 17%, classroom observations 2%): .51, 26.0%

• Model 2 (state test 50%, student surveys 25%, classroom observations 25%): .66, 43.5%

• Model 3 (state test 33%, student surveys 33%, classroom observations 33%): .76, 57.7%

• Model 4 (classroom observations 50%, state test 25%, student surveys 25%): .75, 56.2%

Reliability of evaluation weightings in predicting year-to-year stability of student growth gains

Bill and Melinda Gates Foundation (2013, January). Ensuring Fair and Reliable Measures of Effective Teaching: Culminating Findings from the MET Project’s Three-Year Study

Reliability coefficient (relative to state test value-added gain) and proportion of test variance explained, by observation implementation:

• Principal – 1: .51, 26.0%

• Principal – 2: .58, 33.6%

• Principal and other administrator: .67, 44.9%

• Principal and three short observations by peer observers: .67, 44.9%

• Two principal observations and two peer observations: .66, 43.6%

• Two principal observations and two different peer observers: .69, 47.6%

• Two principal observations, one peer observation, and three short observations by peers: .72, 51.8%

Reliability of a variety of teacher observation implementations
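In both MET tables, the “proportion of test variance explained” column tracks the square of the reliability coefficient (r²), to within rounding. A quick check against the reported coefficients:

```python
# The "proportion of variance explained" values in the MET tables are
# consistent with squaring the reliability coefficient (r -> r^2).
models = {
    "Model 1 (state test 81 / surveys 17 / observation 2)": 0.51,
    "Model 2 (50 / 25 / 25)": 0.66,
    "Model 3 (33 / 33 / 33)": 0.76,
    "Model 4 (observation 50 / test 25 / surveys 25)": 0.75,
}

for name, r in models.items():
    # r^2 reproduces the reported percentages to within rounding
    print(f"{name}: r = {r:.2f}, r^2 = {r * r:.1%}")
```

Squaring .51 gives 26.0%, .66 gives 43.6%, .76 gives 57.8%, and .75 gives 56.2%, matching the tables to within a tenth of a percentage point.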

Why should we care about goal setting in education?

Because we want students to learn more!

• Research view: setting goals improves performance.

Testing → Metric (Growth Score) → Analysis (Value-Added) → Evaluation (Rating)

The testing to teacher evaluation process

The difference between growth and improvement

[Chart: Mathematics – number of students by fall RIT score, grouped by change: Up, No Change, Down]

One district’s change in 5th grade math performance relative to Kentucky cut scores

[Chart: Mathematics – number of students by fall score, grouped by Met growth target versus Failed growth target]

Number of students who achieved the normal mathematics growth in that district

Issues in the use of growth measures

Measurement design of the instrument

Many assessments are not designed to measure growth. Others do not measure growth equally well for all students.

Tests are not equally accurate for all students

[Charts: measurement accuracy across the score range, California STAR versus NWEA MAP]

Issues with rubric based instruments

• Rubrics should be granular enough to show growth. Four point rubrics may not be adequate.

• High and low scores should be written in a manner that sets a reasonable floor and ceiling.

Expect consistent inconsistency!

Inconsistency occurs because of:

• Differences in test design.

• Differences in testing conditions.

• Differences in models being applied to evaluate growth.
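Classical test theory gives one way to see why this inconsistency should be expected: an observed score is a true score plus measurement error, and reliability is the share of observed variance that comes from the true score. A minimal sketch, with purely illustrative variances (not drawn from any particular test):

```python
# Classical test theory sketch: observed = true score + error.
# Test-retest reliability is the share of observed-score variance
# attributable to the true score.  These variances are illustrative.
true_var = 100.0   # variance of students' true achievement
error_var = 50.0   # measurement-error variance on one administration

observed_var = true_var + error_var
reliability = true_var / observed_var
print(f"reliability = {reliability:.2f}")  # 100 / 150, about 0.67

# A growth score is the difference of two noisy measurements, so the
# error variances add -- growth is noisier than either score alone.
growth_error_var = error_var + error_var
print(f"error variance of a growth score: {growth_error_var:.0f}")
```

The doubling of error variance for a difference score is one reason two growth estimates for the same teacher can disagree even when both tests are individually well behaved.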

Test Retest

[Diagram, repeated across several slides: a 2×2 grid crossing two tests with two occasions – Test 1/Time 1, Test 2/Time 1, Test 1/Time 2, Test 2/Time 2]

The reliability problem – Inconsistency in testing conditions

The problem with spring-spring testing

[Timeline, March 2011 through March 2012: Teacher 1’s instruction, the summer break, then Teacher 2’s instruction – a spring-to-spring growth window spans all three]

Testing → Metric (Growth Score) → Analysis (Value-Added) → Evaluation (Rating)

The testing to teacher evaluation process

Issues in the use of growth and value-added measures

Differences among value-added models

Los Angeles Times Study

Los Angeles Times Study #2

Issues in the use of value-added measures

Control for statistical error

All models attempt to address this issue. Nevertheless, many teachers’ value-added scores will fall within the range of statistical error.
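As a sketch of what “within the range of statistical error” means in practice, the check below flags teachers whose growth index, plus or minus two standard errors (roughly a 95% interval), overlaps zero. The scores and standard errors are hypothetical:

```python
# Hypothetical teachers: (name, value-added growth index, standard error).
teachers = [
    ("Teacher A", 1.8, 1.2),
    ("Teacher B", 0.4, 0.9),
    ("Teacher C", -2.6, 1.0),
]

for name, score, se in teachers:
    # A ~95% interval is the estimate plus or minus two standard errors.
    low, high = score - 2 * se, score + 2 * se
    distinguishable = low > 0 or high < 0  # interval misses zero
    verdict = "differs from average" if distinguishable else "within error of average"
    print(f"{name}: [{low:+.1f}, {high:+.1f}] {verdict}")
```

In this made-up example only Teacher C’s interval clears zero; Teachers A and B cannot be distinguished from an average teacher despite having different point estimates.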

Issues in the use of growth and value-added measures

Control for statistical error

New York City

New York City #2

Mathematics Growth Index Distribution by Teacher – Validity Filtered

[Chart: each teacher’s average growth index score plotted with its range, teachers ordered into quintiles Q1–Q5; the vertical axis (average growth index score and range) runs from −12.00 to 12.00]

Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.

Range of teacher value-added estimates

What’s a SMART goal?

• Specific
• Measurable
• Attainable
• Relevant
• Time-Bound

Specific

• What: What do I want to accomplish?

• Why: What are the reasons or purpose for pursuing the goal?

• Which: What are the requirements and constraints for achieving the goal?

SMART goal resources

• National Staff Development Council. Provides a nice process for developing SMART goals.

• Arlington Public Schools. Excellent and detailed examples of SMART goals across subject disciplines, including art and music.

• The Handbook for Smart School Teams. Anne Conzemius and Jan O’Neill.

Issues with local tests and goal setting

• Validity and reliability of assessments.

• Teachers/administrators are unlikely to set goals that are inconsistent with their current performance.

• It is difficult to set goals without prior evidence or context.

The goal should ALWAYS be improvement in a domain (subject)!

Specific

There should ALWAYS be multiple data sources and metrics.

Specific

Data should be triangulated

• Classroom assessment data to standardized test data.

• Domain data (mathematics) to sub-domain data (fractions and decimals) to granular data (division with fractions).

All students should be “in play” relative to the goal.

Specific

Measurable

Types of Goals

• Performance – 75% of the students in my 7th grade mathematics class will achieve the qualifying score needed for placement in 8th grade Algebra.

• Growth – 65% of my students will show growth on the OAKS mathematics test that is greater than the state reported norm.

• Improvement – Last year 40% of my students showed growth on the OAKS mathematics test that was greater than the norm. This year 50% of my students will show greater than normal growth.
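A growth goal like the one above reduces to a simple proportion check: what share of students exceeded the normed growth? This sketch uses made-up student growth values and a made-up norm, not actual OAKS or NWEA figures:

```python
# Hypothetical check of a growth goal: did at least 65% of a teacher's
# students exceed the normed growth for their grade?
norm_growth = 5.0  # illustrative normed growth, not a real norm
student_growth = [7, 3, 6, 8, 2, 9, 5.5, 4, 6.5, 7.5]  # made-up values

exceeded = sum(1 for g in student_growth if g > norm_growth)
rate = exceeded / len(student_growth)
print(f"{rate:.0%} of students exceeded normal growth")  # 7 of 10 -> 70%
print("goal met" if rate >= 0.65 else "goal not met")
```

The same structure works for performance goals (share reaching a qualifying score) and improvement goals (this year’s rate compared against last year’s).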

Attainable

The goals set should be reasonable and rigorous. At minimum, they should represent a level of performance that a competent educator could be expected to achieve.

An analogy to baseball

Center Fielders – WAR (Wins Above Replacement)

[Chart: WAR values for three center fielders – Superstar: Mike Trout, Los Angeles Angels; Median major leaguer: Gregor Blanco, San Francisco Giants; Marginal major leaguer: Chris Young, Oakland A’s; plotted values include 6.4, 6.1, 1.7, 0.0, and −1.3]

The difference between aspirational and evaluative goals

Aspirational – I will meet my target weight by losing 50 pounds during the next year and sustain that weight for one year.

Proficient – I intend to lose 15 pounds in the next six months, which will move me from the “obese” to the “overweight” category, and sustain that weight for one year.

Marginal – I will lose weight in the next six months.

Ways to evaluate the attainability of a goal

• Prior performance
• Performance of peers within the system
• Performance of a norming group

One approach to evaluating the attainment of goals.

Students in La Brea Elementary School show mathematics growth equivalent to only 2/3 of the average for students in their grade.

Level 4 – (Aspirational) – Students in La Brea Elementary School will improve their mathematics growth equivalent to 1.5 times the average for their grade.

Level 3 – (Proficient) Students in La Brea Elementary School will improve their mathematics growth to be equivalent to the average for their grade.

Level 2 – (Marginal) Students in La Brea Elementary School will improve their mathematics growth relative to last year.

Level 1 – (Unacceptable) Students in La Brea Elementary School do not improve their mathematics growth relative to last year.

Is this goal attainable?

62% of students at John Glenn Elementary met or exceeded proficiency in Reading/Literature last year. Their goal is to improve their rate to 82% this year. Is the goal attainable?

[Histogram: number of schools (0–400) by change in proficiency, with bins labeled > −30%, > −20%, > −10%, > 0%, > 10%, > 20%, and > 30% and school counts of 362, 351, 291, 173, 73, 14, and 3 respectively]

Oregon schools – change in Reading/Literature proficiency, 2009-10 to 2010-11, among schools that started with 60% proficiency rates
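Taking the histogram counts above at face value, and assuming the seven counts map in order onto the seven change bins (so the last two bins represent gains of more than 20 points), the arithmetic suggests how rare a 20-point jump is:

```python
# Counts from the Oregon histogram, assumed to run in order from the
# largest-decline bin to the largest-gain bin.
counts = [362, 351, 291, 173, 73, 14, 3]

total = sum(counts)
big_gainers = sum(counts[-2:])  # schools in the two top-gain bins

print(f"{big_gainers} of {total} schools "
      f"({big_gainers / total:.1%}) made a 20+ point gain")
```

Under that reading, only about 1 school in 75 made the kind of 20-point gain the John Glenn goal requires, which is the point of the slide: the goal is almost certainly not attainable in one year.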

Is this goal attainable and rigorous?

45% of the students at La Brea elementary showed average growth or better last year. Their goal is to improve that rate to 50% this year. Is their goal reasonable?

[Bar chart, 0%–100%: LaBrea versus the district average]

Students with average or better annual growth in Repus school district

The selection of metrics matters

Students at LaBrea Elementary School will show growth equivalent to 150% of grade level.

Students at Etsaw Middle School will show growth equivalent to 150% of grade level.

Scale score growth relative to NWEA’s growth norm in mathematics

[Chart: growth index by grade, grades 2–9; the vertical axis (scale score growth) runs from −1.0 to 7.0]

Percent of a year’s growth in mathematics

[Chart: mathematics growth by grade, grades 2–9; the vertical axis (percent of a year’s growth) runs from 0% to 200%]

Assessing the difficult to measure

• Encourage use of performance assessment and rubrics.

• Encourage outside scoring:
– Use of peers in other buildings, professionals in the field, contest judges

• Make use of resources:
– Music educator, art educator, and vocational professional associations
– Available models, e.g., the AP art portfolio
– Use your intermediate agency
– Work across buildings

• Make use of classroom observation.

Success can’t be replicated if you don’t know why you succeeded.

Failure can’t be reversed if you don’t know why you failed.

The outcome is important, but why the outcome occurred is equally important!

• Establish checkpoints and check in beyond what’s required when possible.

• Collect implementation information:
– Classroom observations and visits
– Teacher journal
– Student work and artifacts

These processes can be done by teacher peers!

Presenter - John Cronin, Ph.D.

Contacting us:
NWEA Main Number: 503-624-1951
E-mail: rebecca.moore@nwea.org

Thank you for attending
