Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Upload: audrey-wells

Post on 14-Jan-2016

Page 1: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Assessment and Performance Standards: How Good is Good Enough?

March 4-6, 2008

Page 2: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

© 2008 The McGraw-Hill Companies, Inc. All rights reserved. 2

William Lorié, Ph.D.

Director, International R&D

CTB/McGraw-Hill

Page 3: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Agenda

• Chapter I: So you want to be a decathlete

• Chapter II: You want me to jump how high?

• Chapter III: Favorable winds, sun in my face: A detour into human performance

• Chapter IV: Philosophies for setting the bar

Page 4: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Chapter I

So you want to be a decathlete

Page 5: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Page 6: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

How Good Is Good Enough?

Page 7: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

2004 UK level “A”

2000 UK level “A”

Page 8: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

The Standard is Different for the Generalist

An athlete needed 8000 points for the “A” level qualification for the decathlon on the UK Olympic team in 2004

At 800 points per event, I can “get by” with a high jump of 2 meters

…Or less, if I am relatively strong in other events…
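The arithmetic on this slide can be sketched as follows. The slides do not give the scoring formula; the form points = floor(A * (height_cm - B) ** C) and the high jump constants A = 0.8465, B = 75, C = 1.42 are assumed here from the published combined-events scoring tables, and the function names are illustrative only.

```python
import math

# Assumed high jump constants from the combined-events scoring tables
# (not stated in the slides).
A, B, C = 0.8465, 75.0, 1.42


def high_jump_points(height_m: float) -> int:
    """Decathlon points for a high jump of height_m metres."""
    height_cm = height_m * 100.0
    if height_cm <= B:
        return 0
    return math.floor(A * (height_cm - B) ** C)


def required_average(total_needed: int, events: int = 10) -> float:
    """Average points per event needed to reach the total."""
    return total_needed / events


if __name__ == "__main__":
    print(required_average(8000))   # 800.0 points per event
    print(high_jump_points(2.0))    # a 2 m jump scores just above 800
    print(high_jump_points(2.3))    # the specialist's height scores far more
```

Because only the total counts, a jump worth fewer than 800 points can be offset by more than 800 points in another event.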

Page 9: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Chapter II

You want me to jump how high?

Page 10: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Educational Tests Are More Like Decathlons Than High Jumps

• Student learning outcomes are varied and interlinked

• At every level of schooling, especially early on, we want students to do well in a number of broad learning outcomes, not just a few

• Students can be strong overall, weak overall, or strong in some areas and weak in others

Page 11: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Content and Performance Standards

Who’s being Tested? | High Jumper | Decathlete
What’s on the Test? | High Jump Event | Ten Different but Related Events
What do they need to Pass? | Jump 2.3 meters | Get 8000 points (try to high jump at least 2 meters)

Page 12: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

So, How Good is Good Enough?

• A matter of judgment

• Takes into consideration that
  - The test is a sample of tasks that all count toward the final score
  - It is not essential to master any one given task
  - Tasks are a sample from a broader domain that we care about: not everything that could have been tested is tested. Traiacontacaioctacathlon anyone?

Page 13: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Chapter III

Favorable winds, sun in my face:

A detour into human performance

Page 14: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Olympic High Jump Athlete’s Performance

Typical (Average)

Recent Worst

Personal Best

World Record

Page 16: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

All Individual Human Performance has variation…

Possible sources of variation (High Jumper or Decathlete; Student Taking a High School Exit Exam)

Systematic:
- Weather conditions
- Altitude
- Indoors or out
- Gender
- Time of year
- Curriculum
- Quality of Instruction

Non-systematic:
- Sharpened focus
- Loss of concentration
- Muscle fatigue / failure
- Lapses in judgment
- Moments of insight
- Mood

Page 17: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

…and Error Is a Part of All Measurement

After you’ve standardized your field conditions, and controlled everything you can think of, you still get variation in individual performance.

In measurement, that variation is due entirely to non-systematic sources.

Those sources are all lumped together and called Error. Error is a technical concept.

Page 18: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Where Is There Error in Educational Measurement?

• The average score of French 8th graders on TIMSS

• My college entrance exam scores

• Diane Lotfi’s 5th grade standardized achievement test scores

• The grades I gave my 9th year students in physical science when I was a teacher

• Student grade-point averages

• Throughout your entire recorded academic career

Page 19: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Don’t Panic

• In the long run, the Errors average out to zero

• When it matters most, rigorous steps are taken to quantify and minimize Error
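A small simulation can illustrate the first point. It assumes, in the spirit of the earlier slides, that each observed score is a fixed true score plus non-systematic Error; the normal distribution, the true score of 7990, and the Error spread of 40 points are hypothetical choices made only for this sketch.

```python
import random
import statistics


def observed_scores(true_score: float, error_sd: float, n: int,
                    seed: int = 0) -> list[float]:
    """Simulate n observed scores as true score plus random Error."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n)]


if __name__ == "__main__":
    true_score = 7990.0  # hypothetical
    for n in (5, 50, 5000):
        scores = observed_scores(true_score, error_sd=40.0, n=n)
        average_error = statistics.mean(scores) - true_score
        print(n, round(average_error, 1))
```

With more attempts the average Error tends to shrink toward zero, although any single attempt can still be well off.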

Page 20: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

The Problem of Error and Performance Standards

[Chart: “Decathlon Scores in Preliminary Round Tryouts”. Horizontal axis: decathlon score, 7600 to 8100 in steps of 20. Vertical axis: Number of Athletes, 0 to 6.]

Page 21: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Coach, can you give me another chance?

[Chart: “Decathlon Scores in Preliminary Round Tryouts”. Horizontal axis: decathlon score, 7600 to 8100 in steps of 20. Vertical axis: Number of Athletes, 0 to 6.]
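Why an athlete just under the bar might reasonably ask for another chance can be sketched numerically. This is an illustration only: it assumes normally distributed Error with a hypothetical spread of 40 points around each athlete's true score, and uses the 8000-point standard from the earlier slides as the cut.

```python
from statistics import NormalDist

CUT = 8000.0  # the qualification standard used in the slides


def pass_probability(true_score: float, error_sd: float,
                     cut: float = CUT) -> float:
    """P(observed score >= cut) on one attempt, assuming normal Error."""
    return 1.0 - NormalDist(mu=true_score, sigma=error_sd).cdf(cut)


if __name__ == "__main__":
    # Hypothetical true scores on either side of the cut.
    for true_score in (7940.0, 7980.0, 8000.0, 8020.0, 8060.0):
        p = pass_probability(true_score, error_sd=40.0)
        print(true_score, round(p, 2))
```

Under these assumptions, athletes whose true scores sit close to the cut pass or fail on a single attempt largely by chance, which is the problem Error poses for any performance standard.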

Page 22: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Chapter IV

Philosophies for Setting the Bar

Page 23: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

How Do We Set the Bar?

Two Ways: Think of People or Think of Tasks

Page 24: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

The People Approach, Roughly…

• I know my students well. I can make judgments about whether each has met the bar:
  - “Have minimal competency in 4th grade mathematics”
  - “Merit a high school leaving certificate”
  - “Are prepared for the next unit of instruction in Arabic”

• A standardized test is given, and the score that best discriminates between the two groups (students judged to have met the bar and students judged not to have) is chosen as the standard.
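One simple reading of "best discriminates" is the cut score that misclassifies the fewest students relative to the teacher's judgments. The sketch below takes that reading as an assumption; operational studies may use other criteria, and the scores and judgments are invented for illustration.

```python
def contrasting_groups_cut(scores, met_bar):
    """Return (cut, errors): the cut with the fewest misclassifications.

    A student passes when score >= cut. An error is a student whose
    pass/fail result disagrees with the teacher's judgment. Ties go
    to the lowest candidate cut.
    """
    pairs = list(zip(scores, met_bar))
    best_cut, best_errors = None, None
    for cut in sorted(set(scores)):
        errors = sum((score >= cut) != judged for score, judged in pairs)
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


if __name__ == "__main__":
    # Hypothetical test scores and teacher judgments (True = met the bar).
    scores = [12, 15, 18, 20, 22, 25, 27, 30]
    met_bar = [False, False, False, True, False, True, True, True]
    print(contrasting_groups_cut(scores, met_bar))
```

The tie-breaking rule matters when the two groups overlap, which is one reason the choice of cut remains a matter of judgment.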

Page 25: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

In Practical Terms, Most Standard Setting (or Level Setting) Follows the Task Approach

Page 26: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Some Select Task Approaches

• Angoff and modifications

• Ebel

• Nedelsky

• Jaeger-Mills

• Bookmark

• Body of Work

• Briefing Book

• Item-Descriptor Matching

Page 27: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

What They Have in Common

• Consider items, tasks, or more specifically performances on tasks

• Rely on concept / abstraction of the minimally qualified individual

• Most have been generally accepted in the field
  - Angoff was the first invented and is the most widely used
  - Bookmark is the most popular in achievement testing

• All have been praised and criticized

Page 28: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Standard Setting is arguably the most controversial and most consequential of all the areas of educational measurement

Why?

• Variation in results due to method, judges, language of performance standard

• The cut point sometimes has important consequences for students, teachers, schools, entire systems, and reform efforts.

That 8000 Can Alter Your Life Plan

Page 29: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Thank you.

Questions?

Page 30: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Group Activity: Modified Angoff Standard Setting

• You have been convened by the Ministry of a GCC country to establish standards for “Proficiency” in 5th grade mathematics.

Step 1: Discuss the minimally proficient student: his / her knowledge, skills, and abilities

Step 2: Review / take a test of 20 mathematics items at the 5th grade level

Step 3: We will give you verbal instructions on how to make Angoff judgments on the items

Step 4: You will make one round of judgments and we will provide feedback for you

Page 31: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Instructions for standard setting judges

• What is the probability that a minimally Proficient grade 5 student will get this item correct?

• In a group of 100 minimally Proficient grade 5 students, what percent would you expect to get this item correct?

(Convince yourselves that these are equivalent statements.)
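How item judgments turn into a cut score can be sketched as follows. This follows the commonly described modified Angoff calculation (each judge's recommended cut is the sum of that judge's item probabilities, and the panel's cut is the average across judges); the slides do not spell out the computation, and the ratings below are invented, with 4 items instead of the activity's 20 to keep the example short.

```python
def judge_cut_scores(ratings):
    """One raw cut score per judge: the sum of that judge's item probabilities."""
    return [sum(judge_ratings) for judge_ratings in ratings]


def angoff_cut_score(ratings):
    """Panel cut score: the mean of the judges' individual cut scores."""
    cuts = judge_cut_scores(ratings)
    return sum(cuts) / len(cuts)


if __name__ == "__main__":
    # Hypothetical ratings: rows are judges, columns are items. Each value
    # is the judged probability that a minimally Proficient student
    # answers the item correctly.
    ratings = [
        [0.9, 0.7, 0.5, 0.3],
        [0.8, 0.6, 0.6, 0.4],
        [1.0, 0.8, 0.5, 0.4],
    ]
    print(judge_cut_scores(ratings))
    print(angoff_cut_score(ratings))
```

A probability of 0.7 for one student and an expectation that 70 of 100 such students answer correctly are the same judgment on two scales, which is why either phrasing of the question can be used.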

Page 32: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

Types of Feedback

• Table Mean and Dispersion

• Group Mean and Dispersion

• Impact
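The three kinds of feedback can be sketched with a few lines of code. The interpretation is an assumption: "dispersion" is taken here as the sample standard deviation of the judges' cut scores, and "impact" as the percent of students who would score at or above a given cut. All numbers are invented.

```python
import statistics


def mean_and_sd(cuts):
    """Mean and sample standard deviation of a set of judges' cut scores."""
    return statistics.mean(cuts), statistics.stdev(cuts)


def impact(student_scores, cut):
    """Percent of students scoring at or above the cut."""
    passing = sum(score >= cut for score in student_scores)
    return 100.0 * passing / len(student_scores)


if __name__ == "__main__":
    # Hypothetical judges' cut scores on a 20-item test, grouped by table.
    tables = {"Table 1": [11.5, 12.0, 13.0], "Table 2": [14.0, 12.5, 13.5]}
    for name, cuts in tables.items():
        print(name, mean_and_sd(cuts))

    all_cuts = [cut for cuts in tables.values() for cut in cuts]
    group_mean, group_sd = mean_and_sd(all_cuts)
    print("Group", group_mean, group_sd)

    # Hypothetical student raw scores, used only to show impact.
    students = [6, 9, 11, 12, 13, 13, 15, 16, 18, 19]
    print("Impact", impact(students, group_mean))
```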

Page 33: Assessment and Performance Standards How Good is Good Enough? March 4-6, 2008

What would happen in the real thing?

• Multiple Rounds

• Calculation of Level Setting Error

• Review by Sponsoring Agency

• Final Decision

• Implementation

• Possible Future Review
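As a rough illustration of the "Calculation of Level Setting Error" step, one simple estimate is the standard error of the panel's mean cut score across judges. This is an assumption made for the sketch; an operational study may use a more elaborate estimate, and the cut scores below are invented.

```python
import math
import statistics


def level_setting_error(judge_cuts):
    """Standard error of the mean cut score across judges."""
    if len(judge_cuts) < 2:
        raise ValueError("need at least two judges")
    return statistics.stdev(judge_cuts) / math.sqrt(len(judge_cuts))


if __name__ == "__main__":
    # Hypothetical final-round cut scores from six judges.
    judge_cuts = [11.5, 12.0, 13.0, 14.0, 12.5, 13.5]
    print(round(level_setting_error(judge_cuts), 2))
```

The more the judges agree, and the more judges there are, the smaller this figure becomes.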