standard setting procedures

RESEARCH METHODOLOGY Procedures for Establishing Defensible Absolute Passing Scores on Performance Examinations in Health Professions Education Steven M. Downing Ara Tekian Rachel Yudkowsky Department of Medical Education University of Illinois at Chicago Chicago, Illinois, USA

Upload: rizwan-zafar-ansari

Post on 08-Nov-2014


DESCRIPTION

ebel, angoff, hofstee, borderline, standard setting in medical education

TRANSCRIPT

Page 1: Standard Setting Procedures

RESEARCH METHODOLOGY

Procedures for Establishing Defensible Absolute Passing Scores on

Performance Examinations in Health Professions Education

Steven M. Downing, Ara Tekian, Rachel Yudkowsky

Department of Medical Education

University of Illinois at Chicago, Chicago, Illinois, USA

Page 2: Standard Setting Procedures

Learning Objectives

• By the end of the IP, we should be able to:
  • Describe standard setting methods
  • Differentiate between their types:
    – Norm-based
    – Criterion-based
    – Relative
    – Absolute
  • Know what a passing score is
  • Describe the selection of judges (examiners)
  • Identify the borderline examinee
  • Understand each method

Page 3: Standard Setting Procedures

What do experts say?

"We have come to realize that there is no objectively correct way to set standards. But we have also come to realize that there is nothing wrong with using judgments appropriately." (Zieky, 1995, p.5)

"Determination of a minimum acceptable performance always involves some rather arbitrary and not wholly satisfactory decisions." (Ebel, 1972, p.492)

Page 4: Standard Setting Procedures

Why do we need standard setting methods?

– To determine the standards of performance
– To separate the non-competent from the competent
– To provide an educational tool for deciding the cut-off point on the score scale

• (Reference AMEE Guide No. 18: Standard setting in student assessment)

Page 5: Standard Setting Procedures

Essentials

• 1. Choice of content experts (5-6 or 11-12)

• 2. Identification of the borderline examinee

• 3. Cut score

Page 6: Standard Setting Procedures

Choice of content expert

Judges should be able to:
• Judge examinee performance without bias
• Follow the instructions
• Understand their task

Judges should:
• Be subject experts
• Come from varied cultures, ethnicities, and religions, and include both genders (male and female judges)
• Sit on a panel of 5-12 judges (the preferred panel size)

Page 7: Standard Setting Procedures

Borderline examinee

• One who has a 50:50 probability of passing or failing the test.

• Sometimes passes the exam and sometimes fails.

Or

• The judges decide the characteristics of the borderline examinee.

Page 8: Standard Setting Procedures

Cut Score

• There is no gold standard for pass scores.
• The passing score is whatever the judges decide.
• Different panels of judges may set different passing scores for the same exam.
• It depends on how much is enough to pass; the subject experts decide this by devising a checklist or predetermined key, or leave it open to the judges' judgment.

Page 9: Standard Setting Procedures

Problem with judges

• Judges expect even more from the borderline examinee.

• They set unrealistically high standards that fail a high proportion of examinees, e.g., in a viva (examiners' expectations are very high). This happens when judges decide the cut-off without knowing the actual performance data.

Page 10: Standard Setting Procedures

How can we overcome this problem?

• Calibration of judges by:
– Providing student records showing their overall performance.
– Student record:

Test | Topic | MCQs | SAQs | OSPE | Viva
Jan–Feb | Thanatology | Fail | Fail | Below | Fail
Mar–April | Autopsy and exhumation | Pass | Fail | Pass | Fail
May–June | Asphyxia | Pass | Pass | Pass | Borderline
Aug–Sept | PI and traumatology | Above | Above | Excellent | Pass
Send-up | | Above | Pass | Very good | Good

Page 11: Standard Setting Procedures

Passing Rate

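Passing rate (%) = (number of students who passed ÷ total number of students who appeared) × 100

e.g., 64/100 × 100 = 64%

Rating scale: 0 = not done; 1 = poor; 2-3 = below expectation; 4 = borderline; 5-6 = meets expectation; 7 = above expectation

Frequencies shown on the slide: 4, 6, 4, 17, 24, 35, 7, 3

PASSED = 64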
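The passing-rate formula above can be written as a small helper. This is a minimal Python sketch; the function name `pass_rate` is hypothetical, and the inputs are the slide's own example (64 of 100 students passed).

```python
def pass_rate(passed: int, appeared: int) -> float:
    """Percentage of examinees who passed: passed / appeared x 100."""
    if appeared <= 0:
        raise ValueError("appeared must be a positive number of students")
    if not 0 <= passed <= appeared:
        raise ValueError("passed must lie between 0 and appeared")
    return passed * 100 / appeared


# Figures from the slide: 64 of 100 students passed.
print(f"Passing rate: {pass_rate(64, 100):.0f}%")  # Passing rate: 64%
```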

Page 12: Standard Setting Procedures

Standard Setting Methods

Relative

Item based

Performance based

Page 13: Standard Setting Procedures

Relative

• Item based: Modified Angoff, Ebel

• Performance based: Original Angoff

Page 14: Standard Setting Procedures

Angoff Method

• The judges' ratings are combined to determine the passing score.

Item | Rater 1 | Rater 2 | Rater 3 | Rater 4 | Rater 5 | Rater 6 | Rater 7 | Mean
1 | .80 | .87 | .85 | .90 | .80 | .95 | .85 | 0.86
2 | .70 | .75 | .80 | .85 | .75 | .85 | .75 | 0.78
3 | .50 | .63 | .55 | .60 | .65 | .60 | .60 | 0.59
4 | .70 | .68 | .70 | .70 | .65 | .70 | .70 | 0.69
5 | .75 | .70 | .80 | .85 | .70 | .85 | .80 | 0.78
6 | .60 | .65 | .80 | .75 | .65 | .85 | .80 | 0.73
7 | .50 | .58 | .55 | .60 | .70 | .90 | .60 | 0.63
8 | .70 | .78 | .75 | .75 | .65 | .80 | .70 | 0.73
9 | .45 | .50 | .50 | .45 | .43 | .55 | .45 | 0.48
10 | .60 | .69 | .65 | .65 | .65 | .70 | .70 | 0.66
Sum of item means | | | | | | | | 6.93

Raw passing score = sum of item means = 6.93

Percent passing score = 100% × (sum of item means ÷ number of items) = 100% × (6.93 / 10) = 69.30%

Angoff passing score = 69.30%
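A minimal Python sketch of the Angoff calculation above, using the ratings from the table: it averages each item across raters, sums the item means, and divides by the number of items. The function name `angoff_cut_score` is hypothetical.

```python
# Angoff ratings from the table: one row per item, one column per rater.
# Each value is a rater's estimate of the proportion of borderline
# examinees who would answer that item correctly.
RATINGS = [
    [0.80, 0.87, 0.85, 0.90, 0.80, 0.95, 0.85],  # item 1
    [0.70, 0.75, 0.80, 0.85, 0.75, 0.85, 0.75],  # item 2
    [0.50, 0.63, 0.55, 0.60, 0.65, 0.60, 0.60],  # item 3
    [0.70, 0.68, 0.70, 0.70, 0.65, 0.70, 0.70],  # item 4
    [0.75, 0.70, 0.80, 0.85, 0.70, 0.85, 0.80],  # item 5
    [0.60, 0.65, 0.80, 0.75, 0.65, 0.85, 0.80],  # item 6
    [0.50, 0.58, 0.55, 0.60, 0.70, 0.90, 0.60],  # item 7
    [0.70, 0.78, 0.75, 0.75, 0.65, 0.80, 0.70],  # item 8
    [0.45, 0.50, 0.50, 0.45, 0.43, 0.55, 0.45],  # item 9
    [0.60, 0.69, 0.65, 0.65, 0.65, 0.70, 0.70],  # item 10
]


def angoff_cut_score(ratings):
    """Return (raw passing score, percent passing score)."""
    item_means = [sum(row) / len(row) for row in ratings]  # mean over raters
    raw = sum(item_means)                                  # sum of item means
    percent = 100 * raw / len(ratings)                     # divide by no. of items
    return raw, percent


raw, percent = angoff_cut_score(RATINGS)
print(f"Raw passing score: {raw:.2f}")           # Raw passing score: 6.93
print(f"Percent passing score: {percent:.2f}%")  # Percent passing score: 69.30%
```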

Page 15: Standard Setting Procedures

Angoff’s method - 2

– Read the first item.
– Estimate the proportion of the borderline group that would respond correctly.
– Record the ratings, discuss them, and revise them if needed.
– Repeat this for each item.
– Calculate the passing score by:

• Adding the rating of each item separately (modified Angoff), e.g., an FCPS examinee has to satisfy all judges

• Adding performance across all stations (original Angoff), e.g., OSCE

Page 16: Standard Setting Procedures

Ebel’s Method

• Judges define the checklist and rating scale.
• Items are categorized by relevance: essential, important, acceptable.
• Items are rated by difficulty: easy, medium, hard.
• Judges define the borderline performance needed to pass (0-100%).

Page 17: Standard Setting Procedures

Ebel’s method

Relevance | Easy | Medium | Hard
Essential | | |
Important | | |
Acceptable | | |

Page 18: Standard Setting Procedures

Ebel’s Method

– Judges make judgments about the percentages of items in each category that borderline test-takers would have answered correctly

– Calculate passing score

Page 19: Standard Setting Procedures

Ebel’s method: % of items the borderline examinee performs correctly

Relevance | Easy | Medium | Hard
Essential | 95% | 60% | 40%
Important | 90% | 56% | 34%
Acceptable | 80% | 60% | 50%

Page 20: Standard Setting Procedures

Items Relevance

Relevance | Easy: item # (% correct) | Medium: item # (% correct) | Hard: item # (% correct) | Weighted sum
Essential | 4, 5 (93) | 1 (81) | 3 (63) | 2(.93) + .81 + .63 = 3.30
Important | 2 (89) | 10 (76) | 9 (59) | .89 + .76 + .59 = 2.24
Acceptable | N/A | 7 (62) | 6, 8 (42) | .62 + 2(.42) = 1.46
Total | | | | 3.30 + 2.24 + 1.46 = 7.00

Passing score = total × 100 / number of items = 7 × 100 / 10 = 70%

"% correct" is the mean, across all judges, of their estimates of the percentage of items in that cell that the borderline examinee would answer correctly.
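The Ebel arithmetic above can be sketched in Python: multiply each cell's item count by the judges' mean proportion, sum the products, and divide by the total number of items. The dictionary layout and the name `ebel_cut_score` are hypothetical; the numbers come from the worked example.

```python
# Ebel grid from the worked example: for each (relevance, difficulty) cell,
# (number of items in the cell, judges' mean proportion of those items
# the borderline examinee answers correctly). Empty cells are omitted.
EBEL_CELLS = {
    ("Essential", "Easy"): (2, 0.93),
    ("Essential", "Medium"): (1, 0.81),
    ("Essential", "Hard"): (1, 0.63),
    ("Important", "Easy"): (1, 0.89),
    ("Important", "Medium"): (1, 0.76),
    ("Important", "Hard"): (1, 0.59),
    ("Acceptable", "Medium"): (1, 0.62),
    ("Acceptable", "Hard"): (2, 0.42),
}


def ebel_cut_score(cells):
    """Return (expected raw score of the borderline examinee, percent cut score)."""
    n_items = sum(n for n, _ in cells.values())
    raw = sum(n * p for n, p in cells.values())  # weighted sum over cells
    return raw, 100 * raw / n_items


raw, percent = ebel_cut_score(EBEL_CELLS)
print(f"Raw: {raw:.2f}, cut score: {percent:.0f}%")  # Raw: 7.00, cut score: 70%
```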

Page 21: Standard Setting Procedures

Absolute

Criterion based

Norm based

Page 22: Standard Setting Procedures

• Criterion-referenced methods:

– Based on how much the examinees know.
– Candidates pass or fail depending on whether they meet specified criteria.
– In criterion-referenced tests (CRTs), the performance of each examinee is compared to a predefined set of criteria or a standard. The goal of these tests is to determine whether or not the candidate has demonstrated mastery of a certain skill or set of skills.

– E.g., a national board medical exam is a CRT: either the examinee has the skills to practice the profession, in which case he or she is licensed, or does not.

– E.g., examinees must correctly answer 70% of the questions.

• Ref: "Norm-Referenced vs. Criterion-Referenced Testing," May 22, 2008, by Danielle, Director of Sales and Marketing, Language Testing
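To make the criterion-referenced idea concrete, here is a minimal Python sketch using the 70% example above. The function name, examinee names, and scores are hypothetical, and treating a score exactly at the cut as a pass is an assumption.

```python
def criterion_referenced(scores, cut=70.0):
    """Compare each examinee with a fixed, predefined standard (cut, in %).

    Each decision depends only on that examinee's own score, never on how
    the rest of the group performed. A score exactly at the cut passes
    (an assumption).
    """
    return {name: "Pass" if score >= cut else "Fail" for name, score in scores.items()}


# Hypothetical percentage scores for three examinees.
scores = {"A": 82, "B": 70, "C": 64}
print(criterion_referenced(scores))  # {'A': 'Pass', 'B': 'Pass', 'C': 'Fail'}
```

Because the standard is fixed in advance, every examinee in a strong group can pass and every examinee in a weak group can fail.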

Page 23: Standard Setting Procedures

Criterion referenced standard

[Figure: test score distributions for a poor, an average, and a good group, with the criterion-referenced standard fixed at 50% for all three]

Page 24: Standard Setting Procedures

• Criterion based
– Based on already-set criteria, e.g.:
– 33% passing score in FA, BA exams
– 50% passing score in MBBS exams
– 60% passing score at the postgraduate level
– 80% passing score in skills exams

Page 25: Standard Setting Procedures

Criterion based

• Borderline group

• Contrasting groups

Page 26: Standard Setting Procedures

Contrasting Groups

• Performance is judged with a checklist or rating scale.

• Students are divided into expert and non-expert groups based on the rating scale.

• The two groups' score distributions are plotted.

• The passing score is set at the intersection of the two distributions, which trades off false positives (non-experts who pass) against false negatives (experts who fail).
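The intersection can also be located numerically. This Python sketch assumes a common stand-in for reading the graph: it picks the candidate cut that minimizes false positives plus false negatives. With smoothed curves or unequal group sizes the two approaches can differ slightly. The function name and the scores are hypothetical; "competent" corresponds to the slide's expert group.

```python
def contrasting_groups_cut(competent, non_competent):
    """Return (cut, false_positives, false_negatives) for the candidate cut
    that minimizes total misclassification.

    false positive: non-competent examinee scoring at or above the cut (passes)
    false negative: competent examinee scoring below the cut (fails)
    """
    best = None
    for cut in sorted(set(competent) | set(non_competent)):
        fp = sum(1 for s in non_competent if s >= cut)
        fn = sum(1 for s in competent if s < cut)
        if best is None or fp + fn < best[1] + best[2]:
            best = (cut, fp, fn)
    return best


# Hypothetical checklist scores (%) for the two groups.
competent = [62, 68, 70, 74, 75, 78, 81, 85]
non_competent = [45, 50, 52, 55, 58, 60, 63, 66]
print(contrasting_groups_cut(competent, non_competent))  # (68, 0, 1)
```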

Page 27: Standard Setting Procedures

Compromise Methods

• Advantages
– Easy to implement
– Educators are comfortable with the decisions

• Disadvantages
– The cut score may not fall in the area defined by the judges' estimates
– The method is not the first choice in a high-stakes testing situation

Page 28: Standard Setting Procedures

Borderline Group
• Examinee-centered.
• The examinee's performance is judged as a whole.
• Faculty directly observe the performance, e.g., in an OSCE.
• Each judge observes multiple examinees at the same station.
• Judges use a global rating scale: 1 = fail, 2 = borderline, 3 = pass.
• The mean checklist score of the examinees rated borderline becomes the passing score.
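The borderline group rule above can be sketched in a few lines of Python: keep the checklist scores of examinees whose global rating is "borderline" and average them. The function name and the station data are hypothetical.

```python
from statistics import mean

# Global rating scale from the slide.
FAIL, BORDERLINE, PASS = 1, 2, 3


def borderline_group_cut(records):
    """Return the station's passing score: the mean checklist score of the
    examinees whom judges gave a global rating of 'borderline'.

    records: iterable of (checklist_score, global_rating) pairs for one station.
    """
    borderline_scores = [score for score, rating in records if rating == BORDERLINE]
    if not borderline_scores:
        raise ValueError("no examinee was rated borderline at this station")
    return mean(borderline_scores)


# Hypothetical results for six examinees at one OSCE station.
station = [(9, FAIL), (12, BORDERLINE), (14, BORDERLINE),
           (13, BORDERLINE), (17, PASS), (19, PASS)]
print(borderline_group_cut(station))  # 13
```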

Page 29: Standard Setting Procedures

Types of Standards

• Norm-referenced methods (NRT):
– Based on a comparison among the performances of examinees, i.e., examinee performance is compared to that of other examinees.
– Standardized examinations such as the SAT are norm-referenced tests. The goal is to rank the examinees so that decisions about their opportunity for success (e.g., college entrance) can be made.

– E.g., normal distribution (bell curve): a set proportion of candidates fails regardless of how well they perform, e.g., the top 84% pass.
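A minimal Python sketch of the "top 84% pass" rule above. Rounding the number of passers down and assuming no tied scores at the boundary are assumptions; the function name and both score lists are hypothetical. It shows that the same number of examinees fails in a strong and a weak group, while the cut score itself moves.

```python
def norm_referenced_cut(scores, pass_percent=84):
    """Pass the top pass_percent% of examinees by rank.

    Returns (cut score, number failing). The cut is the lowest score among
    those who pass. Assumes no tie at the boundary; rounding the number of
    passers down is also an assumption.
    """
    n_pass = len(scores) * pass_percent // 100  # integer arithmetic avoids float error
    if n_pass == 0:
        raise ValueError("too few examinees for this pass percentage")
    ranked = sorted(scores, reverse=True)
    return ranked[n_pass - 1], len(scores) - n_pass


strong_group = list(range(70, 95))  # 25 hypothetical scores, 70-94
weak_group = list(range(40, 65))    # 25 hypothetical scores, 40-64
print(norm_referenced_cut(strong_group))  # (74, 4)
print(norm_referenced_cut(weak_group))    # (44, 4)
```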

Page 31: Standard Setting Procedures

Norm-referenced standard

[Figure: norm-referenced standard on a test score distribution, with reference points at 30%, 50%, and 80%]

Page 32: Standard Setting Procedures

Hofstee Method (relative-absolute compromise method)

• Judges are asked to define the minimum and maximum acceptable passing scores and failure rates, e.g.:

– 81-100%: outstanding
– 71-80%: above expectation
– 61-70% (maximum pass score): meets expectation
– 56-60%: top borderline
– 51-55% (minimum pass score): bottom borderline
– 40-50%: below expectation
– 20-39%: performs incorrectly
– 0-19%: does not know

Page 33: Standard Setting Procedures

Hofstee Method

• Graphical presentation
• Judges predefine:
– The acceptable fail rate, e.g., a minimum of 6 and a maximum of 20 students to fail
– The acceptable lowest and highest pass scores (%)

Minimum and maximum pass scores (%) set by six judges:

Score | J-1 | J-2 | J-3 | J-4 | J-5 | J-6 | Mean
Min | 62 | 57 | 51 | 55 | 52 | 59 | 56
Max | 72 | 67 | 73 | 65 | 60 | 71 | 68

Page 34: Standard Setting Procedures

Hofstee Graph

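[Figure: Hofstee graph. Cumulative % of examinees (y-axis) plotted against score (x-axis) for the actual scores; minimum and maximum pass scores of 56% and 68%; maximum fail rate 20% and minimum fail rate 6%; the resulting cut score is marked at 61%]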
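The Hofstee procedure can be sketched in Python. The judges' means (56 and 68) and the fail-rate limits (6% and 20%) come from the slides; the slides do not reproduce the actual score distribution, so the scores below are hypothetical and the sketch does not reproduce the 61% cut. Scanning candidate cuts in 0.5-point steps is also an assumption; the function names are hypothetical.

```python
from statistics import mean

# Judges' minimum and maximum acceptable pass scores (%), from the table.
MIN_PASS = [62, 57, 51, 55, 52, 59]
MAX_PASS = [72, 67, 73, 65, 60, 71]
# Acceptable fail rates (%) from the slide: 6 and 20 of 100 students.
MIN_FAIL_RATE, MAX_FAIL_RATE = 6.0, 20.0

# Hypothetical percentage scores for 20 examinees (not the slide's data).
HYPOTHETICAL_SCORES = [45, 50, 57, 58, 59, 60, 61, 62, 63, 64,
                       65, 66, 67, 69, 70, 72, 75, 78, 82, 88]


def fail_rate(scores, cut):
    """Percentage of examinees scoring below the cut."""
    return 100 * sum(1 for s in scores if s < cut) / len(scores)


def hofstee_cut(scores, k_min, k_max, f_min, f_max, step=0.5):
    """Return the first cut in [k_min, k_max] where the observed fail-rate
    curve reaches the line from (k_min, f_max) to (k_max, f_min)."""
    if k_max <= k_min:
        raise ValueError("k_max must exceed k_min")
    cut = k_min
    while cut <= k_max:
        line = f_max - (f_max - f_min) * (cut - k_min) / (k_max - k_min)
        if fail_rate(scores, cut) >= line:
            return cut
        cut += step
    return k_max  # the curves do not meet inside the window


k_min, k_max = mean(MIN_PASS), mean(MAX_PASS)  # 56 and 68, as on the slide
cut = hofstee_cut(HYPOTHETICAL_SCORES, k_min, k_max, MIN_FAIL_RATE, MAX_FAIL_RATE)
print(k_min, k_max, cut)  # 56 68 58.5
```

The returned cut always lies inside the judges' window, which is what makes Hofstee a compromise between the absolute (score) and relative (fail-rate) judgments.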

Page 35: Standard Setting Procedures

Compensatory vs. Non-compensatory

Compensatory
• Poor performance on one station can be compensated by good performance on other stations.

• The overall score is the average of performance across all stations.

• E.g., SAQs, MMI, OSCE

Non-compensatory
• The student must reach the minimum level of competence on each station.

• The student has to meet predefined criteria on each station to pass.

• E.g., OSATS, DOPS, Mini-CEX
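The two decision rules above can be contrasted in a short Python sketch. The function names, station scores, and cuts are hypothetical; the example examinee is strong on two stations and weak on one.

```python
from statistics import mean


def compensatory_pass(station_scores, overall_cut):
    """Pass if the average across all stations reaches the overall cut."""
    return mean(station_scores) >= overall_cut


def non_compensatory_pass(station_scores, station_cuts):
    """Pass only if every station reaches its own cut."""
    if len(station_scores) != len(station_cuts):
        raise ValueError("need one cut per station")
    return all(score >= cut for score, cut in zip(station_scores, station_cuts))


# Hypothetical OSCE-style results: strong on two stations, weak on one.
scores = [80, 45, 75]
print(compensatory_pass(scores, 60))                # True (average is about 66.7)
print(non_compensatory_pass(scores, [60, 60, 60]))  # False (station 2 is below 60)
```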

Page 36: Standard Setting Procedures

Comparison

Method | Judgment focused on | Judgment requires performance data | Direct observation | Timing of judgment
Angoff | Test items / performance | No | No | Before exam
Ebel | Test items | Yes | No | After exam
Hofstee | Whole test | Yes | No | After exam
Borderline group | Examinee performance | No | Yes | During exam
Contrasting groups | Examinee performance | No | Yes | During exam

Page 37: Standard Setting Procedures

Summary

1. All standard-setting is judgmental

2. Standard-setting leads to errors of classification

3. Standard-setting is and will remain controversial

4. There is no purely absolute standard.

5. There is no one right method

6. Choosing judges is more important than choosing methods

Page 38: Standard Setting Procedures

Summary (continued)

• If experts use a rating scale or checklist for assessment, you can choose the borderline group or contrasting groups method.

• If experts do not rate the exam performance directly, you can choose the Angoff, Ebel, or Hofstee method.

Page 39: Standard Setting Procedures

Critique

• This article describes standard setting only for performance-based exams, i.e., OSCE, OSATS, DOPS.

• The classification of standards is somewhat confusing.
• The standards overlap, with no clear demarcation.
• These methods can be applied with some modifications.
• It does not discuss the percentile method.

Page 40: Standard Setting Procedures

References
• AMEE Guide No. 18: Standard setting in student assessment.

• Berk, R.A. (1986). A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56, 137-172.

• Cizek, G.J. (2001). Setting Performance Standards: Concepts, Methods, and Perspectives. Mahwah, NJ: Lawrence Erlbaum Associates.

• Jaeger, R.M. (1989). Certification of student competence. In R.L. Linn (Ed.), Educational Measurement. New York: American Council on Education and Macmillan Publishing Company.

• Kane, M. (1994). Validating the performance standards associated with passing scores. Review of Educational Research, 64, 425-461.

• Livingston, S.A. and Zieky, M.J. (1982). Passing scores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.

Page 41: Standard Setting Procedures

References
• Norcini, J.J. and Guille, R.A. (2002). Combining tests and setting standards. In Norman, G., van der Vleuten, C., and Newble, D. (Eds.), International Handbook of Research in Medical Education (pp. 811-834). Dordrecht: Kluwer Press.

• Norcini, J.J. (2003). Setting standards on educational tests. Medical Education, 37, 464-469.

• Norcini, J.J. and Shea, J.A. (1997). The credibility and comparability of standards. Applied Measurement in Education, 10, 39-59.

• Zieky, M.J. (2001). So much has changed: How the setting of cutscores has evolved since the 1980s. In G.J. Cizek (Ed.), Setting Performance Standards: Concepts, Methods, and Perspectives (pp. 19-52). Mahwah, NJ: Lawrence Erlbaum Associates.