Using the Many-Faceted Rasch Model to Evaluate Standard Setting Judgments: An Illustration With the Advanced Placement Environmental Science Exam

Pamela K. Kaliski, Stefanie A. Wind, George Engelhard Jr., Deanna L. Morgan, Barbara S. Plake, and Rosemary Reshetar


TRANSCRIPT

Page 1:

Using the Many-Faceted Rasch Model to Evaluate Standard Setting Judgments: An Illustration With the Advanced Placement Environmental Science Exam

Pamela K. Kaliski, Stefanie A. Wind, George Engelhard Jr., Deanna L. Morgan, Barbara S. Plake, and Rosemary Reshetar

psychology and applied statistics

http://ncaase.com/about/bio?id=4

http://des.emory.edu/home/people/faculty/Engelhard.html

Page 2:

Contents

1. Many-Faceted Rasch Model

2. Multiple Yes-No (MYN) method

3. Instrument

4. Results and Conclusion


Page 3:

Introduction

Standard setting: "Standard setting refers to the process of establishing one or more cut scores on examinations. The cut scores divide the distribution of examinees' test performances into two or more categories" (Cizek & Bunch, 2007).

The criteria for evaluating panelist judgments:
- Procedural validity: implementation issues and documentation
- Internal validity: interpanelist and intrapanelist consistency
- External validity: comparisons with other methods


Page 4:

Many-Faceted Rasch Model

Indices: n = panelist; i = item; j = round; k = standard setting (modified Angoff) rating.

$$\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \omega_j - \tau_{jk} \qquad (1)$$

θ_n is the judged severity for panelist n, δ_i is the average judged item difficulty for item i, ω_j is the judged average performance level for round j, and τ_jk is the cut score, or threshold coefficient, from round j for standard setting rating k (rating k relative to k − 1).
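To make Equation 1 concrete, here is a minimal sketch (a Python illustration with made-up parameter values, not estimates from the study) of how the facet terms generate rating-category probabilities; the adjacent-categories form anchors category 0 at a logit of zero:

```python
import numpy as np

def mfr_category_probs(theta_n, delta_i, omega_j, tau_j):
    """Rating-category probabilities implied by Equation 1.

    theta_n : judged severity for panelist n
    delta_i : average judged difficulty for item i
    omega_j : judged average performance level for round j
    tau_j   : thresholds tau_j1..tau_jK for round j
    Returns probabilities for categories 0..K.
    """
    # ln(P_k / P_{k-1}) = theta_n - delta_i - omega_j - tau_jk
    z = theta_n - delta_i - omega_j - np.asarray(tau_j, dtype=float)
    logits = np.concatenate(([0.0], np.cumsum(z)))  # category 0 anchored at 0
    logits -= logits.max()                          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Illustrative values only: 4 thresholds -> 5 MYN rating categories
# (1/2, 2/3, 3/4, 4/5, Above 5).
print(mfr_category_probs(theta_n=0.3, delta_i=-0.8, omega_j=0.1,
                         tau_j=[-1.5, -0.5, 0.5, 1.5]))
```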


Page 5:

Rating quality indices

(a) Panelist severity/leniency measures: separation statistics and chi-square statistic
(b) Model–data fit: Outfit MSE
(c) The creation of a visual display for comparing panelist judgments on the latent variable
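Outfit MSE for a panelist is the mean squared standardized residual of that panelist's ratings. A sketch of the standard computation (assuming model probabilities such as those returned by mfr_category_probs above; the 0.6–1.5 acceptance range is the one cited with the results on p. 398):

```python
import numpy as np

def outfit_mse(ratings, prob_matrix):
    """Outfit mean square error for one panelist.

    ratings     : observed rating category per item (0..K), shape (I,)
    prob_matrix : model category probabilities per item, shape (I, K+1)
    """
    cats = np.arange(prob_matrix.shape[1])
    expected = prob_matrix @ cats                   # E[x] per item
    variance = prob_matrix @ cats**2 - expected**2  # Var[x] per item
    z = (ratings - expected) / np.sqrt(variance)    # standardized residuals
    return float(np.mean(z**2))

def acceptable_fit(mse, lo=0.6, hi=1.5):
    """Flag whether a panelist's Outfit MSE falls in the acceptable range."""
    return lo <= mse <= hi
```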


Page 6:

Multiple Yes-No (MYN) method

The MYN method requires panelists to consider the borderline examinee at each cut score and to identify the level at which the borderline examinee would be able to answer each item correctly.

For each item, panelists decided whether or not the borderline examinee in each category would be able to identify the correct answer.


Page 7:

PLDs (performance level descriptors)


Page 8:

For each item, the panelist works through a fixed sequence of questions (see the code sketch after this list):

1. Would a borderline-1/2 student be able to answer this item correctly? If yes, the panelist would circle the 1/2 cut score on the rating form and move on to the next item. If no, the panelist would consider the next question about the same item.
2. Would a borderline-2/3 student be able to answer this item correctly? If yes, the 2/3 cut score would be circled for that item and the panelist would move on to the next item. If no, the panelist would consider the next question about the same item.
3. Would a borderline-3/4 student be able to answer this item correctly? If yes, the 3/4 cut score would be circled and the panelist would move on to the next item. If no, the panelist would consider the next question about the same item.
4. Would a borderline-4/5 student be able to answer this item correctly? If yes, the 4/5 cut score would be circled and the panelist would move on to the next item. If no, the panelist would consider the final question about the same item.
5. Would an above-5 borderline student be able to answer this item correctly? If yes (which is likely, given that all other possible borderline students have been considered), the Above 5 category would be circled for that item.
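This decision sequence is easy to express as a short function; a sketch (the labels and the can_answer callback are hypothetical stand-ins for the panelist's yes/no judgments):

```python
# MYN outcomes in the order the questions are asked.
CUTS = ["1/2", "2/3", "3/4", "4/5", "Above 5"]

def myn_rating(can_answer):
    """Sequential MYN judgment for one item.

    can_answer(cut) -> bool: whether the panelist judges that the
    borderline student at `cut` could answer the item correctly.
    Returns the first cut score answered yes; otherwise 'Above 5'.
    """
    for cut in CUTS[:-1]:
        if can_answer(cut):
            return cut
    return CUTS[-1]

# Example: item judged answerable only from the 3/4 borderline upward.
print(myn_rating(lambda cut: cut in ("3/4", "4/5")))  # -> "3/4"
```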


Page 9:


Page 10:

Instrument

The Advanced Placement (AP) program is composed of 34 courses and corresponding examinations in 22 subject areas; this study concerns the AP Environmental Science (APES) examination.

Data used in this study come from the 2011 administration of the APES exam, which consisted of 100 MC items and four CR items, and from the standard setting for this examination: the ratings that resulted from two rounds of item-level judgments provided by the 15 APES panelists.


Page 11:

Research Purpose

The MFR model is used to evaluate the quality of judgments on MC items provided by panelists who participated in a modified Angoff standard setting for the 2011 APES exam that used the MYN method for MC items.

Panelist characteristics (gender and level of teaching) are incorporated into the MFR model to determine whether or not these are explanatory variables that account for differences in panelist ratings.
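One common way to set this up (a sketch only; the group terms γ and η are our notation, not the study's) is to add fixed facets for the panelist characteristics to Equation 1:

$$\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \omega_j - \gamma_{g(n)} - \eta_{t(n)} - \tau_{jk}$$

where g(n) indexes panelist n's gender group and t(n) the teaching level; a nonzero γ or η would indicate that the characteristic helps explain differences in panelist ratings.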


Page 12:

Results and Conclusion


Page 13:

Results and Conclusion (p. 397)


Page 14:

(p. 398)

Acceptable model–data fit: Outfit MSE within the range [0.6, 1.5].

Page 15:

(p. 400)

Page 16:

(p. 401)

Page 17:

(p. 402)

Page 18:

(p. 404)

Page 19:

(p. 405)

Page 20:

(p. 405)

Page 21:

Future study

- Additional explanatory variables
- Additional statistical models
- MFR model combined with other modified Angoff procedures, or with Bookmark procedures
- Overall contribution of each facet
- CR questions


Page 22:

Thus the interaction between θ and ω should be considered in Equation 1. A different rating scale structure and a random-effects approach could also be explored.

With computer-based standard setting, panelists would make their judgments on a computer, and the time spent on each judgment could be recorded.

The cut score for each category could also be transformed to the expected-score metric (a sketch follows).
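A minimal sketch of that last idea for the MC items, assuming dichotomous Rasch item difficulties b_i (the difficulties below are simulated purely for illustration): the expected raw score at a cut score θ_c is the test characteristic curve value Σ_i P_i(θ_c).

```python
import numpy as np

def expected_score(theta_cut, difficulties):
    """Map a theta-scale cut score to an expected raw score.

    Under the dichotomous Rasch model, P(correct) = 1 / (1 + exp(-(theta - b_i)));
    the expected raw score is the sum of these probabilities over items.
    """
    b = np.asarray(difficulties, dtype=float)
    return float(np.sum(1.0 / (1.0 + np.exp(-(theta_cut - b)))))

# 100 MC items with simulated difficulties (illustrative only).
rng = np.random.default_rng(0)
b = rng.normal(0.0, 1.0, size=100)
for cut in (-1.0, 0.0, 1.0):  # hypothetical category cut scores on theta
    print(cut, round(expected_score(cut, b), 1))
```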
