
Deconstructing Paper-based Test Scoring and Test Analysis using the University of Maryland Baltimore Test Scoring Service - Shannon Tucker, Director of Instructional Technology

The University of Maryland Center for Information Technology Services provides test scoring services to the campus for multiple choice exams using Remark Office OMR, a form-processing software package for surveys and tests that recognizes optical marks (bubbles and checkboxes) and barcodes.1 To ensure assessments scored with this software are processed efficiently, the School of Pharmacy has standardized test sheets to include the UMB One Card ID number and Program/Campus location as identifying information (figure 1).

Figure 1: School of Pharmacy Test Sheet Detail

The recent addition of the UMB One Card ID number as the Student ID number in the Blackboard gradebook gives faculty and teaching assistants an automatic reference for this information when reviewing results received from the campus test scoring service for any student registered in a course. Additionally, the Pharmacy Portal provides all faculty with a Microsoft Excel formatted master report for PharmD students and, if applicable, graduate students as an additional reference for this information.

1 Remark Office OMR Software. 2007. Principia Products. <http://www.gravic.com/remark/officeomr/> 19 September 2007.


Submitting Test Sheets for Scoring
All individuals submitting test sheets for scoring are asked to provide the following information:

• School of Pharmacy Test Submittal Coversheet

• Examination Key

• Test Sheets

Results Received from the Test Scoring Service
Instructors or teaching assistants submitting assessments to be scored can expect to receive the following information as a result of scoring:

• Item and Test Statistics (printout and electronic files)

• A comma separated spreadsheet containing all of the data scanned into the Remark system that can be imported into Excel

  o This contains ID numbers, campus locations, and the options selected by the students

• A tab delimited text file containing ID information and the raw score that can be imported into Excel and used in conjunction with Blackboard

• A tab delimited text file containing ID information, individual item scores, and a total percentage score that can be imported into Excel

The returned results give faculty/instructors an opportunity to review test statistics and item statistics and to import grades into Excel for manipulation or use with Blackboard; one way to read the returned files programmatically is sketched below.
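For readers who prefer to work with the returned files in a script rather than opening them in Excel by hand, the sketch below shows one way to read them programmatically. It is a minimal illustration only; the file names and column layout are assumptions, not the exact output produced by the scoring service.

```python
# Minimal sketch: reading the returned score and response files.
# File names and column positions below are hypothetical.
import csv

# Tab delimited file: one row per student with an ID and a raw score.
with open("exam1_raw_scores.txt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    header = next(reader)                      # e.g. ["StudentID", "RawScore"]
    raw_scores = {row[0]: float(row[1]) for row in reader}

# Comma separated file: one row per student with ID, campus, and selected options.
with open("exam1_responses.csv", newline="") as f:
    responses = list(csv.DictReader(f))        # each row becomes a dict keyed by column name

print(len(raw_scores), "scores read;", len(responses), "response rows read")
```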

Working with Remark Generated Statistics
Individuals using other test scoring products or statistical packages may notice some familiar and unfamiliar test and item statistics on returned results.

Test Statistics2

Number of Tests Graded: The total number of tests that were graded.
Number of Graded Items: The number of items on the test that were graded.
Total Points Possible: The total number of points on the test.
Maximum Score: The highest score from the graded tests.
Minimum Score: The lowest score from the graded tests.
Median Score: The median of the scores from the graded tests.
Range of Scores: The distance between the highest and lowest scores.
Percentile (25 and 75): Percentiles are values that divide a sample of data into one hundred groups containing (as far as possible) equal numbers of observations. For example, 25% of the data values lie below the 25th percentile.

2 Remark Office OMR User’s Guide. Malvern, Pennsylvania, 2004.


Inter Quartile Range: The difference between the 75th percentile and the 25th percentile.
Mean Score: The average score of all the graded tests.
Variance: The average of the squared deviations of each score from the mean.
Standard Deviation: A statistic used to characterize the dispersion among the measures in a given population. It is calculated by taking the square root of the variance.
Confidence Interval (1, 5, 95, and 99%): A confidence interval gives an estimated range of values that is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. If independent samples are taken repeatedly from the same population, the given percentage (confidence level) of the intervals will include the unknown population parameter. Remark Office OMR calculates confidence intervals of 1%, 5%, 95%, and 99%.
Kuder-Richardson Formula 20 (KR-20): An overall measure of internal consistency.
Coefficient Alpha: A coefficient that describes how well a group of items focuses on a single idea or construct.
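As a point of reference, most of the descriptive statistics above can be reproduced directly from a list of total scores. The following minimal Python sketch uses hypothetical scores; the percentile helper applies a simple nearest-rank rule, so its results may differ slightly from the values Remark reports.

```python
# Minimal sketch: descriptive test statistics from a list of total scores.
from statistics import mean, median, pvariance, pstdev

scores = [72, 85, 91, 64, 78, 88, 95, 70, 81, 76]   # hypothetical total scores

maximum, minimum = max(scores), min(scores)
score_range = maximum - minimum                      # range of scores
avg = mean(scores)                                   # mean score
med = median(scores)                                 # median score
variance = pvariance(scores)                         # average squared deviation from the mean
std_dev = pstdev(scores)                             # square root of the variance

def percentile(data, p):
    """Nearest-rank percentile: the value below which roughly p percent of the data fall."""
    ordered = sorted(data)
    k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[k]

p25, p75 = percentile(scores, 25), percentile(scores, 75)
iqr = p75 - p25                                      # inter quartile range
print(avg, med, std_dev, p25, p75, iqr)
```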

Selected Item Statistics3

Label: The output label designated in the template.
Value: The corresponding numeric value for each output label.
Weight: The points assigned to the correct, incorrect, and missing responses. The weight statistic applies to grading only.
Frequency: The number of times a particular label was chosen (appears in the dataset).
Percent: The corresponding percentage of the frequency.
Cumulative Percent: The sum of the percents from the first response up to and including the current response.
Valid Percent: The percent not including missing items.
Cumulative Valid Percent: The sum of the valid percents from the first response up to and including the current response.
P-Value: A measurement of the difficulty of an item.
Point Biserial: A measurement of the discrimination of an item. It indicates the relationship between a response for a given item and the overall test score of the respondent. A high value indicates that students scoring well on the test chose this response. The point biserial statistic applies to grading only.

3 Remark Office OMR User’s Guide. Malvern, Pennsylvania, 2004.


Test Reliability
There are numerous indexes that may be used to assess the internal consistency of an assessment. Currently, the most widely used measure of reliability is Cronbach’s Alpha (also known as the Coefficient Alpha).4 However, the test statistics included with every scored assessment include both the Kuder-Richardson Formula 20 (KR-20) and the Coefficient Alpha (Cronbach’s Alpha). The Coefficient Alpha is most often used on instruments where items are not scored as right or wrong.5 KR-20 is a special case of Cronbach’s Alpha, designed specifically for dichotomous (right/wrong) items, that evaluates how consistent student responses are among questions on an assessment.6 In layman’s terms, KR-20 best measures how well your exam measures a subject (a single cognitive factor), and the Coefficient Alpha best measures surveys or attitude data.

Interpreting KR-207
The KR-20 formula includes:

1. The number of test items on the exam
2. Student performance on every test item
3. Variance

Index Range: 0.00-1.00

Values near 0.00: Measuring many unknown factors, but not what you intended to measure

Values near 1.00: Close to measuring a single factor

Summary: An exam with a high KR-20 yields reliable student scores (consistent/true score)

How others use KR-20: Tulane University Office of Medical Education recommends a KR-20 score of 0.60 or larger to be acceptable.
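The KR-20 calculation combines exactly these three ingredients: with k items, item difficulties p_i (and q_i = 1 - p_i), and the variance of total scores, KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance). The sketch below applies this formula to a small hypothetical 0/1 item matrix; it is illustrative only and is not how the scoring service computes the statistic.

```python
# Minimal sketch: KR-20 from a hypothetical 0/1 item matrix (rows = students, columns = items).
from statistics import pvariance

items = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

k = len(items[0])                                   # number of test items
totals = [sum(row) for row in items]                # each student's raw score
total_variance = pvariance(totals)                  # variance of the total scores

# p = proportion answering an item correctly; q = 1 - p
pq_sum = 0.0
for i in range(k):
    p = sum(row[i] for row in items) / len(items)
    pq_sum += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq_sum / total_variance)
print(f"KR-20 = {kr20:.2f}")   # values near 1.00 suggest the exam measures a single factor
```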

Item Analysis
Conducting an item analysis following an administration of your assessment is important to identify any questions that are not performing well due to inappropriate difficulty, scoring error, or other factors. When conducting an item analysis, the item difficulty, item discrimination, and distractor quality should all be considered.

4 Streiner, David L. “Starting at the Beginning: An Introduction to Coefficient Alpha and Internal Consistency.” Journal of Personality Assessment 80(1) (2003): 99-103.
5 Reliability. Del Siegle Faculty Web Site, University of Connecticut. <http://www.gifted.uconn.edu/siegle/research/Instrument%20Reliability%20and%20Validity/Reliability.htm> 19 September 2007.
6 Scales and Standard Measures. North Carolina State University. <http://www2.chass.ncsu.edu/garson/PA765/standard.htm> 19 September 2007.
7 Test and Item Analysis. Tulane University Office of Medical Education. <http://www.som.tulane.edu/ome/helpful_hints/test_analysis.pdf> 19 September 2007.


Item Difficulty (p-value)
Item Difficulty is a measure of the proportion of students/subjects who have answered an item correctly and is most commonly referred to as the p-value.

Index Range: 0.00-1.00

Values near 0.00: A smaller proportion of students/subjects responded to the item correctly (more difficult)

Values near 1.00: A greater proportion of students/subjects responded to the item correctly (easier)

Summary: The p-value will report item difficulty related to your assessed population.

How others use the p-value: The consulting company Professional Testing suggests that for criterion-referenced tests (CRTs), with their emphasis on mastery testing, many items on an exam form will have p-values of 0.9 or above. Norm-referenced tests (NRTs) are designed to be harder overall and to spread out the examinees' scores; thus, many of the items on an NRT will have difficulty indexes between 0.4 and 0.6.8
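Because the p-value is simply the proportion of correct responses, it can be checked with a few lines of code. The sketch below reuses the idea of a hypothetical 0/1 item matrix; the 0.4 flagging threshold is an example only, not a School of Pharmacy policy.

```python
# Minimal sketch: item difficulty (p-value) per item from a hypothetical 0/1 matrix.
items = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

n_students = len(items)
p_values = [sum(row[i] for row in items) / n_students for i in range(len(items[0]))]

for i, p in enumerate(p_values, start=1):
    # Higher p-values indicate easier items for the assessed population.
    flag = "review (difficult)" if p < 0.4 else "ok"
    print(f"Item {i}: p = {p:.2f}  {flag}")
```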

Item Discrimination
There are several indexes that successfully compute item discrimination. While the discrimination index is a popular and valid measure of item quality,9 this index is not included as part of the Remark reported item statistics. Instead, Remark provides the Point Biserial Correlation.

Point Biserial Correlation
The Point Biserial Correlation quantifies the relationship between a student/subject’s score on an item (correct or incorrect) and the overall assessment score.

Index Range: -1.00 to +1.00

Values Near -1.00: High scorers answered the item incorrectly more frequently than low scorers.

Values Near +1.00: High scorers answered the item correctly more frequently than low scorers.

Summary: A negative value indicates an item may have been misleading, keyed incorrectly, or the content was inadequately covered.

How others use the point biserial correlation: The Tulane University Office of Medical Education suggests to faculty that a value of +0.20 or higher is desirable.10

They also suggest that there is an interaction between the item discrimination and item difficulty that should be considered by faculty:

8 Step 9. Conduct the Item Analysis. Building High Quality Examination Programs – Professional Testing. 2005. <http://www.proftesting.com/test_topics/steps_9.shtml> 19 September 2007.
9 Pyrczak, Fred. “Validity of the Discrimination Index as a Measure of Item Quality.” Journal of Educational Measurement 10(3) (1973): 227-231.
10 Test and Item Analysis. Tulane University Office of Medical Education. <http://www.som.tulane.edu/ome/helpful_hints/test_analysis.pdf> 19 September 2007.


• Very easy or very difficult test items have little discrimination

• Items of moderate difficulty (60% - 80% answering correctly) generally are more discriminating.

Sample results from Tulane University Office of Medical Education using the p-value with the point biserial correlation can be found at:

http://www.som.tulane.edu/ome/helpful_hints/test_analysis.pdf
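For those who want to verify the statistic themselves, the point biserial correlation for a single item can be computed with the common formula r_pb = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the mean total scores of students who answered the item correctly and incorrectly, s is the standard deviation of total scores, and p is the item’s p-value. The sketch below uses hypothetical data and is not Remark’s internal implementation.

```python
# Minimal sketch: point biserial correlation for one item against total exam scores.
from statistics import mean, pstdev

item_scores  = [1, 1, 0, 1, 0, 1, 0, 1]           # hypothetical 0/1 results on one item
total_scores = [38, 35, 22, 40, 25, 33, 20, 37]   # hypothetical total exam scores

correct   = [t for x, t in zip(item_scores, total_scores) if x == 1]
incorrect = [t for x, t in zip(item_scores, total_scores) if x == 0]

p = len(correct) / len(item_scores)               # item p-value
s = pstdev(total_scores)                          # population standard deviation of totals
r_pb = (mean(correct) - mean(incorrect)) / s * (p * (1 - p)) ** 0.5
print(f"point biserial = {r_pb:+.2f}")            # values above about +0.20 are desirable
```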

Distractor Analysis
Unfortunately, neither item difficulty nor item discrimination accounts for incorrect response options (distractors). Distractor analysis will assist individuals with addressing performance issues associated with incorrect options. On a well-designed multiple choice item, high scoring students/subjects should select the correct option even from highly plausible distractors,11 while those who are ill-prepared should select randomly from the available distractors. In this scenario, the item would be a good discriminator of knowledge and should be considered for future assessments. In other scenarios, a distractor analysis may reveal an item that was mis-keyed, contained a proofreading error, or contains a distractor that appears plausible even to those who scored well on an assessment.

To be effective, incorrect options should be plausible yet unambiguously incorrect. Distractor analysis therefore examines the proportion of students/subjects who selected each of the response options. For the correct response, this proportion is equivalent to the item p-value, or item difficulty.12 If the proportions for all response options are summed, they will add up to 1.0, or 100% of student/subject selections. Reviewing the percentage of students/subjects who responded to each option will help you assess whether there are issues present in an item’s distractors.

Locating Distractor Statistics
To make distractor analysis easier, Remark returns a separate item analysis report specifically for distractor analysis (figure 2). Along with the label, value, weight, and frequency with which each option was selected, each question’s item analysis also includes the percent of respondents who selected each option and its corresponding point biserial correlation, for all distractors in addition to the correct answer.

Figure 2: Distractor Analysis
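A distractor frequency table like the one in figure 2 can also be tallied directly from the comma separated response file. The sketch below builds a simple frequency and percent table from a hypothetical list of selected options; the option letters and counts are illustrative only.

```python
# Minimal sketch: distractor frequency table for a single item.
from collections import Counter

selections = ["A", "E", "A", "B", "A", "E", "D", "A", "C", "A"]   # hypothetical responses
counts = Counter(selections)
n = len(selections)

print("Option  Frequency  Percent")
for option in sorted(counts):
    pct = 100 * counts[option] / n
    print(f"{option:<8}{counts[option]:>9}{pct:>9.1f}")

# The percentages across all options sum to 100% of student selections;
# the correct option's proportion equals the item's p-value.
```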

11 Zurawski, Raymond M. Making the Most of Exams: Procedures for Item Analysis. National Teaching & Learning Forum 7(6). 1998. <http://www.ntlf.com/html/pi/9811/v7n6smpl.pdf> 20 September 2007.
12 Step 9. Conduct the Item Analysis. Building High Quality Examination Programs – Professional Testing. 2005. <http://www.proftesting.com/test_topics/steps_9.shtml> 19 September 2007.


Sample Item Analysis13

Good Item
P-Value: 0.72
Point Biserial: +0.22

Item          Frequency   Percent   Point Biserial
A (correct)         241     72.15            +0.22
B                     9      2.70            -0.02
C                     3      0.89            -0.10
D                    11      3.30            -0.06
E                    70     20.96            -0.19
Total               334    100

This item is good because the point biserial correlation for the correct answer is above 0.2 and is higher than the corresponding values for the distractors.

Fair Item
P-Value: 0.39
Point Biserial: +0.12

Item          Frequency   Percent   Point Biserial
A                    13      3.89            -0.18
B                    87     26.05            -0.03
C                    40     11.98            -0.10
D (correct)         130     38.92            +0.12
E                    64     19.16            +0.05
Total               334    100

While the point biserial correlation for this question is not above the desirable value of 0.2, it is close to that value and is higher than the point biserial correlation for all of the distractors. Students/subjects answering incorrectly selected answers from all of the distractors listed.

Poor Item
P-Value: 0.34
Point Biserial: -0.07

13 What Item Analysis Can Tell Us About Item Quality. Measurement Research. <http://measurementresearch.com/media/itemanalysis.pdf>. 20 September 2007.


Item          Frequency   Percent   Point Biserial
A                     2      0.60            +0.06
B                     2      0.60            -0.16
C (correct)         113     33.83            -0.07
D                    23      6.89            -0.02
E                   194     58.08            +0.09
Total               334    100

The point biserial correlation for this item is negative and lower than that of some of the distractors, identifying this as a poor question that should be evaluated for revision.

Factors That May Impact Results
Despite the availability of numerous statistics, there are still several factors that could negatively impact your results or your ability to work with student scores.

Incorrect or Missing Identification Numbers
Even though students at the School of Pharmacy have been told to bring ID cards to all assessments and shown where to locate the UMB One Card ID number (figure 3), many students still neglect to write any ID number, or the correct ID number, on their test sheet. In cases where faculty members subtract points from assessments missing required items, students may write a fabricated number or another ID number if they cannot locate or remember the UMB One Card ID number.

Figure 3: UMB One Card ID Number

In other cases, students may write their ID number on the test sheet without filling in the associated optical marks (figure 4). Any test sheet missing this computer-readable information will appear to have a missing ID number in the results, making it almost impossible to assign a grade without rescoring the problem test sheet. In these cases, operators processing test sheets will make a good faith effort to fill in this information. However, faculty members and teaching assistants should make every effort to reinforce good test-taking practices in students/subjects taking assessments and not rely on this service, due to its time-consuming nature.


Figure 4: Missing Optical Marks

Extraneous Item Information
While blue and black ink are acceptable writing implements for ensuring recognition of optical marks on test sheets, they do not give students an opportunity to make corrections. Several students/subjects using pen instead of pencil have attempted to make corrections on submitted test sheets, resulting in extraneous information (figure 5).

Figure 5: Extraneous Response

However, even where students use pencil, incomplete erasures can also cause significant problems when test sheets are read. Students should make sure any changes to their test sheets are erased completely; just like “crossed off” answers, incomplete erasures will also be read as an answer (figure 6).

Figure 6: Incomplete Erasures

On a single response exam, the first mark read may be recorded as the student/subject’s selected option, resulting in a higher or lower score for that assessment and statistics that do not reveal the student’s full intention. Operators processing test sheets will attempt to resolve these errors within the Remark software after tests are scored. Again, faculty members should make every effort to reinforce good test-taking practices in students/subjects taking assessments and not rely on this service, due to its time-consuming nature.


Test Scores and Blackboard
Unfortunately, once examination scores are returned, grades must still be distributed to students; there is no mechanism in place to automate the entire process for a faculty member or teaching assistant. However, Computer and Network Services has created a new utility that endeavors to make this process less cumbersome.

During the past few weeks I have been working closely with several faculty members to monitor and assess the adoption of the UMB One Card ID number as a replacement for the former Student Affairs issued PIN numbers. While faculty have reported that the addition of the UMB One Card ID number in the gradebook is helpful, where this information is located in grade spreadsheets downloaded from Blackboard is less than ideal.

The Blackboard Gradebook and Excel
Many faculty members who administer paper-based multiple choice assessments and use the Blackboard gradebook make heavy use of Microsoft Excel to simplify grade management. Unfortunately, the Excel version of the Blackboard gradebook includes all identifying information for a student in one column (figure 7).

Figure 7: Blackboard Gradebook Detail

Since the last name always appears first in this column, it is the only student identifier that can be used to sort. As a result, it has been very difficult for faculty members wishing to combine the Blackboard gradebook with scanned test results.

Since this feature of Blackboard cannot be changed, Computer and Network Services has worked to construct a solution that not only provides faculty with a way to sort the Blackboard gradebook by the UMB One Card ID number, but also provides a student’s campus location and class year.

The Manage Blackboard Gradebook Utility
Located in the Pharmacy Portal, the Manage Blackboard Gradebook Utility (figure 8) takes a Blackboard generated gradebook and adds the UMB One Card ID number, student campus location (PharmD UMB, PharmD USG, or UMB Other), and class year (for identified PharmD students). Instead of the original gradebook information (figure 7), faculty will have student campus, class year, and UMB One Card ID added in the 2nd, 3rd, and 4th gradebook columns (figure 9).

Figure 8: Manage Blackboard Gradebook Utility
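Conceptually, the utility performs a join between the gradebook export and the scanned results on the UMB One Card ID number. The sketch below shows that general idea only; it is not the Pharmacy Portal utility, and the file names and column headers are assumptions made for illustration.

```python
# Minimal sketch: attaching scanned raw scores to a Blackboard gradebook export by ID.
import csv

# Hypothetical Blackboard gradebook export keyed by a "Student ID" column.
with open("blackboard_gradebook.csv", newline="") as f:
    gradebook = {row["Student ID"]: row for row in csv.DictReader(f)}

# Hypothetical tab delimited raw score file from the scoring service: ID, raw score.
with open("exam1_raw_scores.txt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    next(reader, None)                               # skip a header row, if present
    for student_id, raw_score in reader:
        if student_id in gradebook:
            gradebook[student_id]["Exam 1"] = raw_score   # attach the score by ID
        else:
            print("No gradebook match for ID", student_id)  # flag for manual review
```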


Figure 9: New Gradebook Detail

Importing and Exporting Grades in Blackboard
While many faculty members and teaching assistants are familiar with the gradebook import and export features of Blackboard, some individuals may not be. Individuals wishing to learn more can access the Entering Grades with Excel module of Getting Started with Blackboard <https://rxsecure.umaryland.edu/apps/training/bbnew/> for more information on how this process works.

Questions
Any additional questions concerning the University of Maryland Baltimore test scoring service may be directed to Shannon Tucker, the School of Pharmacy liaison to the UMB Center for Information Technology Services for test scoring.

Shannon Tucker
Director of Instructional Technology
University of Maryland School of Pharmacy
[email protected]

Appendix 1. School of Pharmacy Test Scoring Cover Page


University of Maryland School of Pharmacy Testing Submittal Sheet
All assessments scored by the University of Maryland Baltimore Test Scoring Service for the School of Pharmacy will receive a standard set of reports for the school. If you have questions about the test scoring services for the School of Pharmacy or the reports you receive, please contact [email protected] for assistance.

Check List: Attached: Test Key Test Sheets

Name of Faculty Member: ______________________________________________________________

Name of Submitter (if applicable): ________________________________________________

Name of Assessment: __________________________________________________________________

Assessment Type: Single Response Multiple Response

Item Scoring: 1 Point Per Item Other (please attach score variations with Key)

Course Prefix (if applicable):

PHAR PHMY PHEX PHPC PHSR OTHER __________

Course Number: ______________________________

Date:

Telephone Extension:

Results Received: Printout and CD

For Official Use Only Comments:

Initial: _____