Assessment in Healthcare
Moshe Feldman, PhD, Assistant Professor, Office of Assessment and Evaluation Studies, VCU School of Medicine
October 16, 2014


TRANSCRIPT

1. Assessment in Healthcare
Moshe Feldman, PhD
Assistant Professor, Office of Assessment and Evaluation Studies, VCU School of Medicine
October 16, 2014

2. Objectives
- Describe the purpose and methods of assessment used in healthcare.
- Identify potential threats to validity and describe their impact on assessment inferences.
- Compare and contrast the use of reliability and generalizability for evaluating an assessment.
- Describe standard setting methods.

3. Assessment
Standards for Educational and Psychological Testing (1999): any systematic method of obtaining information from tests and other sources, used to draw inferences about characteristics of people, objects, or programs.

4. Assessment in Healthcare: Why Do We Need to Assess?
- Feedback on learning and skill acquisition.
- Inferences about differences between individuals.
- Inferences about future performance or behavior.

5. Miller's Pyramid (figure; levels from top to bottom)
- Does (Action)
- Shows How (Performance)
- Knows How (Competence)
- Knows (Knowledge)

6. Anatomy of an Assessment
- Content: What specific domain are you measuring?
- Modality: How will the domain be measured?
- Scale: How will the domain be quantified?

7. Choosing an Assessment Strategy
1. What KSAs will you measure?
2. Who will you assess?
3. What will you do with assessment data?
4. How will you assess?

8. 1. What KSAs Will You Measure?
- Clinical skills, medical knowledge, technical skills, teamwork skills, interpersonal communication, etc.

9. 2. Who Will You Assess?
- Identify what type and level of learner you would like to assess.
- Specify if the assessment tool is intended for use with multiple levels of learners.
- Examples: M4 medical students; M3 and M4 medical students; interprofessional teams of nursing, pharmacy, and medical students.

10. 3. What Will You Do with the Assessment Data?
- Formative assessment: reviewed by the facilitator to help identify areas for feedback during the debrief.
- Summative assessment: used for 10% of the final clerkship grade; used for a competency-based assessment portfolio; all students must pass with a minimum score of 10, and failing students will receive remediation.

11. How Is Assessment Used in Healthcare?
- Formative assessment: provides useful feedback to the student to help monitor progress in the development of expertise.
- Summative assessment: provides a more formal judgment about a student's level of expertise at an important milestone of development.

12. 4. How Will You Assess?
- Option 1: Use an existing measure.
- Option 2: Develop your own measure.

13. Option 1: Select an Existing Measure
1. Collect available measures: search the literature for similar measures (PubMed, MedEdPORTAL); ask around, since colleagues or message boards may be useful for finding readily available measures.
2. Evaluate the validity of the measures.
3. Select a final set of measures.
4. Evaluate feasibility, preferred scoring format, and the assessment resources available.
5. Select the final measure.

14. Option 2: Develop Your Own Scale
- Time and resource intensive.
- Starting from scratch means building validity evidence from the ground up.

15. Assessment Method and Scale
- Assessment method: written test (e.g., MCAT); performance test (e.g., simulation); observation of clinical performance (e.g., mini-CEX, 360 evaluation); other (e.g., TBL, essay).
- Assessment scale: nominal, ordinal, interval, or ratio.

16. VALIDITY

17. Validity (Standards, 1999)
Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.

18. A Word About Validity
- Validity is an inference about the meaning of assessment scores.
- An assessment is never valid or invalid.
- Validity evidence strengthens or weakens confidence in the interpretation of assessment scores.

19. Assessment as an Inference (figure)
- Assessments (written tests, oral exams, simulations, on-the-job ratings, etc.) support inferences from a learner's current state to a future state.

20. Validity Evidence
- Content
- Response process
- Internal structure
- Relations to other variables
- Consequences of testing

21. Types of Validity Evidence
- Content: Do the assessment items represent the targeted KSA?
- Response process: Was the assessment applied in an appropriate and standardized way?
- Internal structure: Is the assessment psychometrically sound? Is it reliable?

22. Types of Validity Evidence (continued)
- Relations to other variables: Is there construct validity? Does the assessment correlate more with other assessments of similar skills and less with assessments of unrelated skills?
- Consequences: Is the assessment being used in a fair way? Are scores being interpreted and applied correctly, especially for high-stakes decisions?

23. The Validity Statement
Example (teamwork skills): "This assessment measures competency in teamwork skills at the level of 3rd-year internal medicine residents. Assessment data will be used to evaluate minimum competency and to guide feedback after the simulation."

24. Threats to Validity
- Construct underrepresentation
- Construct-irrelevant variance

25. The Rating Process Is Complex
- Steps: 1. Observe, 2. Encode, 3. Store, 4. Retrieve, 5. Judgment, 6. Evaluate.
- Sources of error along the way: rater differences, multiple right ways, cognitive errors (e.g., memory), errors in judgment, and individual or cultural biases.

26. What Do You See? (image slide)

27. Rater Errors
- Central tendency: avoiding extreme positive or negative ratings. Consequence: reduces the ability to discriminate performance.
- Halo error: all ratings based on one very positive or negative observation. Consequence: positively or negatively skewed ratings.
- Primacy/recency effect: all ratings based on observations made early or late in the scenario. Consequence: positively or negatively skewed ratings.
- Contrast effect: ratings made relative to the performance of a previous group. Consequence: positively skewed ratings when the prior group performed very poorly; negatively skewed ratings when the prior group performed very well.
- Potential effect: ratings based on perceptions of future potential. Consequence: usually positively skewed ratings.
- Similar-to-me effect: ratings based on degree of similarity to the rater. Consequence: a tendency to rate people who resemble the rater higher.
- Stereotype effect: ratings based on group membership rather than individual differences. Consequence: positively or negatively biased ratings for some groups.

28. Activity: Evaluating Validity
- Case examples (written test vs. performance test vs. clinical ratings).
- Describe the validity evidence you have.
- Identify potential threats to validity.
- Discuss alternative ways to strengthen validity.

29. RELIABILITY & GENERALIZABILITY

30. What Is Reliability?
- In the weeds of statistical theory: Observed score = True score + Error.
- Measures of reliability are an estimate, but an important estimate!
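A minimal sketch (not part of the original slides) of the classical test theory identity on slide 30: simulated true scores plus random error, with reliability computed as the share of observed-score variance that comes from true scores. All numbers are illustrative.

```python
# Classical test theory sketch: Observed score = True score + Error.
# Reliability = var(true) / var(observed). Illustrative numbers only.
import numpy as np

rng = np.random.default_rng(0)
n_students = 1000

true_scores = rng.normal(loc=70, scale=10, size=n_students)  # stable ability
error = rng.normal(loc=0, scale=5, size=n_students)          # random noise
observed = true_scores + error

reliability = true_scores.var() / observed.var()
print(f"Reliability = {reliability:.2f}")
# With SD(true) = 10 and SD(error) = 5, expect about 100 / 125 = 0.80.
```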

31. Reliability and Accuracy
- Reliability refers to the consistency and dependability of our ratings.
- Accuracy and validity refer to whether we are truly measuring what we think we are measuring.

32. Why Reliability Matters
- Consistently measuring what you think you are measuring.
- Influences the diagnostic value of feedback.
- Influences the validity of your measure: how much your simulation scores are actually related to performance in the real world.
- Reliability caps validity.
- Reliability directly affects the value of the conclusions you can draw from your data.

33. How to Assess Reliability
- Agreement: use if your ratings are categorical (e.g., a Yes/No checklist: Assess airway; Breathing; Circulation/FAST exam).
- Correlation: use if your ratings are continuous (e.g., "1. The Team Leader assigned roles to the Trauma Team," rated from 1 = Ineffective to 6 = Very Effective).

34. How to Assess Reliability (continued)
- Over time: the same observer scores the same videotaped scenario at two different points in time.
- Across multiple raters: two different observers score the same scenario, in real time or on video.

35. How to Assess Reliability: Agreement
- Percent agreement = agreements / (agreements + disagreements) × 100; that is, the number of times observers agree divided by the number of opportunities to agree.

36-40. Worked example (checklist figures): Observer #1 and Observer #2 each complete the same Yes/No checklist (Breathing; Circulation/FAST exam; Disability; Exposure and Environment), giving 4 opportunities to agree. The observers agree on 3 of the 4 items.

41. Percent Agreement
- 3 / 4 × 100 = 75% agreement.
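A minimal sketch (not from the slides) of the percent agreement calculation above. The per-item Yes/No values are invented so that the two observers agree on 3 of the 4 checklist items, reproducing the 75% result.

```python
# Percent agreement: agreements / opportunities to agree, times 100.
def percent_agreement(ratings_1, ratings_2):
    if len(ratings_1) != len(ratings_2):
        raise ValueError("Both observers must rate the same items.")
    agreements = sum(a == b for a, b in zip(ratings_1, ratings_2))
    return 100.0 * agreements / len(ratings_1)

# Checklist items: Breathing, Circulation/FAST exam, Disability,
# Exposure and Environment. The Yes/No values are hypothetical.
observer_1 = ["Yes", "Yes", "No", "Yes"]
observer_2 = ["Yes", "Yes", "No", "No"]  # disagrees on the last item
print(percent_agreement(observer_1, observer_2))  # 75.0
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa (not covered in the slides) is a common complement.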
42. Improving Reliability
- Add additional test items.
- Conduct an item analysis and revise items.
- Create a composite measure with other assessments measuring the same KSA.

43. Generalizability Theory
- Classical test theory: Observed score = True score + Error.
- Generalizability theory: Observed score = True score + Error1 + Error2 + Error3 + ...
- G theory allows specific estimates for multiple sources of error.

44. Generalizability Example: OSCE (figure)
- Scenario 1 (22 y/o male; raters R1 and R2): physical examination skills, medical knowledge, interpersonal skills.
- Scenario 2 (65 y/o female; raters R3 and R4): physical examination skills, medical knowledge, teamwork.
- Scenario 3 (raters R5 and R6): physical examination skills, medical knowledge, interpersonal skills.

45. G Study, Step 1: Define Your Model
- Object of measurement: the medical student.
- Conditions of measurement (i.e., facets): raters and scenarios.
- Facets may be random (a sample from a population) or fixed (representing the entire population).

46. G Study: Estimate Variance
- Uses an analysis of variance approach.
- Estimates the variance in performance accounted for by each facet and the amount of error.
- Goal: maximize the share of total variance accounted for by the assessee (the object of measurement) and minimize the variance due to the facets.

47. Decision Study, Step 2: Conduct a Decision Study
- A decision study projects the variance components onto alternative designs: How would variance and error change if I added or removed raters? If I added or removed stations?

48. Decision Study (continued)
- G coefficient: how consistently an exam will rank examinees (norm-referenced).
- Phi coefficient: reliability of the absolute magnitude of scores (criterion-referenced).
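A minimal sketch (not from the slides) of a decision study for a simple persons × raters design. The variance components are hypothetical stand-ins for what a G study's ANOVA would estimate; the formulas are the standard G and Phi coefficients for this design.

```python
# Decision study sketch for a persons x raters design.
# Hypothetical variance components from a G study:
var_person = 8.0    # true differences among students (the signal)
var_rater = 2.0     # raters differ in overall severity
var_residual = 6.0  # person-x-rater interaction and remaining error

for n_raters in (1, 2, 4, 8):
    # Relative error: only sources that reshuffle examinees' rank order.
    rel_error = var_residual / n_raters
    # Absolute error: also counts rater severity (matters for cut scores).
    abs_error = (var_rater + var_residual) / n_raters

    g_coef = var_person / (var_person + rel_error)    # norm-referenced
    phi_coef = var_person / (var_person + abs_error)  # criterion-referenced
    print(f"{n_raters} rater(s): G = {g_coef:.2f}, Phi = {phi_coef:.2f}")
```

Adding raters shrinks both error terms, so both coefficients rise; Phi is never larger than G because absolute error also includes the rater severity component.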
49. SETTING STANDARDS

50. Standard Setting
- Identifies individuals performing below and above an expected minimum level of skill.
- There is no single right choice for setting standards; it is ultimately a policy decision based on local contexts.
- A standard setting process allows for more consequentially valid and defensible decisions based on assessment scores.

51. Standard Setting Approaches
- Normative: relative to the group.
- Criterion-based: relative to a specific level of skill or ability (mastery learning).

52. Eight Steps for Standard Setting
1. Select a method.
2. Select judges.
3. Prepare descriptions of performance.
4. Train judges.
5. Collect ratings/judgments.
6. Provide feedback and facilitate discussion.
7. Evaluate the standard setting process.
8. Provide results and validity evidence to decision makers.

53. Step 1: Select a Method
- Test-based: make judgments about the expected difficulty of, and level of performance on, test items.
- Examinee-based: make judgments about an individual's competency based on observable performance or test scores.
- Compromise or hybrid: combines test-based and examinee-based approaches and incorporates acceptable passing and failure rates.

54. Test-Based Example: Angoff Method
- Discuss, and reach consensus on, the characteristics of a borderline student.
- Review each item and estimate the percentage of borderline students who would answer it correctly.
- Judgments are averaged across items and judges; the average percentage defines the passing score.

55. Examinee-Based Example: Contrasting Groups Method
- Judges review the performance of examinees.
- Judges provide a global pass or fail judgment for each examinee based on performance scores.
- The passing score is set at the point of highest disagreement between fail and pass decisions, which minimizes false categorization errors and maximizes discrimination between experts and novices within a group.
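A minimal sketch (not from the slides) of the two cut-score methods just described. All judge ratings and examinee scores are invented for illustration.

```python
# Cut-score sketches for the Angoff and contrasting groups methods.
import numpy as np

# Angoff (test based): each judge estimates, per item, the probability
# that a borderline student answers correctly. Ratings are hypothetical.
angoff_ratings = np.array([
    [0.60, 0.75, 0.50, 0.80],  # judge 1, items 1-4
    [0.55, 0.70, 0.45, 0.85],  # judge 2
    [0.65, 0.80, 0.55, 0.75],  # judge 3
])
print(f"Angoff passing score: {angoff_ratings.mean() * 100:.1f}%")

# Contrasting groups (examinee based): judges give each examinee a global
# pass/fail judgment; pick the cut score that misclassifies the fewest.
scores = np.array([52, 58, 61, 64, 66, 70, 73, 77, 81, 90])
judged_pass = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1], dtype=bool)

def misclassified(cut):
    return int(((scores >= cut) != judged_pass).sum())

best_cut = min(scores, key=misclassified)  # candidate cuts: observed scores
print(f"Contrasting groups cut score: {best_cut}")
```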
56. Compromise or Hybrid Example
(Slide content not captured in the transcript.)

57. Issues to Consider When Setting Standards
- Resources available: expert judges, time.
- Assessment type: formative vs. summative.

58. Questions?