MEASUREMENT CHARACTERISTICS Error & Confidence Reliability, Validity, & Usability

Upload: lisa-geraldine-stone

Post on 28-Dec-2015


Page 1: MEASUREMENT CHARACTERISTICS Error & Confidence Reliability, Validity, & Usability

MEASUREMENT CHARACTERISTICS

Error & Confidence
Reliability, Validity, & Usability

Page 2:

ERROR & CONFIDENCE

Reducing error
- All assessment scores have error
- Want to minimize error so scores are accurate
- Protocols & periodic staff training/retraining

Increasing confidence
- Results lead to correct placement
- Assessments that produce valid, reliable, and usable results

Page 3:

ASSESSMENT RESULTS

Norm-referenced
- Individual's score compared to others in their peer/norm group
- School tests (e.g., a score at the 95th percentile)
- Norm group needs to be representative of the test takers the test was designed for

Page 4:

ASSESSMENT RESULTS

Criterion-referenced
- Individual's score compared to a preset standard or criterion
- Standard doesn't change based on the individual or group
- e.g., A = 250-295 points

Page 5:

VALIDITY

- Describes how well the assessment results match their intended purpose
- Are you measuring what you think you are measuring?
- Relationship between program & assessment content
- Does not have validity for all purposes, populations, or times

Page 6:

VALIDITY

- Depends on different types of evidence
- Is a matter of degree (no tool is perfect)
- Is a unitary concept: a change from the past, in that the former "types" of validity are now considered sources of evidence (e.g., content validity becomes content-related evidence)

Page 7:

FACE VALIDITY

- Not listed in text
- Do the items seem to fit?

Page 8:

CONTENT VALIDITY (Content-related evidence)

- How well does the assessment measure the subject or content?
- Representative
- Complete: covers all major areas
- Nonstatistical
- Review of literature or expert opinion
- Blueprint of major components
- Per Austin (1991), the minimum requirement for any assessment

Page 9:

CRITERION-RELATED VALIDITY (Criterion-related evidence)

- Comparison of results
- Statistical
- Reported as a validity or correlation coefficient
- Ranges from +1 to -1 (±1 is a perfect relationship; 0 = no relationship)
- r = .73 is better than r = .52
- r = ±.40 to ±.70 = acceptable range

Page 10:

CRITERION-RELATED VALIDITY (Criterion-related evidence)

- May use .30 to .40 if statistically significant
- If validity is reported, it is generally criterion-related validity
- 2 types: predictive & concurrent
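The validity coefficient discussed on these slides is a correlation between assessment scores and criterion scores. A minimal sketch of how such a coefficient is computed, in plain Python; the score lists are hypothetical illustration data, not from any real instrument:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists (-1 to +1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical data: six clients' assessment scores and their criterion scores
assessment = [12, 15, 9, 20, 17, 11]
criterion = [30, 38, 25, 46, 41, 28]
r = pearson_r(assessment, criterion)  # falls in the -1 to +1 range described above
```

A coefficient landing in the ±.40 to ±.70 range would be judged acceptable by the rule of thumb on the previous slide.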

Page 11:

PREDICTIVE VALIDITY

- The ability of an assessment to predict future behaviors or outcomes
- Measures are taken at different times
- e.g., ACT or SAT & success in college; Leisure Satisfaction predicts discharge

Page 12:

CONCURRENT VALIDITY

- More than one instrument measures the same content
- Desire to predict one set of scores from another set taken at the same or nearly the same time, measuring the same variable

Page 13:

CONSTRUCT VALIDITY (Construct-related evidence)

- Theoretical/conceptual
- Content & criterion-related validity contribute to construct validity
- Research concerning the conceptual framework on which the assessment is based contributes to construct validity
- Not demonstrated in a single project or statistical measure
- Few TR assessments have it: the focus is behavior, not a construct

Page 14:

CONSTRUCT VALIDITY (Construct-related evidence)

- Factor analysis
- Convergent validity (what it measures)
- Divergent validity (what it doesn't measure)
- Expert panels here too

Page 15:

THREATS TO VALIDITY

- Assessment should be valid for its intended use (e.g., research instruments)
- Unclear directions
- Unclear or ambiguous terms
- Items at an inappropriate level for subjects
- Items not related to the construct being measured

Page 16:

THREATS TO VALIDITY

- Too few items
- Too many items
- Items with an identifiable pattern of response
- Method of administration
- Testing conditions
- Subjects' health, reluctance, attitudes
- See Stumbo, 2002, pp. 41-42

Page 17:

VALIDITY

- Can't get valid results without reliable results, but can get reliable results without valid results
- Reliability is a necessary but not sufficient condition for validity
- See Stumbo, 2002, p. 54

Page 18:

RELIABILITY

- Accuracy or consistency of a measurement
- Reproducible results
- Statistical in nature
- r = between 0 & 1 (with 1 being perfect)
- Should not be lower than .80
- Tells what portion of variance is non-error variance
- Increases with the length of the test & the spread of scores

Page 19:

STABILITY (Test-retest)

- How stable is the assessment?
- Assessment not overly influenced by the passage of time
- Same group assessed 2 times with the same instrument & the results of the 2 testings are correlated
- Are the 2 sets of scores alike?
- Time effects (longer, shorter intervals)

Page 20:

EQUIVALENCY (Equivalent forms)

- Also known as parallel-form or alternative-form reliability
- How closely correlated are 2 or more forms of the same assessment?
- 2 forms have been developed and demonstrated to measure the same construct
- Forms have similar but not the same items
- e.g., NCTRC exam
- Short & long forms are not equivalent

Page 21:

INTERNAL CONSISTENCY

- How closely are items on the assessment related?
- Split half
  - 1st half vs. 2nd half
  - Odd/even
  - Matched random subsets
- If the test can't be divided:
  - Cronbach's alpha
  - Kuder-Richardson
  - Spearman-Brown formula
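Two of the statistics named on this slide can be sketched in plain Python; the respondent-by-item score matrix and the split-half correlation of .60 are invented illustration data:

```python
def variance(values):
    """Sample variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(rows):
    """Cronbach's alpha for rows of respondent scores (one column per item)."""
    k = len(rows[0])  # number of items
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def spearman_brown(half_r):
    """Step a split-half correlation up to an estimate for the full-length test."""
    return 2 * half_r / (1 + half_r)

# Hypothetical data: 5 respondents x 4 items rated on a 1-5 scale
scores = [
    [3, 3, 4, 3],
    [4, 4, 5, 4],
    [2, 2, 2, 3],
    [5, 4, 5, 5],
    [3, 2, 3, 3],
]
alpha = cronbach_alpha(scores)
full_test_r = spearman_brown(0.6)  # a split-half r of .60 steps up to .75
```

The Spearman-Brown step-up also illustrates the earlier point that a longer test generally has higher reliability.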

Page 22:

INTERRATER RELIABILITY

- Percentage of agreements out of the number of observations
- Difference between agreement & accuracy
- Raters compared to each other
- 80% agreement

Page 23:

INTERRATER RELIABILITY

- Simple agreement
  - Number of agreements & disagreements
- Point-to-point agreement
  - Takes each data point into consideration
- Percentage of agreement for the occurrence of the target behavior
- Kappa index
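The agreement percentage and the kappa index above can be sketched as follows (plain Python; the two raters' behavior codes are invented illustration data):

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Point-to-point agreement: matching codes / number of observations."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Kappa index: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently choose each category
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical data: two raters coding whether a target behavior occurred (1) or not (0)
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
```

Kappa is always at or below the raw percentage of agreement because it removes the agreements expected by chance alone, which is why it is the stricter of the two indices.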

Page 24:

INTRARATER RELIABILITY

- Not in text
- A rater's ratings compared with his or her own (compared with self)

Page 25:

RELIABILITY

- Manuals often give this information
- High reliability doesn't indicate validity
- Generally a longer test has higher reliability
  - Lessens the influence of chance or guessing

Page 26:

FAIRNESS

- Reduction or elimination of undue bias
  - Language
  - Ethnic or racial backgrounds
  - Gender
- Free of stereotypes & biases
- Beginning to be a concern for TR

Page 27:

USABILITY & PRACTICALITY

- Nonstatistical
- Is this tool better than any other tool on the market, or one I can design?
- Time, cost, staff qualifications, ease of administration, scoring, etc.