Reliability: Chapter 3
Every observed score is a combination of true score and error
Obs. = T + E
Reliability = $\frac{s_T^2}{s_O^2} = 1 - \frac{s_E^2}{s_O^2}$
Classical Test Theory
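The classical test theory decomposition can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the function name, the sample size, and the true-score and error standard deviations are all assumed for illustration. It builds observed scores as true score plus random (unsystematic) error and computes reliability both as true variance over observed variance and as one minus error variance over observed variance.

```python
import random
import statistics


def simulate_reliability(n=10_000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate Obs = T + E and return reliability computed two ways.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n)]  # unsystematic error
    observed = [t + e for t, e in zip(true_scores, errors)]

    var_true = statistics.pvariance(true_scores)
    var_error = statistics.pvariance(errors)
    var_observed = statistics.pvariance(observed)

    ratio_form = var_true / var_observed            # s_T^2 / s_O^2
    complement_form = 1 - var_error / var_observed  # 1 - s_E^2 / s_O^2
    return ratio_form, complement_form


ratio_form, complement_form = simulate_reliability()
print(round(ratio_form, 3), round(complement_form, 3))
```

With these assumed values the theoretical reliability is 100 / (100 + 25) = 0.8. The two sample estimates land close to that value and to each other; they differ slightly because true scores and errors are never exactly uncorrelated in a finite sample.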
Systematic versus unsystematic error
Reliability only takes unsystematic error into account
Reliability
Reliability & Correlation
Reliability often based on consistency between two sets of scores
Correlation: Statistical technique used to examine consistency
Positive Correlation
Negative Correlation
Correlation coefficient: a numerical indicator of the relationship between two sets of data
Pearson product-moment correlation coefficient is most common
Pearson Product-Moment Correlation Coefficient
$r = \frac{\sum z_1 z_2}{N}$
The percentage of shared variance between two sets of data
Coefficient of Determination
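As a sketch of how the correlation coefficient and the coefficient of determination might be computed, the code below converts each set of scores to z-scores, averages the products of paired z-scores to get r, and squares r to get the proportion of shared variance. The function names and the sample scores are hypothetical. Population standard deviations are used so that dividing by N is consistent.

```python
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation as the mean product of z-scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 scores")
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)  # population SDs
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a set of scores has no variance")
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / n


def coefficient_of_determination(x, y):
    """Proportion of variance shared by the two sets of scores (r squared)."""
    return pearson_r(x, y) ** 2


# Hypothetical scores for five people on two administrations.
first = [10, 12, 15, 18, 20]
second = [11, 13, 14, 19, 21]
print(pearson_r(first, second), coefficient_of_determination(first, second))
```

Multiplying the coefficient of determination by 100 gives the percentage of shared variance.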
Test-Retest
Alternate/Parallel Forms
Internal Consistency Measures
Types of Reliability
Correlating performance on first administration with performance on the second
Coefficient of stability
Test-Retest
Two forms of instrument, administered to same individuals
Alternate/Parallel Forms
Split-half reliability (Spearman-Brown formula)
Kuder-Richardson formulas (KR-20, KR-21)
Coefficient Alpha
Internal Consistency Measures
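The slides name these internal consistency measures without giving formulas, so the following is a sketch of their standard textbook forms, with hypothetical function names and made-up item responses. The Spearman-Brown formula steps a half-test correlation up to a full-length estimate. Coefficient alpha compares the sum of the item variances with the variance of total scores. For items scored 0/1, and with population variances used throughout, the alpha computation matches KR-20.

```python
import statistics


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_variances = [statistics.pvariance(col) for col in columns]
    total_variance = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 responses: five respondents, four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(spearman_brown(0.6))  # a half-test r of .60 steps up to .75
print(coefficient_alpha(responses))
```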
Typical methods for determining reliability may not be suitable for:
Speed tests
Criterion-referenced tests
Subjectively scored instruments (interrater reliability)
Nontypical Situations
Examine purpose for using instrument
Be knowledgeable about reliability coefficients of other instruments in that area
Examine characteristics of particular clients against reliability coefficients
Coefficients may vary based on SES, age, culture/ethnicity, etc.
Evaluating Reliability Coefficients
$SEM = s\sqrt{1 - r}$
Standard Error of Measurement
Provides estimate of range of scores if someone were to take instrument repeatedly
Based on the premise that when an individual takes a test multiple times, the scores fall into a normal distribution
Sam’s SAT Verbal = 550; r = .91; s = 100
$SEM = 100\sqrt{1 - .91} = 100\sqrt{.09} = 100(.3) = 30$
68% of the time, Sam’s true score would fall between 520 and 580
95% of the time, Sam’s true score would fall between 490 and 610
99.7% of the time, Sam’s true score would fall between 460 and 640
SEM: Example
Determining Range of Scores Using SEM
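Sam's example can be reproduced in a few lines. This sketch multiplies the standard deviation by the square root of one minus the reliability to get the SEM, then builds bands of one, two, and three SEMs around the observed score. The function names are illustrative and do not come from the slides.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = s * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)


def score_band(observed, sem, n_sems=1):
    """Range of scores within n_sems standard errors of the observed score."""
    return observed - n_sems * sem, observed + n_sems * sem


# Sam's SAT Verbal example from the slides: score 550, r = .91, s = 100.
sem = standard_error_of_measurement(100, 0.91)
print(round(sem, 2))  # 30.0

for n_sems in (1, 2, 3):
    low, high = score_band(550, sem, n_sems)
    print(n_sems, round(low), round(high))
```

The three bands come out as 520 to 580, 490 to 610, and 460 to 640, matching the ranges quoted for Sam.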
Method to determine if difference between two scores is significant
Takes into account SEM of both scores
Standard Error of Difference
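The slides do not give a formula for the standard error of difference. A common formulation, assumed here, combines the two SEMs as the square root of the sum of their squares. The sketch below also assumes a cutoff of 1.96 standard errors (roughly 95% confidence) for calling a difference significant. The SEM values and scores are made up.

```python
import math


def standard_error_of_difference(sem_1, sem_2):
    """SED = sqrt(SEM_1^2 + SEM_2^2) (assumed formulation)."""
    return math.hypot(sem_1, sem_2)


def is_significant_difference(score_1, score_2, sem_1, sem_2, z=1.96):
    """True if the two scores differ by more than z standard errors of difference."""
    sed = standard_error_of_difference(sem_1, sem_2)
    return abs(score_1 - score_2) > z * sed


# Hypothetical values: two subtest scores with SEMs of 30 and 40.
sed = standard_error_of_difference(30, 40)
print(sed)  # 50.0
print(is_significant_difference(550, 600, 30, 40))  # False: 50 < 1.96 * 50
```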
Generalizability or Domain Sampling Theory
Focus is on estimating the extent to which specific sources of variation under defined conditions are contributing to the score on the instrument
Alternative Theoretical Model