LECTURE 06B BEGINS HERE: THIS IS WHERE MATERIAL FOR EXAM 3 BEGINS


Upload: claude-joseph

Post on 28-Dec-2015



RIGOR OF ASSESSMENT IN NORM-REFERENCED TESTING (HUTCHINSON, 1996)


RIGOR OF ASSESSMENT (PART OF ASSESSING PSYCHOMETRIC ADEQUACY)

Validity: Extent to which a procedure actually measures what it claims to measure

Reliability: Consistency of response/performance elicitation

Remember: Can be applied to both norm-referenced and criterion-referenced testing


RIGOR OF ASSESSMENT IN NORM-REFERENCED TESTING: SUBTOPIC = VALIDITY


ASSESSING VALIDITY IN NORM-REFERENCED TESTING

Definition of and evidence for validity

Extent to which a procedure actually measures what it is supposed to measure

Defined relative to a specific purpose, e.g. valid for screening, but not valid for Tx planning

Issue of the quality and extent of available evidence
--Logical analysis
--Empirical data


TYPES OF VALIDITY (H&P, 2012)

Construct validity

“Degree to which a test measures the theoretical construct it is intended to measure”


Content validity

Degree to which the content of a test is consistent with the purpose of a test

--appropriateness of items
--completeness of the item sample
--the way in which the items assess the content

Cf. face validity, which has the surface appearance of content validity


Criterion-related validity

Degree to which test performance predicts performance on other (external) criteria

--subtype = predictive: ability to predict score on a future test in a related area
--subtype = concurrent: compared to present performance on other tests in a related area


SOURCES OF EVIDENCE OF VALIDITY (HUTCHINSON, 1996)

Evidence used to support the argument that a test is valid for its stated purpose

First source category = Logical evidence
--Test’s purpose well stated
--Construct (theory/framework) well defined
--Good rationale for content of the test, which includes documentation that both easy and hard test items have been included, to discriminate disorder

Key concept: Are the test authors’ logically-based arguments convincing?


Evidence used to support the argument that a test is valid for its stated purpose

Second source category = Empirical evidence

Correlation (r), a measure of relationship between ____________________ and _____________________

Good prediction of group membership with measures of __________________ and _____________________

Pattern of relationship among sub-test results should match the pattern predicted by the construct
--Via correlation
--Via factor analysis

Key concept: Are the test authors’ empirically-based arguments convincing?


What are the labels on the axes when one uses correlation as evidence for validity?

Empirical evidence for validity, using correlation…Measure of relationship between _____________ and ____________


Is the test authors’ empirical argument convincing?

What evidence is given to describe the relationship between the test of interest and others considered to be similar?

Note that valid tests should also have low correlations with tests measuring different parameters
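The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from any test manual: the `pearson_r` helper and all three score lists are hypothetical, made up to show a new test correlating strongly with a similar test and weakly with a test of a different parameter.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)

# Hypothetical scores for the same six clients on three tests
new_test = [10, 12, 15, 18, 20, 25]      # the test being evaluated
similar_test = [11, 13, 14, 19, 21, 24]  # established test of the same ability
different_test = [7, 3, 9, 2, 8, 5]      # test of an unrelated parameter

r_similar = pearson_r(new_test, similar_test)
r_different = pearson_r(new_test, different_test)
print(f"r with similar test:   {r_similar:.2f}")
print(f"r with different test: {r_different:.2f}")
```

With these invented numbers the first correlation is high and the second is close to zero, which is the pattern a manual would need to report to support this kind of argument.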


Empirical evidence for validity, using measures of sensitivity and specificity…

Sensitivity--the test’s accuracy in correctly identifying the clients WITH the disorder

Specificity--the test’s accuracy in correctly identifying the clients WITHOUT the disorder
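Sensitivity and specificity can be worked out from a simple count of correct and incorrect classifications. The sketch below uses the standard proportions; the function names and the counts are hypothetical and do not come from any test manual.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of clients WITH the disorder whom the test identifies."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of clients WITHOUT the disorder whom the test clears."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical validation sample: 50 clients with the disorder, 100 without
with_disorder_flagged = 45     # true positives
with_disorder_missed = 5       # false negatives
without_disorder_cleared = 90  # true negatives
without_disorder_flagged = 10  # false positives

sens = sensitivity(with_disorder_flagged, with_disorder_missed)
spec = specificity(without_disorder_cleared, without_disorder_flagged)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both values depend on the group membership already being established by some other means, which is why this evidence is reported from samples whose status is known in advance.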


Empirical evidence for validity, using measures of sensitivity and specificity… Let’s “visualize” these concepts


Empirical evidence for validity, using measures of sensitivity and specificity…

In the test manual, we’re looking for reports of high specificity and high sensitivity.

Is the test authors’ empirical argument convincing?

What evidence is given to support the accuracy of this test in classifying subjects into already-established performance categories?

Do you see how this type of evidence for validity is directly related to the purpose of norm-referenced tests?


Empirical evidence for validity, using patterns of correlations among subtests, to see if the patterns fit what the construct would predict (construct in this example = what makes up writing ability?)

Is the test authors’ empirical argument convincing?

What statistical data support the relationship among separate components of the test or their relationship with the overall construct?


Empirical evidence for validity, using factor analysis of sub-test scores, e.g. to see if patterns of factor loadings follow what the construct of writing ability would predict

I: “Writer’s development of the work”
II: “Writer’s fluency with mechanics”
III: “Sentence structure”
IV: “Writer’s orientation to the reader”

Is the test authors’ empirical argument convincing?


RIGOR OF ASSESSMENT IN NORM-REFERENCED TESTING: SUBTOPIC = RELIABILITY


Remember…

Reliability: Consistency of response/performance elicitation (includes consistency of scoring and measurement)


TYPES OF RELIABILITY, AND EVIDENCE FOR THEM

Agreement OR Inter-rater reliability
--Correlation of scores of two raters (good = .85-.90)*
--Item by item or total score

Stability OR Test-retest reliability
--Correlation of scores from two separate test administrations with the same person, across testees (good = .85-.90)* (continued…)

Can you see why the authors should optimally provide reliability scores for: 1) each age group separately? 2) both normal and disordered groups?


TYPES OF RELIABILITY, AND EVIDENCE FOR THEM (CONT.)

Internal consistency OR split-half reliability

Split test in two halves and obtain correlation between the two sets: Measured as r
--E.g. split top from bottom
--E.g. split even items from odd items

Assign test items to two halves at random and obtain r. Then do this again, and again, and again… “Average” all the r’s = Cronbach’s coefficient alpha
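A small sketch may make the two ideas concrete. It computes one odd/even split-half correlation and then coefficient alpha. Note the assumptions: the item scores are hypothetical, the helper names are invented, and alpha is computed with the conventional variance-based formula rather than by literally averaging many random splits, so it stands in for the "average" described above rather than reproducing it step by step.

```python
from math import sqrt
from statistics import variance

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)

def odd_even_split_half_r(scores):
    """Correlate each person's total on odd items with their total on even items."""
    odd_totals = [sum(row[0::2]) for row in scores]   # items 1, 3, ...
    even_totals = [sum(row[1::2]) for row in scores]  # items 2, 4, ...
    return pearson_r(odd_totals, even_totals)

def cronbach_alpha(scores):
    """Coefficient alpha from item variances and total-score variance."""
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 6 test-takers (rows) x 4 items (columns), 1 = correct
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

half_r = odd_even_split_half_r(item_scores)
alpha = cronbach_alpha(item_scores)
print(f"odd/even split-half r = {half_r:.2f}")
print(f"coefficient alpha     = {alpha:.2f}")
```

A real test has far more items and test-takers; the tiny matrix here is only large enough to show where each number comes from.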


What are the labels on the axes when one uses correlation as evidence for
--inter-rater reliability?
--test/retest reliability?
--split-half reliability?

Empirical evidence for reliability, using patterns of correlations…


Think:

Even when a test is very carefully designed and reliable (consistent) in its ability to measure a construct (e.g. narrative comprehension), a client’s responses to test items may not always reflect a true picture of his underlying ability (e.g. his true ability to understand narrative passages).

Error in measurement cannot be avoided, especially when measuring human performance. Even with the most reliable test, what are some of the other factors that affect a client’s performance on a test, on a given day?

Transition slide from topic of reliability to topic of Standard Error of Measurement (SEM)

Observed score = the actual raw score that a test-taker earns

True score = hypothetical “ideal” score that the person would have earned if there were no error in measurement


STANDARD ERROR OF MEASUREMENT (SEM)

If a person took a test 100 times, their scores:

1) would tend to fall near some central score (represented by a measure of central tendency, such as the average), e.g. 42

2) would deviate from the central score (due to error of measurement) in a predictable way, with most of them not too far from the center

The “average deviation” (or “average distance”) from the central score is known as the standard deviation, e.g. 2.

This standard deviation (“average deviation”) due to error of measurement is called the standard error of measurement (SEM), e.g. 2 away from 42 (either above or below)

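[Figure: distribution of the person’s scores across repeated testings. Vertical axis: number of times the person earned the score (few to many). Horizontal axis: score, labeled 40, 42, 44, with blanks (____) for the remaining values.]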

Can you fill in the values that would be two SEM away from the average?


Now, test-makers don’t really calculate SEM by giving people a test 100 times! They calculate SEM using:

1) estimates of the test’s reliability (at least one of the three types)

2) the distribution of scores earned by the normative sample

3) the way in which reliability varies at different score levels

SO, clinicians don’t calculate SEM. SEM is provided in the test manual to help guide us in our interpretation of a client’s score.
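For readers who want to see how the first two ingredients combine, here is a minimal sketch using the classical formula SEM = SD × √(1 − r). This is an assumption about the calculation, since the slides do not give a formula, and it ignores the third ingredient (reliability varying across score levels). The function name and the example values are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical estimate: SEM = SD * sqrt(1 - reliability).

    sd          -- standard deviation of scores in the normative sample
    reliability -- a reliability coefficient between 0 and 1
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)

# Hypothetical test: normative SD of 15, reliability coefficient of .91
sem = standard_error_of_measurement(sd=15, reliability=0.91)
print(f"SEM = {sem:.1f}")
```

The formula shows why reliability matters for interpretation: as the reliability coefficient approaches 1, the SEM shrinks toward 0, and the observed score becomes a tighter estimate of the true score.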

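[Figure: distribution of the person’s scores across repeated testings. Vertical axis: number of times the person earned the score (few to many). Horizontal axis: score, labeled 40, 42, 44, with blanks (____) for the remaining values.]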

Can you fill in the values that would be two SEM away from the average?


68% of the scores would be predicted to fall within one SEM of the average

e.g. we could predict that 68/100 would fall between 40 and 44

95% of the scores would be predicted to fall within two SEMs of the average

e.g. we could predict that 95/100 would fall between ____ and ____
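The 68% and 95% figures come from the normal curve. The sketch below checks them, assuming that the person's repeated scores are normally distributed around 42 with an SEM of 2 (the slide's example values); the normality assumption and the function name are ours, not stated on the slide.

```python
from statistics import NormalDist

def proportion_within(center: float, sem: float, n_sems: float) -> float:
    """Proportion of a normal distribution within n_sems SEMs of its center."""
    scores = NormalDist(mu=center, sigma=sem)
    upper = center + n_sems * sem
    lower = center - n_sems * sem
    return scores.cdf(upper) - scores.cdf(lower)

within_one = proportion_within(center=42, sem=2, n_sems=1)
within_two = proportion_within(center=42, sem=2, n_sems=2)
print(f"within 1 SEM: {within_one:.1%}")
print(f"within 2 SEM: {within_two:.1%}")
```

The second proportion is slightly above 95%, so "two SEMs" is a convenient rounding of the exact 95% band.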

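[Figure: distribution of the person’s scores across repeated testings. Vertical axis: number of times the person earned the score (few to many). Horizontal axis: score, labeled 40, 42, 44, with blanks (____) for the remaining values.]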


SEM AND ITS RELATIONSHIP TO CONFIDENCE INTERVALS (See Hutchinson and H&P readings)

Observed score: The actual raw score that the test taker earns

True score: The score that the person would have earned if there were no measurement error


+ 1 SEM to -1 SEM = 68% confidence interval. We can have 68% confidence that the client’s true score would fall somewhere in this range

+ 2 SEM to -2 SEM = 95% confidence interval. We can have 95% confidence that the client’s true score would fall somewhere in this range
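In practice the band is built around the client's observed score, using the SEM printed in the test manual. A minimal sketch follows; the function name, the observed score of 85, and the SEM of 3 are all hypothetical.

```python
def confidence_interval(observed_score: float, sem: float, n_sems: int):
    """Return (lower, upper) bounds: observed score minus/plus n_sems SEMs."""
    margin = n_sems * sem
    return (observed_score - margin, observed_score + margin)

# Hypothetical client: observed score 85 on a test whose manual reports SEM = 3
ci_68 = confidence_interval(observed_score=85, sem=3, n_sems=1)
ci_95 = confidence_interval(observed_score=85, sem=3, n_sems=2)
print(f"68% confidence interval: {ci_68[0]} to {ci_68[1]}")
print(f"95% confidence interval: {ci_95[0]} to {ci_95[1]}")
```

Widening the band from one SEM to two buys more confidence at the cost of a less precise statement about the true score.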


INTERPRETATION OF CONFIDENCE INTERVAL RELATIVE TO CUT-OFF SCORE

How do we interpret performance when the confidence interval:

a) is completely above the cut-off score?

b) is completely below the cut-off score?

c) straddles the cut-off score?
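The three cases can be told apart mechanically once the interval and the cut-off are known. The sketch below only labels the case; it does not say what clinical conclusion follows, which is the question posed above. The function name, the example intervals, and the choice to treat an interval that just touches the cut-off as straddling it are all assumptions.

```python
def position_relative_to_cutoff(lower: float, upper: float, cutoff: float) -> str:
    """Label where a confidence interval (lower, upper) sits relative to a cut-off."""
    if lower > cutoff:
        return "completely above the cut-off"
    if upper < cutoff:
        return "completely below the cut-off"
    # Assumption: an interval that touches or contains the cut-off straddles it
    return "straddles the cut-off"

# Hypothetical cut-off score of 80 and three hypothetical confidence intervals
cutoff_score = 80
for interval in [(82, 88), (70, 76), (77, 83)]:
    label = position_relative_to_cutoff(interval[0], interval[1], cutoff_score)
    print(f"{interval[0]} to {interval[1]}: {label}")
```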


LECTURE 06B ENDS HERE