Validity/Reliability Matters. Really?

Beverly Mitchell, Kennesaw State University

Posted 01-Jan-2016


TRANSCRIPT

Page 1: Validity/Reliability Matters

Validity/Reliability Matters. Really?

Beverly Mitchell, Kennesaw State University

Page 2: Validity/Reliability Matters

Can a test be valid and not be reliable?


Page 3: Validity/Reliability Matters

Can a test be reliable and not be valid?


Page 4: Validity/Reliability Matters

Justifiable

Relevant

True to its purpose (consistently)

Validity


Page 5: Validity/Reliability Matters

Validity

Design Issues

Application Issues


Page 6: Validity/Reliability Matters

Validity

Design Issues

Application Issues


Page 7: Validity/Reliability Matters

Design: Creating the Instrument

1-Inference 2-Complexity


Page 8: Validity/Reliability Matters

Inference: a continuum from Low to High


Page 9: Validity/Reliability Matters

High Inference

To draw a conclusion

To guess, surmise

To suggest, hint


Page 10: Validity/Reliability Matters

Low Inference

Straightforward

Language = precise & targeted

Clear – no competing interpretations of words

No doubt as to what point is being made


Page 11: Validity/Reliability Matters

Inference: a continuum from Low to High


Page 12: Validity/Reliability Matters

Complexity: a continuum from Low to High


Page 13: Validity/Reliability Matters

High Complexity

Complicated

Comprised of interrelated parts or sections

Developed with great care or with much detail


Page 14: Validity/Reliability Matters

Low Complexity

Simplistic

Plain

Unsophisticated


Page 15: Validity/Reliability Matters

Complexity: a continuum from Low to High


Page 16: Validity/Reliability Matters

[Figure: 2x2 grid of Inference (Low to High) by Complexity (Low to High)]

How They Are Related


Page 17: Validity/Reliability Matters

[Figure: 2x2 grid of Inference (Low to High) by Complexity (Low to High)]

Designing the Instrument


Page 18: Validity/Reliability Matters

[Figure: 2x2 grid of Inference (Low to High) by Complexity (Low to High)]

Due “Yesterday”!


Page 19: Validity/Reliability Matters

[Figure: 2x2 grid of Inference (Low to High) by Complexity (Low to High)]

“Overachieving”


Page 20: Validity/Reliability Matters

[Figure: 2x2 grid of Inference (Low to High) by Complexity (Low to High), with regions labeled “Error”]

How Much Error Are You Willing to Risk?


Page 21: Validity/Reliability Matters

[Figure: 2x2 grid of Inference (Low to High) by Complexity (Low to High)]

Compromise


Page 22: Validity/Reliability Matters

Does the OBSERVED Behavior = TRUE Behavior?

Observed SCORE ≠ TRUE SCORE

ERROR
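The slide's claim that the observed score differs from the true score by error is the core identity of classical test theory, X = T + E. Below is a minimal simulation, not part of the original presentation, with an invented score scale and error size, showing how measurement error loosens the link between observed and true scores:

```python
import random
import statistics

# Classical test theory: observed score X = true score T + error E.
# The mean and standard deviations here are invented for illustration.
random.seed(0)
true_scores = [random.gauss(75, 10) for _ in range(1000)]
errors = [random.gauss(0, 5) for _ in range(1000)]      # measurement error
observed = [t + e for t, e in zip(true_scores, errors)]

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# With zero error the correlation would be 1.0; error pulls it below 1,
# which is exactly why Observed SCORE != TRUE SCORE.
print(round(pearson(true_scores, observed), 2))
```

Doubling the error standard deviation in the sketch drags the correlation down further; in classical terms, reliability is the proportion of observed-score variance that is true-score variance.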


Page 23: Validity/Reliability Matters

Design: Creating the Instrument

1. Inference

General rubric: high inference

Qualitative analytic rubric: low inference

2. Complexity

Easy to develop (question worthiness, guidance, single interpretation): low complexity

Time to develop (labor intensive, onerous, long): high complexity


Page 24: Validity/Reliability Matters

Validity

Design Issues

Application Issues


Page 25: Validity/Reliability Matters

Application Issues

Designated Use

Limitations/Conditions


Page 26: Validity/Reliability Matters

Application Issues

Designated Use: Don’t borrow from your neighbor!


Page 27: Validity/Reliability Matters

Application Issues

Limitations/Conditions: One size does not fit all or apply to all circumstances.


Page 28: Validity/Reliability Matters

Ways to Increase Probability for Accuracy

Compare language: standards & concepts

The concepts/expectations in the standards are apparent in the assessments – same depth and breadth

Good example of Content Validity

Behavior (performance) expected in the standard matches the performance expected in the assessment – i.e., knowledge of…demonstrating skill…

Identify Key/Critical items/concepts to evaluate

Give it away for analysis (many eyes)

Invite external “expert” review

Be receptive to feedback

Surveys from P-12 partners, candidates

Regular evaluation and analysis: revise, revise, revise

Awareness of design and application issues


Page 29: Validity/Reliability Matters

Ways to Increase Reliability

Begin with a valid instrument. Two reliability issues:

Reliability of the instrument: repeated use of the instrument by the same evaluators. If problematic: revise, re-think, or abandon.

Reliability of the scoring: performance rated the same by different evaluators (i.e., objectivity). If problematic: ensure the qualifications of evaluators, check the rubric, check the language, and minimize generalized concepts applied to all subject areas.

Train evaluators frequently.
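Reliability of scoring, the second issue above, can be checked numerically. The sketch below uses invented ratings (illustrative only, not workshop data) to compute simple percent agreement between two evaluators and Cohen's kappa, which discounts the agreement expected by chance:

```python
from collections import Counter

# Hypothetical ratings: two evaluators score the same ten
# performances on a 1-4 scale (invented for illustration).
rater_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater_b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]

def percent_agreement(a, b):
    """Fraction of items on which the two raters gave the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    po = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both raters independently
    # pick the same category, given their marginal rates.
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

print(percent_agreement(rater_a, rater_b))            # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))       # 0.71
```

A kappa well below the raw agreement is a signal that much of the observed agreement could be chance; training evaluators and tightening rubric language are the levers the slide recommends for raising it.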


Page 30: Validity/Reliability Matters

AN APPLICATION: A KSU Workshop (Handouts Available)

Thirty experienced teachers participated in a daylong workshop to help us evaluate three student teaching observation rating forms.


Page 31: Validity/Reliability Matters

Three Instruments

Traditional Candidate Performance Instrument (CPI) Observation of Student Teaching: Observer is asked to indicate strengths, weaknesses, and areas for improvement in three broad outcomes (Subject Matter, Facilitation of Learning, and Collaborative Professional).

Modified CPI Observation of Student Teaching: Observer is asked to explicitly rate each proficiency within each outcome and then provide a narrative indicating any strengths, weaknesses, and suggestions for improvement.

Formative Analysis Class Keys: Observer is asked to rate 26 elements from the Georgia Department of Education’s Class Keys. No required narrative.


Page 32: Validity/Reliability Matters

Generally, we were interested in two areas:

Validity/Accuracy – Which instrument provides the best inference about the presence of the positive behaviors (proficiencies) we deem important?

Reliability/Consistency – Which instrument demonstrates the best inter-rater reliability?


Page 33: Validity/Reliability Matters

Study Design

Instrument                                    Group 1   Group 2   Group 3
Period 1: Traditional CPI-Narrative           Video A   Video B   Video C
Period 2: Modified CPI Rating and Narrative   Video B   Video C   Video A
Period 3: Class Keys Formative Analysis       Video C   Video A   Video B


Page 34: Validity/Reliability Matters

Reliability

The strongest inter-rater agreement was for the Modified CPI with performance-level ratings, followed by the Class Keys Formative Assessment instrument with a performance-level rating.

There was very little agreement among the behaviors noted in the Traditional CPI narratives, and no performance-level ratings were available. It is probably not a reliable instrument for rating student-teaching behaviors.


Page 35: Validity/Reliability Matters

Validity

Both the Traditional CPI and the Modified CPI are explicitly aligned with institutional (and other) standards, but the Traditional CPI is a global assessment while the Modified CPI requires a rating and a narrative for each proficiency.

However, the Traditional CPI has not demonstrated reliability, so…

Participants were also asked to provide information about the language, clarity, and ease of use of all instruments.
