Pre- to Post-test Analysis for Local Assessments
Michael Lance, Joseph Musial
MERA Conference: Fall 2012



TRANSCRIPT

Page 1: Pre- to Post-test Analysis for Local Assessments

PRE- TO POST-TEST ANALYSIS FOR LOCAL ASSESSMENTS

Michael Lance
Joseph Musial
MERA Conference: Fall 2012

Page 2

Overview
• Standards
• Considerations
• “Experimental Design”
• Analysis

Page 3

STANDARDS FOR EDUCATIONAL & PSYCHOLOGICAL TESTING (AERA, APA, NCME, 1999)

The purpose of the Standards “is to provide criteria for the evaluation of tests, testing practices, and the effects of test use...provides a frame of reference to assure that relevant issues are addressed” (p. 2).

Page 4

WHAT HAPPENS IF THE STANDARDS ARE IGNORED?

Practice without theory is chaos... (Author unknown)

...and may result in lawsuits. (Lance & Musial)

Page 5

STANDARD 1.1

“A rationale should be presented for each recommended interpretation and use of test scores, together with a comprehensive summary of the evidence and theory bearing on the intended use or interpretation” (p. 17).

Page 6

STANDARD 1.2

“The test developer should set forth clearly how test scores are intended to be interpreted and used. The population(s) for which a test is appropriate should be clearly delimited, and the construct that the test is intended to assess should be clearly described” (p. 17).

Page 7

LOW-STAKES TESTING

According to the Standards, “A low-stakes test...is one administered for informational purposes or for highly tentative judgments such as when test results provide feedback to students, teachers, and parents on student progress during an academic period” (p. 139).

Page 8

HIGH-STAKES TESTING

A high-stakes test is one in which the “aggregate performance of a sample, or of the entire population of test takers, is used to infer the quality of service provided, and decisions are made about institutional status, rewards or sanctions based on test results” (p. 139).

Page 9

RELIABILITY

Reliability refers to the consistency of measurements when the testing procedure is repeated on a population of individuals or groups (p. 25)....When test scoring involves a high level of judgment, indexes of scorer consistency are commonly obtained (p. 27).

Page 10

RELIABILITY PRIMER

Reliability:
• Pertains to the consistency of scores over time
• Is reported as a coefficient (r) ranging from 0 to 1.00
• Is a function of the scores or data on hand
• Reliability > Validity
• Low-stakes: r ~ .70; high-stakes: r ~ .90

Source: Nunnally (1978), Thompson (2003)

Page 11

RELIABILITY IS FREQUENTLY DETERMINED THROUGH CORRELATIONS

[Scatterplot of Form A scores plotted against Form B scores]

Page 12

TYPES OF RELIABILITY COEFFICIENTS:

a. Administration of parallel forms in independent testing sessions (alternate-form coefficient).

b. Administration of the same instrument on separate occasions (test-retest/stability coefficient).

c. Administration from a single test (internal consistency coefficients).
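As an illustration of option (b), a test-retest/stability coefficient is simply the Pearson correlation between scores from the two administrations. A minimal sketch in Python; the student scores are invented for the example:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two administrations of a test
    (a test-retest/stability coefficient)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical scores: the same six students tested on two occasions
occasion_1 = [55, 62, 70, 48, 81, 66]
occasion_2 = [58, 60, 74, 50, 79, 69]
print(round(pearson_r(occasion_1, occasion_2), 2))  # 0.97
```

A coefficient this high would exceed the r ~ .70 rule of thumb for low-stakes use noted earlier.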

Page 13

VALIDITY

“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests...it is the interpretations of test scores required by proposed uses that are evaluated, not the test itself” (p. 9).

Page 14

STANDARD 3.1

“Tests and testing programs should be developed on a sound scientific basis. Test developers and publishers should compile and document adequate evidence bearing on test development” (p. 43).

Page 15

STANDARD 3.2

“The purpose(s) of the test, definition of the domain, and the test specifications should be stated clearly so that judgments can be made about the appropriateness of the defined domain for the stated purpose(s) of the test and about the relation of items to the dimensions of the domain they are intended to represent” (p. 43).

Page 16

Considerations
• Pedagogical (e.g. the assumption of prior knowledge)
• Technical (e.g. mastery learning)
• Logistical (e.g. testing)

Page 17

What are the assumptions?
• Do we assume zero prior knowledge? How would a group of students perform on a Chinese language pretest?
• What factors influence prior knowledge? Do students receive supplemental enrichment that would impact the pre-test? Do students have access to computer-based support at home or at the library?

Page 18

Mastery Learning
...leads to an all-or-nothing score, suggesting the student has or has not met the pre-established mastery level.

“When basic skills are tested, nearly complete mastery is generally expected...80 to 85% correct items....” (Anastasi & Urbina, 1997)

Page 19

Who did better on this 100-item test?

Joe’s pre-test = 20

Michael’s pre-test = 85

Joe’s post-test = 70

Michael’s post-test = 100

Page 20

Who did better on a 1-mile run in physical education class?

Page 21

Teacher-made test
• Referred to as criterion-referenced or domain-referenced
• The focus is on what test takers can do and what they know compared to what is expected
• In contrast, when a student is compared to others, we refer to this as norm-referenced
• The number of test items on a domain-referenced test reflects the intended instruction and curriculum standards, emphasizing what students should be able to demonstrate as a result of the instruction

Page 22

What is a posttest cut score?
• Picture a high-jump bar: “how high did we set the bar?”
• A math department would review the test as if they were a minimally competent student
• The teachers would then place a (+) next to each item such a student would answer correctly and a (–) next to each item they would answer incorrectly
• This is a Modified Angoff Method
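The tally step of a Modified Angoff review can be sketched in a few lines of code. All teacher names and item marks below are invented for illustration; each rater marks each of 10 items “+” (a minimally competent student would answer it correctly) or “–” (would not), and the cut score is the average number of “+” marks across raters:

```python
# Hypothetical Modified Angoff ratings from three teachers on a 10-item test
ratings = {
    "teacher_a": "+ + - + + - + + + -".split(),
    "teacher_b": "+ + + + - - + + - -".split(),
    "teacher_c": "+ - - + + - + + + +".split(),
}

# Count each rater's '+' marks, then average across raters
per_rater = [marks.count("+") for marks in ratings.values()]
cut_score = round(sum(per_rater) / len(per_rater), 2)
print(per_rater, cut_score)  # [7, 6, 7] -> a cut score of 6.67 out of 10
```

In practice the department would round or otherwise adjust this figure, but the averaging step is the core of the method.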

Page 23

During pre-/post-testing...
• ...the pretest assesses baseline knowledge
• ...analysis generates a difference score (D)
• ...“difference scores” that are statistically significant suggest that the improvement was related to the instruction and not to chance factors

Page 24

What are my post-test options?
• Administer the same test
• Scramble the same test: Q1 on the pretest is now Q25 on the posttest
• Use an equivalent form (the item measures the same thing): pretest Q5: 236 posttest Q5: 635 + 529 + 218

Page 25

More Options...
• An alternative form test would include a different test that assesses essentially the same content but in an alternative fashion.
• An example would be two different reading passages: Test 1 could ask the student to read a passage and underline the verbs; Test 2 could ask the student to read a passage and then choose a verb from a multiple-choice item.

Page 26

What does a hard pre-test look like?

[Histogram of pre-test scores ranging from 0 to 30%, piled up at the low end: a floor effect has occurred, and the distribution is positively skewed]

Page 27

What does an easy pre-test look like?

[Histogram of pre-test scores ranging from 65% to 90%, piled up at the high end: a ceiling effect has occurred, and the distribution is negatively skewed]
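A quick numeric check for floor and ceiling effects is the skewness of the score distribution: a hard pre-test piles scores at the low end (positive skew), an easy one piles them at the high end (negative skew). A small Python sketch with invented score lists:

```python
from statistics import mean

def skewness(scores):
    """Sample skewness (g1). Positive: scores pile up at the low end
    (possible floor effect). Negative: scores pile up at the high end
    (possible ceiling effect)."""
    m = mean(scores)
    n = len(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

hard_pretest = [2, 5, 5, 10, 10, 15, 20, 30, 60]     # most scores low
easy_pretest = [65, 80, 85, 85, 90, 90, 95, 95, 98]  # most scores high
print(skewness(hard_pretest) > 0)  # True: positively skewed (floor effect)
print(skewness(easy_pretest) < 0)  # True: negatively skewed (ceiling effect)
```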

Page 28

What is the likelihood of a bell-shaped curve?

Page 29

What is the likelihood of a bell-shaped curve?

Page 30

“Experimental” Design
• Not really an experiment (hence the “quotes”)
• No random selection or assignment
• More like a “queasy-experiment”
• Below are some threats to consider (Campbell & Stanley, 1963)

Page 31

History
• Event(s) which may confound with the treatment effect. Such events affect all students equally.
• See also intra-session history, where something happens within one test session/setting (or trial).
• Example: Pre-test → Receive instruction → Flu bug spreads through class → Extra tutoring → Post-test

Page 32

History
• Is the effectiveness of the treatment measurable given the history (i.e. sickness)?
• Flu bug
• Extra tutoring
• Air conditioner broken during lesson(s)

Page 33

Maturation
• A process occurring within subjects over time that may confound with the treatment effect. This process affects all students equally.
• Example: Pre-test → Receive instruction → Naturally learn more over time → Post-test

Page 34

Maturation
• Is the “growth” due to the efficacy of the reading program (the treatment) or to the natural growth in reading expected for that group of students?
• Time elapsed between tests
• Nature of what is assessed and what is normally learned outside of school (e.g. the natural attention span of 2nd graders in September vs. January)

Page 35

Testing
• Exposure to the pre-test changes the potential responses to post-test items, regardless of the “treatment”.
• Example: Pre-test → Receive instruction → Students remember test questions/answers → Post-test

Page 36

Testing
• Is the “growth” between tests due to the efficacy of the tutoring or to exposure to the pre-test questions?
• Comparability and sufficient difference between tests (items)
• Test/item exposure and memory: do not exhaust test items
• Freshman psych classes who take a lot of tests throughout the year

Page 37

Instrumentation (decay)
• Changes in the calibration of instruments or scores from pre- to post-tests affect growth between tests.
• Example: Pre-test → Receive instruction → Tests are scored → Post-test → Tests are scored with inconsistent scrutiny or rubric
• And/or: the post-test has more/less reliability and/or validity

Page 38

Instrumentation (decay)
• Is the “growth” due to the treatment or to the reliability of the instrument?
• Look at reliability estimates
• Ensure testing conditions are consistent (e.g. scrap paper or calculators offered both times)

Page 39

Statistical Regression
• Artificial growth or loss due to measurement error from pre- to post-tests.
• Example: Pre-test (highest score) → Receive instruction → Post-test (lower score)

Page 40

Statistical Regression
• Are the top students scoring lower due to variability in test accuracy or because they are forgetting things?
• Are the lowest students now scoring higher due to test variability or are they learning more?
• Ceiling effects and floor effects
• SEM is inflated at the extremes of many assessments

Page 41

Other
• No measure of how students at different pre-test scores do on the post-test compared to others
• Not establishing expected gains based on quantiles of pre-tests and norm data
• Not taking SEMs into account

Page 42

Time Interval
According to L. R. Gay (1992), a 2-week period between the pre-test and the post-test is the minimum time frame suggested before a test-retest reliability coefficient should be derived (p. 163).

Page 43

How we can help students
• Maintain the same test environment during pretesting/posttesting
• Help eligible students receive Free and Reduced Price Lunches
• Consider access to food (not only during a test administration)
• Consider students with IEP accommodations (e.g. read-aloud, resource environment, alternative assessment)

Page 44

Advantages...
• Results can be obtained by hand or by computer (e.g. Excel)
• It is quantifiable
• Class performance can be visually displayed with vibrant colors
• Teachers may use a commercially available test (ACT EXPLORE/PLAN)
• Encourages test blueprinting

Source: Isaac & Michael (1997)

Page 45

Analysis
• Statistical details
• Description of the Excel file
• Recommendations

Page 46

Analysis
• Paired (dependent samples) t-test
• Not ANOVA: mathematically equivalent to the paired t-test, yet more complex to do and explain with MS Excel
• Not ANCOVA: the design is not randomized, and ANCOVA is complex to do and explain with MS Excel
• Not the Wilcoxon signed-rank test: the paired t-test is sufficiently robust in most situations and much easier to work with in Excel
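The paired t-test on difference scores is easy to reproduce outside Excel as well. A minimal Python sketch with hypothetical scores for eight students; the resulting |t| would be compared against a t-table critical value for the given degrees of freedom (or a p-value computed with a statistics package):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """t statistic and degrees of freedom for difference scores D = post - pre."""
    d = [b - a for a, b in zip(pre, post)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))  # mean(D) over its standard error
    return t, n - 1

# Hypothetical scores for 8 students on a 100-item test
pre  = [20, 35, 40, 28, 50, 45, 33, 38]
post = [45, 55, 62, 40, 70, 68, 50, 58]
t, df = paired_t(pre, post)
print(round(t, 2), df)  # compare |t| with a t-table critical value for df = 7
```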

Page 47

Analysis
• One-group pre-test/post-test design
• Uses a paired t-test to analyze gains in light of threats to validity
• The end users of this information are teachers and administrators
• May also be helpful for evaluating a program

Page 48

Difference score (D) layout

Source: Gravetter & Wallnau (2000)

Page 49

Analysis
• Effect Size (difference expressed in S.D.s)
• The Excel file refers to Sawilowsky’s (2009) new E.S. categories
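For paired scores, the effect size can be computed as the mean difference divided by the standard deviation of the differences. A sketch with hypothetical scores; the category thresholds are the commonly cited ones from Sawilowsky (2009), listed here as an assumption about which cutoffs the Excel file uses:

```python
from statistics import mean, stdev

def cohens_d(pre, post):
    """Effect size for paired scores: mean difference / SD of the differences."""
    d_scores = [b - a for a, b in zip(pre, post)]
    return mean(d_scores) / stdev(d_scores)

def sawilowsky_label(d):
    """Rule-of-thumb category labels per Sawilowsky (2009)."""
    cuts = [(2.0, "huge"), (1.2, "very large"), (0.8, "large"),
            (0.5, "medium"), (0.2, "small"), (0.01, "very small")]
    for cut, label in cuts:
        if abs(d) >= cut:
            return label
    return "negligible"

# Hypothetical scores for 8 students on a 100-item test
pre  = [20, 35, 40, 28, 50, 45, 33, 38]
post = [45, 55, 62, 40, 70, 68, 50, 58]
es = cohens_d(pre, post)
print(round(es, 2), sawilowsky_label(es))
```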

Page 50

Analysis
Goal: catalogue the effect sizes that result from using the Excel file, to provide more accurate category estimates:
• Per grade level and subject area
• Per topic/unit/skill
• Per other factors?

Page 51

Assumptions (Statistical)
• Data are paired and come from the same population
• Each pair is chosen randomly (not met) and is independent (met?)
• The data are measured on an interval scale (since differences are used)
• The distribution of differences is normal
• Pre- and post-tests are comparable yet sufficiently different

Page 52

Why in Excel?
• Teachers and school administrators need a better way to analyze pre- to post-test scores.
• Many, if not most, of the world’s problems can be solved via MS Excel.
• Many, if not most, of the world’s problems can be worsened via MS Excel.

Page 53

The Excel File
For “Squishies”:
• simplified summary statement
• significant/not significant
• overall percentile improvement

For “Quantoids”:
• p-value
• confidence interval for p-value
• effect size
• confidence interval for effect size

Page 54

The Excel File

Enter as many pairs of scores as needed.

Page 55

The Excel File
[screenshot]

Page 56

The Excel File
[screenshot]

Page 57

The Excel File
• Use with caution: consider “threats” to validity
• Not for teacher evaluation!
• For tests with classical measurement theory assumptions
• May be used for program evaluation (see “threats”)
• May (not) be the best choice of analysis
• Contact me with questions, comments, recommendations, or complaints

Page 58

Statistical Significance
What does it mean to a teacher analyzing pre- to post-test scores?
• Helps to examine the likelihood that the difference between test scores is due to chance
• Quantifiable
• Adjustable alpha level

Page 59

Effect Size & Percentile Gains
What does it mean to a teacher analyzing pre- to post-test scores?
• Describes the shift in scores (expressed in standard deviations)
• Can also be expressed as (transformed to) percentile gains
• Rules of thumb may not be helpful
• Motivation to gather effect sizes per unit to inform research and users of the Excel file
• Adjustable confidence interval: the lower bound should not be ≤ 0

Page 60

Recommendations
• Always keep threats to validity in mind
• Make the process a habit
• Save each instance of using the Excel file for later reference and to see how results change: after tweaking test items, with different groups of students, and when other factors change
• Districts should be mindful of the standards and the correct use of a test for either high- or low-stakes decisions

Page 61

Food for Thought

Not everything that can be counted counts, and not everything that counts can be counted.

Albert Einstein

Page 62

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. American Psychological Association.

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Rand McNally.

Gay, L. R., Mills, G. E., & Airasian, P. W. (2006). Educational research: Competencies for analysis and applications.

Gravetter, F. J., & Wallnau, L. B. (2000). Statistics for the behavioral sciences (5th ed.). Belmont, CA: Wadsworth/Thomson.

Isaac, S., & Michael, W. B. (1997). Handbook in research and evaluation: A collection of principles, methods, and strategies useful in the planning, design, and evaluation of studies in education and the behavioral sciences (3rd ed.). San Diego, CA: EdITS.

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). McGraw-Hill.

Sawilowsky, S. S. (2009). New effect size rules of thumb. Journal of Modern Applied Statistical Methods, 8(2), 597–599.

Thompson, B. (2003). Score reliability: Contemporary thinking on reliability issues. Sage Publications.