A Taxonomy of Adaptive Testing

TSLL 07 Slide 1 September 22, 2007 A Taxonomy of Adaptive Testing Robert J. Mislevy Measurement, Statistics & Evaluation University of Maryland in collaboration with Roy Levy John T. Behrens Arizona State University Cisco Systems, Inc. Presented at the Fifth Annual Technology for Second Language Learning Conference, September 21-22, 2007, Iowa State University, Ames, Iowa, USA

TSLL 07 Slide 1 September 22, 2007

A Taxonomy of Adaptive Testing

Robert J. Mislevy

Measurement, Statistics & Evaluation, University of Maryland

in collaboration with

Roy Levy, Arizona State University

John T. Behrens, Cisco Systems, Inc.

Presented at the Fifth Annual Technology for Second Language Learning Conference, September 21-22, 2007, Iowa State University, Ames, Iowa, USA

TSLL 07 Slide 2 September 22, 2007

Terminology & Concepts for Adaptive Testing

Adaptive testing

» Most familiar as item response theory-based computer-adaptive testing (IRT-CAT)

Can take a broader perspective of evidentiary reasoning

We will look at the interplay among inferences and data gathering

A taxonomy of configurations

» IRT-CAT plus many others

TSLL 07 Slide 3 September 22, 2007

Taxonomy based on three dimensions …

Claim status

Observation status

Locus of control

TSLL 07 Slide 4 September 22, 2007

Background for the dimensions

Glenn Shafer’s “Frame of discernment”

Evidence-centered assessment design

TSLL 07 Slide 5 September 22, 2007

“Frame of discernment”

From Shafer’s (1976) A Mathematical Theory of Evidence.

It is all the possible combinations of values of the variables you are working with.

“Frame” emphasizes how it effectively circumscribes a universe in which inference will take place.

“Discern” = “detect, recognize, distinguish.”

It is a property of you as much as a property of the world.

It depends on what you know and what your purpose is.
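As a minimal sketch of this definition, a frame of discernment can be enumerated as the Cartesian product of the value sets of the variables in play. The variable names and value sets below are hypothetical, chosen only for illustration; they come neither from Shafer nor from the talk.

```python
from itertools import product

# Hypothetical variables and their possible values (illustration only).
variables = {
    "reading_proficiency": ["low", "medium", "high"],  # a student-model variable
    "item_1_response": ["wrong", "right"],             # an observable variable
    "item_2_response": ["wrong", "right"],             # an observable variable
}

def frame_of_discernment(variables):
    """Return every possible combination of values of the given variables."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]

frame = frame_of_discernment(variables)
print(len(frame))  # 3 * 2 * 2 combinations
print(frame[0])
```

Adding, dropping, or refining a variable changes the dictionary and therefore the frame, which is the sense in which the frame depends on what you know and what your purpose is.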

TSLL 07 Slide 6 September 22, 2007

“Frame of discernment”

Frames of discernment can evolve over time, as beliefs, knowledge, and aims unfold.

E.g., dip for the party? Medical diagnosis.

Move from one frame of discernment to another by

» ascertaining values of some variables, dropping others,

» adding new variables or refining current ones,

» constructing a different frame when observations cause rethinking of assumptions or goals.

TSLL 07 Slide 7 September 22, 2007

Evidence-Centered Design

Mislevy, Steinberg, & Almond (2003) “On the structure of educational assessments.”

Educational assessment as evidentiary argument:

We reason from the things students say, do, or make in a handful of particular settings, to what they know, can do in various situations, or have accomplished, as more broadly construed.

All elements of an assessment, from analysis of domain, through design, to operation, are based on building then embodying such an argument in operational procedures.

TSLL 07 Slide 8 September 22, 2007

Toulmin’s Argument Structure

[Diagram: Data, “so” Claim; “since” Warrant, with Backing; “unless” Alternative explanation.]

TSLL 07 Slide 9 September 22, 2007

An Assessment Design Argument

[Diagram: Data concerning situation and Data concerning performance, from a Student acting in assessment situation, “so” Claim about student in some frame of discernment; with Warrant and Backing.]

Callouts on the diagram:

» Information pertinent to addressing the claims is accumulated in terms of student-model variables (SMVs).

» Aspects of performance that bear on claims are captured in terms of observable variables (OVs).

» What we actually see/hear the student say, do, or make.

» What aspects of the situation are important for the possibility of inference about the examinee?

» Formative assessments often have highly specific claims; summative assessments tend to have broader claims.

TSLL 07 Slide 10 September 22, 2007

Adaptive Testing

[Diagram: Data concerning situation and Data concerning performance, from a Student acting in assessment situation, “so” Claim about student in some frame of discernment; with Warrant and Backing. The numbered steps below annotate the diagram.]

1. Somebody selects a situation for getting information

2. Examinee acts

3. Evaluation of performance in light of current targeted claim

4. Update belief about claim

5. Somebody has a choice about whether to refocus the claim
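The five steps can be sketched as a loop. This is only an illustration under stated assumptions: a single discrete student-model variable, a hypothetical three-task pool with made-up response probabilities, Bayes' rule for the belief update, and a simple "least predictable outcome" heuristic for task selection. None of these specifics come from the talk, and step 5 is left as a comment because the claim stays fixed here.

```python
# Hypothetical levels of one student-model variable (SMV).
LEVELS = ["low", "medium", "high"]

# Hypothetical tasks: probability of a correct response at each SMV level.
TASKS = {
    "easy":   {"low": 0.6, "medium": 0.8, "high": 0.95},
    "medium": {"low": 0.3, "medium": 0.6, "high": 0.85},
    "hard":   {"low": 0.1, "medium": 0.3, "high": 0.7},
}

def select_task(belief, remaining):
    """Step 1: choose the task whose outcome is least predictable (assumed heuristic)."""
    def p_correct(task):
        return sum(belief[lvl] * TASKS[task][lvl] for lvl in LEVELS)
    return min(sorted(remaining), key=lambda t: abs(p_correct(t) - 0.5))

def update_belief(belief, task, correct):
    """Step 4: Bayes' rule over the levels of the SMV."""
    posterior = {}
    for lvl in LEVELS:
        likelihood = TASKS[task][lvl] if correct else 1.0 - TASKS[task][lvl]
        posterior[lvl] = belief[lvl] * likelihood
    total = sum(posterior.values())
    return {lvl: p / total for lvl, p in posterior.items()}

def run_adaptive_test(respond, max_tasks=3):
    """respond(task) -> bool stands in for steps 2 and 3 (act, then evaluate)."""
    belief = {lvl: 1.0 / len(LEVELS) for lvl in LEVELS}  # uniform prior
    remaining = set(TASKS)
    history = []
    while remaining and len(history) < max_tasks:
        task = select_task(belief, remaining)           # step 1
        correct = respond(task)                         # steps 2 and 3
        belief = update_belief(belief, task, correct)   # step 4
        remaining.discard(task)
        history.append((task, correct))
        # Step 5 (refocusing the claim) is not modeled: the SMV stays fixed.
    return belief, history

belief, history = run_adaptive_test(lambda task: True)
print(history)
print(belief)
```

Who supplies `select_task`, and whether step 5 is ever exercised, is what distinguishes the configurations the taxonomy goes on to describe.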

TSLL 07 Slide 11 September 22, 2007

What is an adaptive test?

At a given time in an assessment system, the set of student-model variables and observable variables constitutes a frame of discernment.

An adaptive test is one in which the frame of discernment changes over time as a function of the values of observations.

The ways it might change are the basis of the taxonomy.

TSLL 07 Slide 12 September 22, 2007

Claim Status

Is the claim part of the frame of discernment, i.e., SMVs, fixed or evolving?

i.e., do the SMVs at issue stay the same or change (as opposed to knowledge about SMVs)?

TSLL 07 Slide 13 September 22, 2007

Observation status

Is the data part of the frame of discernment, i.e., OVs, fixed or evolving?

i.e., does the choice of OVs that can be made stay the same or change as more information is obtained?

TSLL 07 Slide 14 September 22, 2007

Locus of Control

If the claim part of the frame is changing as the test proceeds, who decides how it should change:

The examiner or the examinee?

If the data part of the frame is changing as the test proceeds, who decides how it should change:

The examiner or the examinee?

| Claim status \ Observation status | Fixed | Adaptive: Examiner Determined | Adaptive: Examinee Determined |
|---|---|---|---|
| Fixed | 1. Usual, linear test | 2. IRT-CAT | |
| Adaptive: Examiner Determined | | | |
| Adaptive: Examinee Determined | | | |

Callout: “User-friendly” testing

| Claim status \ Observation status | Fixed | Adaptive: Examiner Determined | Adaptive: Examinee Determined |
|---|---|---|---|
| Fixed | 1. Usual, linear test | 2. IRT-CAT | |
| Adaptive: Examiner Determined | | | |
| Adaptive: Examinee Determined | | | |

Callout: Guided / diagnostic

| Claim status \ Observation status | Fixed | Adaptive: Examiner Determined | Adaptive: Examinee Determined |
|---|---|---|---|
| Fixed | 1. Usual, linear test | 2. IRT-CAT | |
| Adaptive: Examiner Determined | | | |
| Adaptive: Examinee Determined | | | |

Callout: Self-guided / diagnostic

TSLL 07 Slide 18 September 22, 2007

Cell 1: Fixed, examiner-controlled claim; fixed, examiner-controlled observation

Traditional assessments in which …

» the same kind of claim(s) / inferences / SMVs apply to everyone,

» they were decided on by the examiner a priori,

» the tasks presented are determined by the examiner a priori,

» the examiner determines the sequence of tasks a priori.

Neither the frame of discernment nor the gathering of evidence varies in response to values of observable variables or their impact on beliefs about SMVs.

TSLL 07 Slide 19 September 22, 2007

Cell 2: Fixed, examiner-controlled claim; adaptive, examiner-controlled observation

Same claims space (SMVs) for everyone:

» the claims (SMVs) were decided on by the examiner,

» the tasks presented are determined by the examiner a priori.

But in light of the unfolding pattern of responses, the examiner selects items to maximize accuracy.

E.g.,

» IRT-CAT (can be multivariate; Segall, 1996)

» Binet’s original individually administered intelligence test

» Lord’s Flexi-level scheme
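One common way an IRT-CAT selects items "to maximize accuracy" is to pick the unadministered item with the greatest Fisher information at the current ability estimate. The sketch below assumes a two-parameter logistic (2PL) model and a maximum-information rule; the talk does not specify either, and the item pool is hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, pool, administered):
    """Return the id of the unadministered item most informative at theta_hat."""
    candidates = [item for item in pool if item not in administered]
    return max(candidates, key=lambda item: item_information(theta_hat, *pool[item]))

# Hypothetical pool: item id -> (discrimination a, difficulty b).
pool = {
    "i1": (1.0, -1.5),
    "i2": (1.2, 0.0),
    "i3": (0.8, 0.4),
    "i4": (1.5, 2.5),
}

first = select_next_item(0.2, pool, administered=set())
second = select_next_item(0.2, pool, administered={first})
print(first, second)
```

The claim (the ability variable) never changes in this loop; only the choice of observations does, which is what places IRT-CAT in Cell 2.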

TSLL 07 Slide 20 September 22, 2007

Cell 3: Fixed, examiner-controlled claim; adaptive, examinee-controlled observation

Same claims space (SMVs) for everyone; the claims (SMVs) were decided on by the examiner.

But the examinee is able to determine the tasks as he/she chooses. “User friendly.”

E.g.,

» Pole-vaulting competition

» Self-adaptive SAT (Wise et al., 1992): Student chooses items by page or bin, grouped by difficulty. IRT scoring takes difficulty into account. (Also see Wright, 1977.)

Guard against nonignorable missingness (free throws).
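To illustrate how IRT scoring can take difficulty into account when examinees choose their own items, the sketch below computes a maximum-likelihood ability estimate under the Rasch model. The Rasch model, the Newton-Raphson routine, and the item difficulties are assumptions made for illustration; they are not taken from Wise et al. or from the talk.

```python
import math

def rasch_mle(responses, difficulties, iterations=25):
    """Newton-Raphson ML estimate of ability under the Rasch model.

    responses: list of 0/1 scores; difficulties: matching item difficulties.
    Needs a mixed pattern (not all right, not all wrong) for a finite estimate.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta

# Same raw score (3 of 4 correct) on two hypothetical, self-chosen item bins.
responses = [1, 1, 0, 1]
easy_bin = [-1.5, -1.0, -0.5, 0.0]
hard_bin = [0.5, 1.0, 1.5, 2.0]

theta_easy = rasch_mle(responses, easy_bin)
theta_hard = rasch_mle(responses, hard_bin)
print(theta_easy, theta_hard)
```

The same raw score earns a higher estimate on the harder bin, so choosing easy items does not inflate the score. This does not address nonignorable missingness, which arises when the choice of items itself carries information about ability.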

TSLL 07 Slide 21 September 22, 2007

Cell 4: Adaptive, examiner-controlled claim; fixed, examiner-controlled observation

Same tasks (OVs) for everyone.

Same presentation of tasks, determined a priori by the examiner.

But the examiner determines claims (SMVs) for the examinee in light of responses. E.g.,

» MMPI: the same hundreds of items for everyone, but the examiner may compute different scales for different patients.

» Diagnostic “reading record” test in language testing

Note: A multidimensional claim space is needed in Cells 4-9.

TSLL 07 Slide 22 September 22, 2007

Cell 5: Adaptive, examiner-controlled claim; adaptive, examiner-controlled observation

Claims may diverge for different examinees in light of data.

Different tasks for different examinees, to be optimal in light of the claims the examiner wants to make about them as individuals.

E.g.,

» Triage in medicine, followed by different diagnostics

» Adaptive MMPI: different items for everyone, adaptively selected for different scales for different patients

» Differential strategies in math (Tatsuoka)

» Adaptive diagnosis in language testing

TSLL 07 Slide 23 September 22, 2007

Cell 6: Adaptive, examiner-controlled claim; adaptive, examinee-controlled observation

Examiners can home in on different claims for different examinees in light of data, but examinees have at least some control over task selection.

E.g.,

» Self-adaptive tests, but along dimensions controlled by the examiner. Multivariate SA-SAT, examiner’s inferences.

» Diagnostic / placement tests, homing in on different remedial needs of students, but allowing for lower-stress choices of groups/pages of tasks as in Cell 3.

Thus the examiner tailors the claims part of the frame of discernment, and the examinee tailors the observations part given the claims.

TSLL 07 Slide 24 September 22, 2007

Cell 7: Adaptive, examinee-controlled claims; fixed, examiner-controlled observations

Examinees all take the same examiner-determined items in an examiner-determined way, but …

Examinees can home in on different claims of their choosing in light of data.

E.g.,

» MMPI, but the examinee determines which scales to compute & analyze.

» Oral reading of a fixed sample, automated parsing: the student determines what to work on next (maybe could be done with an Ordinate-like setup?)

TSLL 07 Slide 25 September 22, 2007

Cell 8: Adaptive, examinee-controlled claims; adaptive, examiner-controlled observations

The examinee chooses the claim, at the beginning or adaptively; the examiner controls task presentation for optimal precision.

E.g., structured self-diagnosis:

» MMPI, where the examinee determines which scales to focus on and is presented items adaptively for those scales.

» Oral readings with automated parsing: the student determines what to work on next, then examiner-selected samples focus on what the examinee wants to follow up on.

» SIGI: Sequential exploration of career interests. The examinee chooses categories and the system asks adaptive questions.

TSLL 07 Slide 26 September 22, 2007

Cell 9: Adaptive, examinee-controlled claims; adaptive, examinee-controlled observations

Examinees control both the claims and the tasks to yield observations for those claims.

The examinee selects the claims to focus on and then has input into what data will be observed.

Feedback from the system helps examinees figure out what they want to know, then offers them choices about directions to go to refine the information they receive.

(continued)

TSLL 07 Slide 27 September 22, 2007

Cell 9, continued: Adaptive, examinee-controlled claims; adaptive, examinee-controlled observations

E.g., guided self-diagnosis:

» Central challenge in retrieval systems in libraries: organize materials and search terms to help patrons find the information they might want

» Amazon: “Customers who looked at these books you selected also looked at…”

» Multivariate SA-SAT practice exploration space

» Language testing self-diagnosis: Start with a common passage or list of areas, do diagnostics, use results to refine testing for areas you are interested in.

| Claim status \ Observation status | Fixed | Adaptive: Examiner Determined | Adaptive: Examinee Determined |
|---|---|---|---|
| Fixed | 1. Usual, linear test | 2. IRT-CAT | 3. Self-adapting tests, e.g., SA-SAT (Wise et al., 1992) |
| Adaptive: Examiner Determined | 4. MMPI: examiner decides how to pursue analysis | 5. Examiner chooses target, Multidim CAT | 6. Examiner chooses target in Multidim SA-SAT |
| Adaptive: Examinee Determined | 7. MMPI: examinee decides how to pursue analysis | 8. Examinee chooses target, Multidim CAT | 9. Examinee chooses target & tasks in Multidim SA-SAT |
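The cell numbering in the table follows directly from the two status dimensions, with locus of control folded into the two "adaptive" values. A minimal sketch of that lookup, using hypothetical string labels for the three statuses:

```python
# Hypothetical labels for the three values each status can take, in table order.
STATUSES = ["fixed", "adaptive_examiner", "adaptive_examinee"]

def taxonomy_cell(claim_status, observation_status):
    """Map (claim status, observation status) to the cell number 1-9.

    Rows of the table are claim status; columns are observation status.
    """
    row = STATUSES.index(claim_status)
    col = STATUSES.index(observation_status)
    return 3 * row + col + 1

print(taxonomy_cell("fixed", "adaptive_examiner"))              # IRT-CAT
print(taxonomy_cell("adaptive_examinee", "adaptive_examinee"))  # examinee controls both
```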

TSLL 07 Slide 29 September 22, 2007

Conclusion

Assessments involving adaptive claims have yet to achieve the prominence of adaptive-observation assessments.

» History, up-front work, solving known “centralized” problems

User-controlled assessment is not seen as assessment.

User modeling literature will be important.

Cells 8 & 9 are good for self-directed learning in a supported environment.

» Like user-modeling strategies for buying cars, choosing movies, finding information in library systems.