TRANSCRIPT
Pearson Copyright 2010
What States Need to Consider in Transitioning to Computer-Based Assessments from the Viewpoint of a Contractor
Denny Way and Rob Kirkpatrick
Pearson

Presented at the MARCES Conference
College Park, MD
October 18, 2010
Overview of the Presentation
• Introductory comments
  – Contractor’s viewpoint
  – Background of current testing reforms
• General considerations related to online testing
• Some comments about interoperability in the context of online testing
What Can an Online Testing Contractor Provide?
• Partnership
• Innovation
• Flexibility
• Support
• Responsibility
Three Questions Driving Education Reform Agenda
1. How do we prepare our students locally to compete and succeed globally?
2. How do we effectively deploy technology and digital innovations in public education—K-12 and higher education—to scale up reforms more quickly and more cost-effectively?
3. How do we deploy technology to make assessments faster, better and cheaper?
Common Themes Across Common Core Assessment Consortia
• Leveraging advances in technology for greater efficiency, flexibility, and potential cost savings
  – Online testing
  – Innovative items
  – Automated scoring
• Movement away from a single summative score and toward the concept of through-course assessment
• Use of assessments for multiple purposes
• Need for extensive validity research
• Interoperability/open standards and open source for flexibility and less reliance on a single vendor
What Will States Do in the Meantime?
• State testing programs are continuing and many are implementing online testing
• Benefits are increasingly apparent and achievable
  – Faster turnaround of scores
  – More efficient models for test delivery
  – Opportunity for new and innovative item types
  – Financial benefits
  – Opportunities for more accessible assessments
A Flowchart of Online Testing Program Considerations
Transitional Strategies
• Full transition to 100% online testing has many benefits
• However, full transition is difficult to achieve
  – Infrastructure limitations
  – Resistance to change
• Some strategies to optimize transitions
  – Move a subject at a time
  – Paper form as accommodation or “opt out” exception
Measurement Issues
• Item Types and Tasks
• Models for Delivery
• Comparability
• Scoring models (e.g., automated vs. human scoring)
Item Types and Tasks
• Traditional formats
• New formats
  – Measure same skills in new ways
  – Measure new skills
• Syncing content & construct validity, measurement models, delivery & scoring systems
• Selecting the right item type to provide the maximum return on investment
• Need for industry-wide taxonomy
Models for Test Delivery
• Computerized Adaptive Testing (CAT) works best with discrete, objectively scored questions
• Algorithms must control content and item exposure, keep apart items that cannot appear together (“enemies”), and insert field-test items in a systematic fashion (a selection-step sketch follows this list)
• When items are associated with passages or stimuli, CAT becomes complicated
• Other computer-based testing models besides CAT exist and deserve consideration
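To make these controls concrete, here is a minimal sketch of a single CAT item-selection step. The 1PL information function is standard; the pool structure, field-test positions, and exposure cap are illustrative assumptions, not any vendor's production algorithm.

import math

# Hypothetical sketch: one CAT item-selection step covering the controls
# named above (enemy exclusion, exposure control, field-test insertion).
def item_information(b, theta):
    """Fisher information for a 1PL (Rasch) item with difficulty b."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_next_item(theta, pool, administered, position,
                     field_test_positions=(5, 12), exposure_cap=0.25):
    # Insert unscored field-test items at predetermined positions.
    want_field_test = position in field_test_positions
    candidates = [i for i in pool
                  if i["field_test"] == want_field_test
                  and i["id"] not in administered]

    # Enemy exclusion: drop items that conflict with anything already given.
    enemies = set()
    for i in pool:
        if i["id"] in administered:
            enemies.update(i.get("enemies", []))
    candidates = [i for i in candidates if i["id"] not in enemies]

    # Crude exposure control: skip items already given too often.
    candidates = [i for i in candidates if i["exposure_rate"] <= exposure_cap]

    # Pick the most informative remaining item at the current ability estimate.
    return max(candidates, key=lambda i: item_information(i["b"], theta))

In production, exposure control is usually probabilistic (e.g., Sympson-Hetter) rather than a hard cap; the cap keeps the sketch short.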
Detailed Content Control in CAT is Paramount for Standards-Based Testing
Constraint               Weight   Number of Items
                                  Lower   Upper
Numeric Reasoning        6.0      18      22
Algebraic Reasoning      6.0       7       9
Geometric Reasoning      6.0       5       7
Quantitative Reasoning   6.0       5       7

Constraint   Weight   Number of Items
                      Lower   Upper
NR.1         3.0      11      13
NR.1.1       1.0       2       4
NR.1.2       1.0       2       4
NR.1.3       1.0       1       3
NR.1.4       1.0       1       3
NR.1.5       1.0       1       3
NR.2         3.0       7       9
NR.2.1       1.0       1       3
NR.2.2       1.0       1       3
NR.2.3       1.0       1       3
NR.2.4       1.0       1       3
NR.2.5       1.0       1       3
NR.2.6       1.0       1       3
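Bounds like these translate directly into bookkeeping inside the delivery engine. A sketch using a few of the tabled values: the constraint dictionary mirrors the slide, while the two helper functions are illustrative.

# Blueprint bounds from the table above: (lower, upper) number of items.
BLUEPRINT = {
    "Numeric Reasoning": (18, 22),
    "Algebraic Reasoning": (7, 9),
    "NR.1": (11, 13),
    "NR.1.1": (2, 4),
    "NR.2": (7, 9),
}

def violates_upper(counts, item_codes):
    """True if giving an item with these content codes would push any
    constraint past its upper bound."""
    return any(counts.get(c, 0) + 1 > BLUEPRINT[c][1]
               for c in item_codes if c in BLUEPRINT)

def unmet_lowers(counts):
    """Constraints whose lower bounds are not yet met; selection should
    prioritize items that cover these codes."""
    return [c for c, (lo, _) in BLUEPRINT.items() if counts.get(c, 0) < lo]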
Online versus Paper Comparability
• Comparability issues are likely to become more complicated as CBT becomes more widespread:
  – The general focus has been on comparability of a new online version to an existing paper version; many new programs are designed from the opposite frame of reference
  – The unit of comparability is unclear: construct, item, form, performance scoring, scale, norms, platform, …
  – Advice given to states in technical advisory committees is varied
Online versus Paper Comparability
• Can mode comparability change over time for the same test?
• Need to develop reasonable, generally accepted methods to address comparability issues without expensive studies (a simple screening check is sketched below)
• Difficult questions:
  – Can we use performance scoring decisions derived from paper administration on computer-administered tests?
  – Does the online version need to look exactly like the paper version (or vice versa)?
  – Others
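One inexpensive first screen, far short of a designed comparability study, is to compare classical item difficulty across modes. The two-proportion z test below is standard statistics; the numbers are made up.

import math

def mode_effect_z(correct_paper, n_paper, correct_online, n_online):
    """Two-proportion z statistic for an item's proportion-correct in
    paper vs. online samples; a large |z| flags the item for review."""
    p1 = correct_paper / n_paper
    p2 = correct_online / n_online
    pooled = (correct_paper + correct_online) / (n_paper + n_online)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_paper + 1 / n_online))
    return (p1 - p2) / se

# Made-up example: 430/600 correct on paper vs. 380/600 online.
z = mode_effect_z(430, 600, 380, 600)   # |z| > ~2 warrants a closer look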
Automated Scoring Models
• Validity of various automated scoring models
• Design issues: field testing and calibration
• Design issues: item design, content areas
• Public confidence in results
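At its core, the field-testing and calibration design amounts to fitting a scoring model on human-scored responses and checking machine-human agreement on held-out responses. A minimal sketch with synthetic placeholder data; operational engines use far richer features and models.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder "features" of 200 field-test responses (e.g., length,
# vocabulary, syntax measures) and simulated human scores on a 0-4 scale.
X = rng.normal(size=(200, 3))
human = np.clip(np.round(X @ np.array([1.0, 0.5, -0.25])
                         + rng.normal(0, 0.5, size=200) + 2), 0, 4)

# Calibrate a linear scorer on 150 responses; hold out 50 for validation.
X1 = np.column_stack([X, np.ones(len(X))])        # add intercept term
train, test = slice(0, 150), slice(150, 200)
w, *_ = np.linalg.lstsq(X1[train], human[train], rcond=None)

# Exact machine-human agreement on the holdout, to be compared against
# human-human agreement from double-scored responses.
machine = np.clip(np.round(X1[test] @ w), 0, 4)
exact_agreement = float(np.mean(machine == human[test]))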
Operational Issues
• Infrastructure
• Readiness
• Training
• Support
The Texas STaR Chart: Infrastructure
[Chart: Texas STaR infrastructure ratings by year, 2005-2010; y-axis 0-100%]

• TARGET TECH - On-demand access for every student; direct connectivity available in all rooms and web-based resources in multiple rooms. All rooms are connected to WAN. They are fully equipped with appropriate technology.
• ADVANCED TECH - 4 or fewer students per computer. Direct connectivity to the Internet in 75% of classrooms and the library. Web-based learning is available. All rooms are on LAN/WAN. One educator per computer; shared use of other resources.
• DEVELOPING TECH - 5-9 students per computer. Direct connectivity to the Internet in 50% of classrooms and the library. Most rooms connected to WAN/LAN. One educator per computer; shared use of other resources.
• EARLY TECH - 10 or more students per computer. Dial-up connectivity. No web-based learning. Shared use of technology resources.
Infrastructure
• Our experience is that even newcomers with strong infrastructures need a lot of technical support and preparation to be successful.
• Updates to operating systems and other software can be problematic; these should be completed outside the testing window if at all possible.
• Long testing windows
Readiness
• Are schools ready?
• Are there enough computers and lab time to complete testing?
• How much staff training, and how many student tutorials, are needed?
Training
• Decades of experience have led to a deep understanding of what to do in paper-based testing. Folks just seem to naturally do things without many problems. When problems arise, there are well-accepted solutions for fixing them.
• When things don’t go well online, it can be very frustrating and challenging for folks administering the test. Expectations may not be well aligned with the state of the art.
Training
• The focus of training efforts for online testing is typically on familiarizing students with the online experience. Clearly this is a good focus.
• Need to remember the training needs of everyone who touches the assessment system.
Support
• Importance of strong support leading up to the launch of the test admin window
• Rapid support for real-time issues– The right support at the right time
• Local networks of support
Interoperability and Online Assessments
• The Common Core assessment consortia have identified interoperability as an essential component in creating next-generation assessments
• Interoperability permits items to be created, delivered and scored in a variety of systems suited to the varied assessment venues
• States should care about interoperability standards when they engage with contractors around online testing
Current Standards

• Question & Test Interoperability (QTI) – part of IMS
  – Most complete of all standards today
  – Developed primarily to support the licensure and certification markets
  – Focuses primarily on online testing
• Schools Interoperability Framework (SIF)
  – Focuses on reporting of assessment results
  – Contains enough test and item data to facilitate reporting
• Post-secondary Electronic Standards Council (PESC)
  – Contains assessment data as part of a transcript record (high school and college transcripts)
• Shareable Content Object Reference Model (SCORM)
  – Provides methods to ask for assessment objects to be presented and to receive results from an assessment object
Limitations of Current Standards

• None of the current standards sufficiently…
  – Address all required item types (open-ended, multi-step, innovative, etc.)
  – Address both paper and online delivery modes (or “device” delivery, such as iPhone)
  – Separate content and metadata from presentation information (which would allow one representation of item content to be presented in multiple formats, i.e., paper and online)
  – Address testing accommodations
  – Provide for the complete content development life cycle (i.e., planning/blueprints, development/review, form building, etc.)
  – Provide for complete assessment life cycle capabilities (i.e., from development through scoring)
Addressing Interoperability Standards for Next-Generation Assessments

• Develop a schema that allows even complex assessment items to be consumed by many vendors’ systems, regardless of platform
• Let the item XML describe content and interactivity that, with appropriate system support, could be rendered on various devices regardless of programming language or platform
• Separate item content from presentation (a toy illustration follows)
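A toy illustration of the last point: when content is stored apart from presentation, one content structure can drive multiple renderers. The item dictionary and renderers here are hypothetical.

# The same hypothetical item content, rendered for two delivery targets.
item = {
    "stem": "Which point lies on the x-axis?",
    "choices": ["(0, 3)", "(4, 0)", "(2, 2)"],
}

def render_html(item):
    # Online presentation of the content.
    options = "".join(f"<li>{c}</li>" for c in item["choices"])
    return f"<p>{item['stem']}</p><ol>{options}</ol>"

def render_plain(item):
    # Paper (or plain-text) presentation of the same content.
    lines = [item["stem"]] + [f"{chr(65 + i)}. {c}"
                              for i, c in enumerate(item["choices"])]
    return "\n".join(lines)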
Example: Graphing Polygons
<coordinateGraph vLines="20" hLines="11">
  <xAxis startPoint="0" increment="1"/>
  <yAxis startPoint="0" increment="1"/>
  <graphType>polygonGraphing</graphType>
  <pointRules maxPointPerShape="4" maxShapes="1"/>
  <correctAnswer scoringLogic="exactPoints">
    <pointPos x="2" y="-2" label="P"/>
    <pointPos x="6" y="-4" label="Q"/>
    <pointPos x="6" y="-7" label="R"/>
    <pointPos x="2" y="-10" label="S"/>
  </correctAnswer>
</coordinateGraph>
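One plausible reading of the exactPoints scoring logic above: the response is correct when the student's plotted vertices match the keyed vertices exactly, in any order. This scorer is an assumption drawn from the schema, not documented behavior.

# Keyed vertices P, Q, R, S from the example above.
KEY = {(2, -2), (6, -4), (6, -7), (2, -10)}

def score_exact_points(response_points):
    """response_points: iterable of (x, y) tuples plotted by the student.
    Returns 1 for an exact match of the keyed vertex set, else 0."""
    return 1 if set(response_points) == KEY else 0

score_exact_points([(6, -4), (2, -2), (2, -10), (6, -7)])   # -> 1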
Example: Voice Recording
<audioRecording maxTime="60" maxAttempts="5">
  <controls record="true" stop="true" play="true" delete="true"
            progressBar="false" audioLevels="false"/>
</audioRecording>
Example: Virtual Labs and Simulation
<state id="initial">
  <!-- Two possible outcomes from the initial state of the separation task -->
  <outcome studentSelection="hotPlate">
    <animation>heat_state1</animation>
    <displayText>Mixture heated. No effect.</displayText>
    <endState>initial</endState>
  </outcome>
  <outcome studentSelection="magnet">
    <animation>magnet_state1</animation>
    <displayText>Material sticks to magnet. Substance 1 successfully separated.</displayText>
    <success>Substance 1</success>
    <endState>salt_sand</endState>
  </outcome>
</state>
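The state/outcome markup is essentially a finite-state machine: each (state, student action) pair yields feedback and a next state. A direct Python transcription of the two outcomes shown, for illustration only.

# (state, student selection) -> (display text, next state, substance separated)
TRANSITIONS = {
    ("initial", "hotPlate"): ("Mixture heated. No effect.", "initial", None),
    ("initial", "magnet"): ("Material sticks to magnet. "
                            "Substance 1 successfully separated.",
                            "salt_sand", "Substance 1"),
}

def step(state, selection):
    return TRANSITIONS[(state, selection)]

feedback, state, separated = step("initial", "magnet")   # state -> "salt_sand"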
Recommended Next Steps
• Engage existing standards boards
• Leverage the cooperation and support of oversight organizations
  – CCSSO
  – Association of Test Publishers (ATP)
• Allow stakeholders to determine the best interoperability solutions