TRANSCRIPT
Pearson Copyright 2010
What States Need to Consider in Transitioning to Computer-Based Assessments from the Viewpoint of a Contractor
Denny Way and Rob Kirkpatrick
Pearson

Presented at the MARCES Conference
College Park, MD
October 18, 2010
Overview of the Presentation
• Introductory comments
  – Contractor’s viewpoint
  – Background of current testing reforms
• General considerations related to online testing
• Some comments about interoperability in the context of online testing
What Can an Online Testing Contractor Provide?
• Partnership
• Innovation
• Flexibility
• Support
• Responsibility
Three Questions Driving Education Reform Agenda
1. How do we prepare our students locally to compete and succeed globally?
2. How do we effectively deploy technology and digital innovations in public education—K-12 and higher education—to scale up reforms more quickly and more cost-effectively?
3. How do we deploy technology to make assessments faster, better and cheaper?
Common Themes Across Common Core Assessment Consortia
• Leveraging advances in technology for greater efficiency, flexibility, and potential cost savings
  – Online testing
  – Innovative items
  – Automated scoring
• Movement away from a single summative score and toward the concept of through-course assessment
• Use of assessments for multiple purposes
• Need for extensive validity research
• Interoperability/open standards and open source for flexibility and less reliance on a single vendor
What Will States Do in the Meantime?
• State testing programs are continuing and many are implementing online testing
• Benefits are increasingly apparent and achievable
  – Faster turnaround of scores
  – More efficient models for test delivery
  – Opportunity for new and innovative item types
  – Financial benefits
  – Opportunities for more accessible assessments
A Flowchart of Online Testing Program Considerations
Transitional Strategies
• Full transition to 100% online testing has many benefits
• However, full transition is difficult to achieve
  – Infrastructure limitations
  – Resistance to change
• Some strategies to optimize transitions
  – Move a subject at a time
  – Paper form as accommodation or “opt out” exception
Measurement Issues
• Item Types and Tasks
• Models for Delivery
• Comparability
• Scoring models (e.g., automated vs. human scoring)
Item Types and Tasks
• Traditional formats
• New formats
  – Measure same skills in new ways
  – Measure new skills
• Syncing content & construct validity, measurement models, delivery & scoring systems
• Selecting the right item type to provide the maximum return on investment
• Need for industry-wide taxonomy
Models for Test Delivery
• Computerized Adaptive Testing (CAT) works best with discrete, objectively scored questions
• Algorithms must control content and item exposure, keep apart items that cannot appear together (“enemies”), and insert field-test items in a systematic fashion (a selection-step sketch follows this list)
• When items are associated with passages or stimuli, CAT becomes complicated
• Other computer-based testing models besides CAT exist and deserve consideration
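To make these controls concrete, here is a minimal sketch of a single CAT item-selection step. The 1PL information function is standard; the pool structure, field-test positions, and exposure cap are illustrative assumptions, not any vendor's production algorithm.

import math

# Hypothetical sketch: one CAT item-selection step covering the controls
# named above (enemy exclusion, exposure control, field-test insertion).
def item_information(b, theta):
    """Fisher information for a 1PL (Rasch) item with difficulty b."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_next_item(theta, pool, administered, position,
                     field_test_positions=(5, 12), exposure_cap=0.25):
    # Insert unscored field-test items at predetermined positions.
    want_field_test = position in field_test_positions
    candidates = [i for i in pool
                  if i["field_test"] == want_field_test
                  and i["id"] not in administered]

    # Enemy exclusion: drop items that conflict with anything already given.
    enemies = set()
    for i in pool:
        if i["id"] in administered:
            enemies.update(i.get("enemies", []))
    candidates = [i for i in candidates if i["id"] not in enemies]

    # Crude exposure control: skip items already given too often.
    candidates = [i for i in candidates if i["exposure_rate"] <= exposure_cap]

    # Pick the most informative remaining item at the current ability estimate.
    return max(candidates, key=lambda i: item_information(i["b"], theta))

In production, exposure control is usually probabilistic (e.g., Sympson-Hetter) rather than a hard cap; the cap keeps the sketch short.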
Detailed Content Control in CAT is Paramount for Standards-Based Testing
Constraint               Weight   Number of Items
                                  Lower   Upper
Numeric Reasoning        6.0      18      22
Algebraic Reasoning      6.0       7       9
Geometric Reasoning      6.0       5       7
Quantitative Reasoning   6.0       5       7

Constraint   Weight   Number of Items
                      Lower   Upper
NR.1         3.0      11      13
NR.1.1       1.0       2       4
NR.1.2       1.0       2       4
NR.1.3       1.0       1       3
NR.1.4       1.0       1       3
NR.1.5       1.0       1       3
NR.2         3.0       7       9
NR.2.1       1.0       1       3
NR.2.2       1.0       1       3
NR.2.3       1.0       1       3
NR.2.4       1.0       1       3
NR.2.5       1.0       1       3
NR.2.6       1.0       1       3
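Bounds like these translate directly into bookkeeping inside the delivery engine. A sketch using a few of the tabled values: the constraint dictionary mirrors the slide, while the two helper functions are illustrative.

# Blueprint bounds from the table above: (lower, upper) number of items.
BLUEPRINT = {
    "Numeric Reasoning": (18, 22),
    "Algebraic Reasoning": (7, 9),
    "NR.1": (11, 13),
    "NR.1.1": (2, 4),
    "NR.2": (7, 9),
}

def violates_upper(counts, item_codes):
    """True if giving an item with these content codes would push any
    constraint past its upper bound."""
    return any(counts.get(c, 0) + 1 > BLUEPRINT[c][1]
               for c in item_codes if c in BLUEPRINT)

def unmet_lowers(counts):
    """Constraints whose lower bounds are not yet met; selection should
    prioritize items that cover these codes."""
    return [c for c, (lo, _) in BLUEPRINT.items() if counts.get(c, 0) < lo]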
Online versus Paper Comparability
• Comparability issues are likely to become more complicated as CBT becomes more widespread:
  – The general focus has been on comparability of a new online version to an existing paper version; many new programs are designed from the opposite frame of reference
  – The unit of comparability is unclear: construct, item, form, performance scoring, scale, norms, platform, …
  – Advice given to states in technical advisory committees is varied
Online versus Paper Comparability
• Can mode comparability change over time for the same test?
• Need to develop reasonable, generally accepted methods to address comparability issues without expensive studies (a simple screening check is sketched below)
• Difficult questions:
  – Can we use performance scoring decisions derived from paper administration on computer-administered tests?
  – Does the online version need to look exactly like the paper version (or vice versa)?
  – Others
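One inexpensive first screen, far short of a designed comparability study, is to compare classical item difficulty across modes. The two-proportion z test below is standard statistics; the numbers are made up.

import math

def mode_effect_z(correct_paper, n_paper, correct_online, n_online):
    """Two-proportion z statistic for an item's proportion-correct in
    paper vs. online samples; a large |z| flags the item for review."""
    p1 = correct_paper / n_paper
    p2 = correct_online / n_online
    pooled = (correct_paper + correct_online) / (n_paper + n_online)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_paper + 1 / n_online))
    return (p1 - p2) / se

# Made-up example: 430/600 correct on paper vs. 380/600 online.
z = mode_effect_z(430, 600, 380, 600)   # |z| > ~2 warrants a closer look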
Automated Scoring Models
• Validity of various automated scoring models
• Design issues: field testing and calibration
• Design issues: item design, content areas
• Public confidence in results
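At its core, the field-testing and calibration design amounts to fitting a scoring model on human-scored responses and checking machine-human agreement on held-out responses. A minimal sketch with synthetic placeholder data; operational engines use far richer features and models.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder "features" of 200 field-test responses (e.g., length,
# vocabulary, syntax measures) and simulated human scores on a 0-4 scale.
X = rng.normal(size=(200, 3))
human = np.clip(np.round(X @ np.array([1.0, 0.5, -0.25])
                         + rng.normal(0, 0.5, size=200) + 2), 0, 4)

# Calibrate a linear scorer on 150 responses; hold out 50 for validation.
X1 = np.column_stack([X, np.ones(len(X))])        # add intercept term
train, test = slice(0, 150), slice(150, 200)
w, *_ = np.linalg.lstsq(X1[train], human[train], rcond=None)

# Exact machine-human agreement on the holdout, to be compared against
# human-human agreement from double-scored responses.
machine = np.clip(np.round(X1[test] @ w), 0, 4)
exact_agreement = float(np.mean(machine == human[test]))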
Operational Issues
• Infrastructure
• Readiness
• Training
• Support
The Texas STaR Chart: Infrastructure
[Chart: Texas STaR infrastructure ratings by year, 2005-2010; y-axis 0-100%]

• TARGET TECH - On-demand access for every student; direct connectivity available in all rooms and web-based resources in multiple rooms. All rooms are connected to WAN. They are fully equipped with appropriate technology.
• ADVANCED TECH - 4 or fewer students per computer. Direct connectivity to the Internet in 75% of classrooms and the library. Web-based learning is available. All rooms are on LAN/WAN. One educator per computer; shared use of other resources.
• DEVELOPING TECH - 5-9 students per computer. Direct connectivity to the Internet in 50% of classrooms and the library. Most rooms connected to WAN/LAN. One educator per computer; shared use of other resources.
• EARLY TECH - 10 or more students per computer. Dial-up connectivity. No web-based learning. Shared use of technology resources.
Infrastructure
• Our experience is that even newcomers with strong infrastructures need a lot of technical support and preparation to be successful.
• Updates to operating systems and other software can be problematic; these should be completed outside the testing window if at all possible.
• Long testing windows
Readiness
• Are schools ready?
• Are there enough computers and lab time to complete testing?
• How much staff training, and how many student tutorials, are needed?
Training
• Decades of experience have led to a deep understanding of what to do in paper-based testing. Folks just seem to naturally do things without many problems. When problems arise, there are well-accepted solutions for fixing them.
• When things don’t go well online, it can be very frustrating and challenging for folks administering the test. Expectations may not be well aligned with the state of the art.
Training
• The focus of training efforts for online testing is typically on familiarizing students with the online experience. Clearly this is a good focus.
• Need to remember the training needs of everyone who touches the assessment system.
Support
• Importance of strong support leading up to the launch of the test admin window
• Rapid support for real-time issues– The right support at the right time
• Local networks of support
Interoperability and Online Assessments
• The Common Core assessment consortia have identified interoperability as an essential component in creating next-generation assessments
• Interoperability permits items to be created, delivered and scored in a variety of systems suited to the varied assessment venues
• States should care about interoperability standards when they engage with contractors around online testing
Current Standards

• Question & Test Interoperability (QTI) – part of IMS
  – Most complete of all standards today
  – Developed primarily to support the licensure and certification markets
  – Focuses primarily on online testing
• Schools Interoperability Framework (SIF)
  – Focuses on reporting of assessment results
  – Contains enough test and item data to facilitate reporting
• Post-secondary Electronic Standards Council (PESC)
  – Contains assessment data as part of a transcript record (high school and college transcripts)
• Shareable Content Object Reference Model (SCORM)
  – Provides methods to ask for assessment objects to be presented and to receive results from an assessment object
Limitations of Current Standards

• None of the current standards sufficiently…
  – Address all required item types (open-ended, multi-step, innovative, etc.)
  – Address both paper and online delivery modes (or “device” delivery, such as iPhone)
  – Separate content and metadata from presentation information (which would allow one representation of item content to be presented in multiple formats, i.e., paper and online)
  – Address testing accommodations
  – Provide for the complete content development life cycle (i.e., planning/blueprints, development/review, form building, etc.)
  – Provide for complete assessment life cycle capabilities (i.e., from development through scoring)
Addressing Interoperability Standards for Next-Generation Assessments

• Develop a schema that allows even complex assessment items to be consumed by many vendors’ systems, regardless of platform
• Let the item XML describe content and interactivity that, with appropriate system support, could be rendered on various devices regardless of programming language or platform
• Separate item content from presentation (a toy illustration follows)
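A toy illustration of the last point: when content is stored apart from presentation, one content structure can drive multiple renderers. The item dictionary and renderers here are hypothetical.

# The same hypothetical item content, rendered for two delivery targets.
item = {
    "stem": "Which point lies on the x-axis?",
    "choices": ["(0, 3)", "(4, 0)", "(2, 2)"],
}

def render_html(item):
    # Online presentation of the content.
    options = "".join(f"<li>{c}</li>" for c in item["choices"])
    return f"<p>{item['stem']}</p><ol>{options}</ol>"

def render_plain(item):
    # Paper (or plain-text) presentation of the same content.
    lines = [item["stem"]] + [f"{chr(65 + i)}. {c}"
                              for i, c in enumerate(item["choices"])]
    return "\n".join(lines)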
Example: Graphing Polygons
<coordinateGraph vLines="20" hLines="11">
  <xAxis startPoint="0" increment="1"/>
  <yAxis startPoint="0" increment="1"/>
  <graphType>polygonGraphing</graphType>
  <pointRules maxPointPerShape="4" maxShapes="1"/>
  <correctAnswer scoringLogic="exactPoints">
    <pointPos x="2" y="-2" label="P"/>
    <pointPos x="6" y="-4" label="Q"/>
    <pointPos x="6" y="-7" label="R"/>
    <pointPos x="2" y="-10" label="S"/>
  </correctAnswer>
</coordinateGraph>
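One plausible reading of the exactPoints scoring logic above: the response is correct when the student's plotted vertices match the keyed vertices exactly, in any order. This scorer is an assumption drawn from the schema, not documented behavior.

# Keyed vertices P, Q, R, S from the example above.
KEY = {(2, -2), (6, -4), (6, -7), (2, -10)}

def score_exact_points(response_points):
    """response_points: iterable of (x, y) tuples plotted by the student.
    Returns 1 for an exact match of the keyed vertex set, else 0."""
    return 1 if set(response_points) == KEY else 0

score_exact_points([(6, -4), (2, -2), (2, -10), (6, -7)])   # -> 1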
Example: Voice Recording
<audioRecording maxTime="60" maxAttempts="5">
  <controls record="true" stop="true" play="true" delete="true"
            progressBar="false" audioLevels="false"/>
</audioRecording>
Example: Virtual Labs and Simulation
<state id="initial">
  <!-- Two possible outcomes from the initial state of the separation task -->
  <outcome studentSelection="hotPlate">
    <animation>heat_state1</animation>
    <displayText>Mixture heated. No effect.</displayText>
    <endState>initial</endState>
  </outcome>
  <outcome studentSelection="magnet">
    <animation>magnet_state1</animation>
    <displayText>Material sticks to magnet. Substance 1 successfully separated.</displayText>
    <success>Substance 1</success>
    <endState>salt_sand</endState>
  </outcome>
</state>
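The state/outcome markup is essentially a finite-state machine: each (state, student action) pair yields feedback and a next state. A direct Python transcription of the two outcomes shown, for illustration only.

# (state, student selection) -> (display text, next state, substance separated)
TRANSITIONS = {
    ("initial", "hotPlate"): ("Mixture heated. No effect.", "initial", None),
    ("initial", "magnet"): ("Material sticks to magnet. "
                            "Substance 1 successfully separated.",
                            "salt_sand", "Substance 1"),
}

def step(state, selection):
    return TRANSITIONS[(state, selection)]

feedback, state, separated = step("initial", "magnet")   # state -> "salt_sand"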
Recommended Next Steps
• Engage existing standards boards
• Leverage the cooperation and support of oversight organizations
  – CCSSO
  – Association of Test Publishers (ATP)
• Allow stakeholders to determine the best interoperability solutions