TRANSCRIPT
Technologies in Large-Scale Assessments: New Directions, Challenges, and Opportunities
Michal Beller
Director-General RAMA – The National Authority for Measurement and Evaluation in Education
Israel
Introduction
Assessment serves a critical role in education. It holds education systems accountable
and, at the same time, serves as a gateway to systemic change in education. Assessment signals
priorities for curriculum and instruction, and in some jurisdictions, teachers tend to model their
pedagogical approach on standardized large-scale assessments. Curriculum developers and
designers of educational materials respond to high-stakes tests by modifying existing textbooks
and other instructional materials and by developing and marketing new ones suited to test
expectations. Ultimately, assessment drives change at all levels, from classroom to legislature,
though its effects range from the positive to the negative (e.g., Koretz and Hamilton, 2006;
Nichols and Berliner, 2007).
Therefore, the contents and the mode of assessment should be designed carefully and
thoughtfully to ensure the impact on education is positive. For this to happen, assessments should
be more aligned with the intended curriculum, embedded in authentic contexts, and fully
integrated into the learning and teaching process. Effective integration of assessment in learning
and teaching is a challenge, particularly due to the need to balance between formative and
summative assessments. Technology can assist in achieving such a challenging goal (e.g.,
Bennett, 2001, 2002).
It is clear that existing innovative technologies, such as smartphones and tablets, as well
as future technologies, hold the potential to dramatically change the way assessment will be
implemented in the future. However, this chapter deals neither with the pure technological
opportunities ahead of us, nor with the technological barriers existing today (e.g., access to
computers in schools and bidirectional adaptation of the test platform to various languages such
as Arabic and Hebrew).
This paper describes the recent developments of technology-based national large-scale
assessments (NLSAs) and international large-scale assessments (ILSAs) and addresses whether
the digital revolution of ILSAs will be merely a technological step forward or serve as a catalyst
for a more profound pedagogical change in the way instruction, learning and assessments will be
conducted in the future. We look at whether it will foster the integration of 21st century
competencies and expertise into teaching and instruction of all content areas, and whether it will
facilitate the creation of new methodologies for a better use of technology and assessment in the
service of learning.
The Role of Technology in Assessment
Information and communication technologies (ICT) assist in automating various phases
of testing processes such as designing, presenting, recording, and distributing test materials.
However, one main goal of integrating ICT is to enable assessment of those aspects of cognition
and performance that are complex and dynamic and have been impossible to assess directly via
paper and pencil (P&P). New capabilities afforded by technology include directly assessing
problem-solving skills, exposing the sequences of actions taken by learners in solving problems,
and modeling complex reasoning tasks. Technology also makes possible the collection of data
regarding students' concept organization and other aspects of students' knowledge structures, as
well as representations of their participation in socially interactive discussions and projects
(Chudowsky and Pellegrino, 2003).
By enriching assessment situations through the use of multimedia, interactivity and
control over the stimulus display, it is possible to assess a much wider array of constructs than
was previously possible. For example, SimScientists uses science simulations to support
powerful formative assessments of complex science learning (Quellmalz et al., 2009). The
simulation-based assessment tasks are dynamic, engaging and interactive. They are capable of
assessing complex science knowledge and inquiry skills, which go well beyond the capabilities
of printed tests.
Advances in measurement, technology and in the understanding of cognition hold
promise for creating assessments that are more useful and valid indicators of what students know
and can do. Such work involves reconceptualizing assessment design and use and tying
assessment more directly to the processes and contexts of learning and instruction (Chudowsky
and Pellegrino, 2003; Quellmalz and Pellegrino, 2009; Bejar and Graf, 2010). Success in tying
assessment more closely to contexts and specific learning processes is what will make these new
developments more effective.
Universal Design of Assessment
There is a push in several countries (the United States in particular) to expand national
and state testing and to require that assessment systems encompass all students – including those
with disabilities, learning disorders and those with limited proficiency in the language of
instruction – many of whom have not been included in these systems in the past.
Technology allows for the development of assessments using universal design principles
that make assessments more accessible, effective, and valid for students with greater diversity in
terms of disabilities and limited language proficiency (mainly non-native), thus allowing for
wider student inclusion (e.g., Johnstone et al., 2009; Thompson et al., 2002). Technology allows
for presentation and assessment using different representations of the same concept or skill and
can accommodate various student disabilities and strengths. Therefore, presenting information
through multiple modalities enlarges the proportion of the population that can be assessed fairly.
Similarly, assistive technology can make it possible for students with disabilities and those who
require special interfaces to interact with digital resources, allowing them to demonstrate what
they know and can do in ways that would be impossible with standard print-based assessments.
Almond et al. (2010) proposed stimulating research into technology-enabled assessments
that incorporate conditions designed to make tests appropriate for the full range of the student
population by enhancing accessibility. The prospect is that rather than having to retrofit existing
assessments to include these students (through the use of large numbers of accommodations or a
variety of alternative assessments), new assessments can be designed and developed from the
outset to allow participation of the widest possible range of students, in a way that results in valid
inferences about performance for all students who participate in the assessment.
The Role of Technology in National Large-Scale Assessments
Efforts are being made for system-wide implementation of technology-based assessments
in order to extend, improve or replace existing assessment systems, or to create entirely new
assessment systems. Kozma (2009) provided a comprehensive list of the potential advantages
and challenges of incorporating ICT into large-scale assessments (See exact quote in Appendix
1).
Delivery of national assessments via computer is becoming more prevalent as changes
are made in assessment methodologies that reflect practical changes in pedagogical methods.
Introducing computer-based testing (CBT) is often viewed as a necessary and positive change in
large-scale educational assessment – not only because it reflects changes in classroom
instruction, which is becoming more and more computer-mediated, but also because it can
provide a number of assessment and administrative advantages (Bennett, 2010).
A systemic integration of digital components in NLSA began in the United States with
the National Assessment of Educational Progress (NAEP) (e.g., Sandene et al., 2005; Bennett et
al., 2007), and is being implemented today in a number of NLSA systems around the globe as
part of a process of integrating technology into learning and instruction (e.g., Iceland, Denmark,
Hungary, Israel and Australia). However, results from the NAEP Problem Solving in
Technology-Rich Environments study (Bennett et al., 2007, 2010) made it clear that going beyond
traditional testing is still extremely challenging. Recent progress on SimScientists is encouraging
– findings from an evaluation study based on six states (Herman, Dai, Htut, Martinez & Rivera,
2011) provide recommendations for incorporating these simulation-based assessments into state
science assessment systems in addition to their formative classroom use (Quellmalz, Silberglitt &
Timms, 2011).
An elaboration on the main relevant developments in Israel and the United States follows
below.
Israel
Instruction, Learning and Assessment via Computers
The Israeli education system went through several phases of integrating technologies into
instruction, learning and assessment (not yet with complete school coverage):
I. 1975-1990: The Infrastructure Phase I – Setting up computer labs in schools,
mainly for drill and practice.
II. 1990-1999: The Infrastructure Phase II – Personal computer labs featuring
multimedia programs.
III. 1999-2005: The Internet Phase – Proliferation of school websites. Sporadic web-
based activities aimed at information literacy and project-based learning.
IV. 2006–present: A shift to web-based content and tools (e.g., CET, Time-to-Know,
Galim) aligned with the national curricula (links below).
CET develops both digital content and advanced technology for instruction, learning and
assessment (ILA) by various means. The ILA system provides (in both Hebrew and Arabic):
• Optimal integration of the latest technologies available
• Integrated learning units with assessment tasks
• Classroom–home continuum
• Use of a learning management system for managing learning and assessment
• Customization for students with learning disabilities
The ILA system takes full advantage of computer capabilities for instruction, learning
and assessment, such as simulations and animations; virtual labs; video clips; maps; narration;
information literacy tasks on the web, and interactive feedback for the student. For example, the
online elementary school system – Ofek (horizon, in Hebrew), a bilingual (Hebrew and Arabic)
teaching, learning and assessment environment, contains a learning management system and a
large collection of curriculum-based learning activities and assessment assignments in most
subjects (Hebrew, math, science, social studies, Bible, current events, etc.). Ofek also offers
innovative generators, such as an online comic strip generator, math exercises generator, writing
generator and more, all allowing teachers to create their own teaching material in line with the
specific needs of their students.
For more on CET, see: http://cet.org.il/pages/Home.aspx and http://ofek.cet.ac.il
For more on Time-To-Know, see: http://www.timetoknow.com/
For more on Galim, see: http://www.galim.org.il/fields/english.html
The United States
The U.S. education system went through phases similar to the Israeli ones with regard to
integrating technologies. However, the U.S. system was one of the first in the world, if not the
first, to conduct a large scale online standardized assessment via its National Assessment of
Educational Progress (NAEP).
The National Assessment of Educational Progress (NAEP)
NAEP conducted three field investigations as part of the Technology-Based Assessment
Project, which explored the use of new technology in administering NAEP.
The first field investigation was the Math Online (MOL) study. It addressed issues related
to measurement, equity, efficiency, and operations in online mathematics assessment. In the
MOL study, data were collected from more than 100 schools at each of two grade levels in
spring 2001 (Sandene et al., 2005).
The 2002 Writing Online (WOL) study was the second of the three field investigations in the
Technology-Based Assessment Project and examined the delivery of NAEP writing assessments
on computer.
The 2003 Problem Solving in Technology-Rich Environments study (TRE) was the third
study and it investigated how computers might be used to measure skills that cannot be measured
on a paper and pencil test (Bennett et al., 2007, 2010).
In continuation of the previous studies, NAEP recently completed a multistate field test
of online writing assessment for grades 8 and 12, to become operational in 2011. The design of
the 2011 NAEP writing assessment reflects the way today’s students compose written texts − and
are expected to compose texts − particularly as they move into postsecondary settings. The
assessment is designed to measure the ability of students in grades 8 and 12 to write using word
processing software with commonly available tools.
For more, see: http://nces.ed.gov/nationsreportcard/writing/cba.asp
U.S. State Assessments
West Virginia and Oregon are among the leading states in moving forward on the
integration of technology into their state assessments.
1) West Virginia
West Virginia’s techSteps program is a literacy framework based on the National
Education Technology Standards for Students (NETS*S) and is aligned with state and Common
Core State Standards.
TechSteps is a personalized, project-based technology literacy curriculum that infuses
technology skills into core instruction, promoting core subject area outcomes while also teaching
skills for the 21st century. A Technology Literacy Assessment Profile is built for the student and
updated as each new activity is performed and recorded. This approach allows educators to teach
and assess technology literacy in an integrated and systematic manner, using authentic
performance-based assessment to provide meaningful feedback to students and generate data for
reporting purposes. This system provides West Virginia with statewide student data
on technology proficiencies at each grade level.
For more, see: http://www.techsteps.com/public/home/
2) Oregon
The Oregon Assessment of Knowledge and Skills (OAKS) is the name of the larger
Oregon statewide assessment system. This new online testing system assesses students' mastery
of Oregon content standards. The OAKS Online operational test is now available for reading,
mathematics, science, and social sciences.
For more, see: http://www.oaks.k12.or.us/
Recent Developments in the United States
Cognitively Based Assessment of, for, and as Learning (CBAL) – A Research Project
The improvements in large-scale assessments, important as they may be, will not resolve
the growing tension between internal and external school assessment. An innovative research
project proposed by Educational Testing Service (ETS) is CBAL (Bennett and Gitomer, 2009;
Bennett, 2010), which attempts to resolve this tension.
Technology plays a key role in the CBAL project. The central goal of CBAL is to create
a comprehensive system of assessment that documents what students have achieved ("of
learning"), helps identify how to plan and adjust instruction ("for learning"), and is considered by
students and teachers to be a worthwhile educational experience in and of itself ("as learning").
The computerized system, when completed, will attempt to unify and create synergy among
accountability testing, formative assessment and professional support.
Accountability tests, formative assessment and professional support will be derived from
the same conceptual base. This base will rest upon rigorous cognitive research, common core or
state standards, and curricular considerations. CBAL assessments will consist largely of
engaging, extended, constructed-response tasks that are delivered primarily by computer and, as
much as possible, automatically scored. CBAL assessments are designed to help students take an
active role in the assessment of their own learning.
It should be noted, however, that for such an integrated accountability system to be
relevant and effective, the participating educational bodies should be in full agreement as to
curricula and standards.
For more, see: http://www.ets.org/research/topics/cbal/initiative
U.S. Department of Education (DOE)
I'm calling on our nation's governors and state education chiefs to develop
standards and assessments that don't simply measure whether students can fill in
a bubble on a test, but whether they possess 21st century skills like problem
solving and critical thinking and entrepreneurship and creativity.
– President Barack Obama, address to the Hispanic Chamber of Commerce,
March 10, 2009
In an effort to provide ongoing feedback to teachers during the course of the school year,
measure annual student growth, and move beyond narrowly focused bubble tests, the U.S.
Department of Education has awarded grants (in September 2010) to two groups of states to
develop a new generation of tests. The consortia – the Partnership for Assessment of Readiness
for Colleges and Careers (PARCC), and the Smarter Balanced Assessment Consortium (SBAC)
– were awarded $170 million and $160 million, respectively, to design assessments that evaluate
students based on common-core standards by the 2014-15 school year. The tests will assess
students' knowledge of mathematics and English language arts from third grade through high
school.
As I travel around the country the number one complaint I hear from teachers is
that state bubble tests pressure teachers to teach to a test that doesn't measure
what really matters. Both of these winning applicants are planning to develop
assessments that will move us far beyond this and measure real student
knowledge and skills.
– U.S. Education Secretary Arne Duncan
(http://www.ed.gov/news/press-releases/us-secretary-education-duncan-announces-
winners-competition-improve-student-asse)
The Smarter coalition will test students using computer adaptive technology that will
tailor questions to students based on their answers to previous questions. Smarter will continue to
use one test at the end of the year for accountability purposes but will create a series of interim
tests used to inform students, parents, and teachers about whether students are on track.
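The adaptive mechanism described above – selecting each question based on the answers given so far – can be illustrated with a minimal sketch of maximum-information item selection under a Rasch (1PL) item response model. This is a common textbook approach to computerized adaptive testing, not the Smarter Balanced consortium's actual algorithm; the function names, the item bank, and the crude fixed-step ability update are all illustrative assumptions.

```python
import math

def rasch_prob(theta, b):
    # Probability of a correct answer under the Rasch (1PL) model,
    # where theta is the examinee's ability and b is the item difficulty.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    # Fisher information of a Rasch item at ability theta;
    # it peaks when the item difficulty matches the ability estimate.
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def next_item(theta, difficulties, administered):
    # Choose the not-yet-administered item with maximum information
    # at the current ability estimate.
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))

def update_theta(theta, b, correct, step=0.5):
    # Crude fixed-step ability update; operational systems use
    # maximum-likelihood or Bayesian estimation instead.
    return theta + step if correct else theta - step
```

For an examinee at an estimated ability of 0 facing a hypothetical bank of items with difficulties [-2, -1, 0, 1, 2], `next_item` selects the middle item first; a correct answer nudges the estimate upward, steering subsequent selections toward harder items.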
PARCC’s next-generation assessment system will provide students, educators,
policymakers and the public with the tools needed to identify whether students – from grade 3
through high school – are on track for postsecondary success, and, critically, where gaps may
exist and how they can be addressed well before students enter college or the workforce.
With such new technology-based assessments, schools may no longer have to interrupt
their routine instructional processes at various times during the year to administer external tests
to students, not to mention the time saved on preparing them for the tests. Such a scenario would
be made possible when schools implement assessments for both formative and summative
purposes in a manner developed now in CBAL.
The Cognitively Based Assessment of, for, and as Learning (CBAL) system has many of
the features that are envisioned in the PARCC and SBAC consortia. The CBAL summative
assessments are not one-time events, but rather are spread over several occasions throughout the
school year. This is a feature that has attracted a good deal of attention in discussions of the
consortia plans. CBAL also emphasizes the use of formative assessments by teachers and in
teacher professional development. This emphasis on formative assessment and teacher
professional development can also be found in the broad outlines of the consortia plans (Linn,
2010).
The Role of Technology in International Large-Scale Assessments (ILSAs)
Among the more familiar large-scale assessment systems are the International Large-Scale Assessments
(ILSAs), which are designed to enrich the knowledge and understanding of decision-makers in
education systems in different countries through international comparisons and comparative
studies in central areas of education. The major players in this arena are the International
Association for the Evaluation of Educational Achievement (IEA), with the Trends in
International Mathematics and Science Study (TIMSS), the Progress in International Reading
Literacy Study (PIRLS), the International Computer and Information Literacy Study (ICILS),
and others; and the Organization for Economic Cooperation and Development (OECD), with the
Program for International Student Assessment (PISA), the Program for the International
Assessment of Adult Competencies (PIAAC), and the Teaching and Learning International
Survey (TALIS). These international assessments usually involve nationally representative
samples composed of thousands of students.
The growing popularity of ILSAs has been accompanied by several phenomena. In many
countries today ILSAs constitute de facto standards for learning goals in the subjects assessed,
and as a result the curricula in different countries have been aligned with the theoretical
framework of these international assessments. In a number of countries the results and rankings
of ILSAs have become politically high stakes: ministers of education and governments perceive
them as being an indicator of the success of their local policy (e.g., Poland, Germany, and Israel).
As such, international assessments intensify the negative consequences that often accompany
high-stakes tests in different countries ("teaching to the test," diverting resources, etc.).
In international assessment research programs, as in national and local assessment
programs, three different themes are evident in the application of ICT. One is the use of ICT to
better assess the domains that have traditionally been the focus of assessment in schools: reading,
mathematics and science. Note that the use of ICT to create richer and more interactive
assessment materials increases validity and enables the assessment of aspects within the domains
that, up to now, have been difficult to assess. A second theme is the use of technology to assess
more generic competencies, such as ICT skills and a broad set of generalizable and transferable
knowledge, skills and understandings that are relevant to managing and communicating
information. A third theme is the use of technology to assess more complex constructs, which are
less well understood and characterize much of the thinking about 21st century skills. Such
constructs include creativity and collaborative problem solving (as typified by the Assessment
and Teaching of 21st Century Skills (ATC21S) Project).
In 2006, for the first time and on an experimental basis, PISA included an optional
computer-based component in the assessment of student achievements in science (CBAS). In
PISA 2009, in addition to the P&P assessments, countries were offered the opportunity to
participate in a computerized assessment of electronic reading of texts (ERA). In 2011, 15-year-
old students around the world took part in a field trial of three computerized assessments
(problem solving, mathematical literacy, and reading of electronic texts) in preparation for
PISA 2012. Also, adults (16 to 64 years old) around the world were assessed digitally on
PIAAC. In 2012, eighth graders around the world will participate in a pilot study in preparation
for ICILS 2013. For 2015, PISA plans are to proceed on the assumption that computer delivery
will be a significant aspect of the overall assessment. The full extent of computer delivery has
yet to be established.
A brief elaboration on all these new assessments follows below.
Organization for Economic Cooperation and Development (OECD)
Since 2006, the OECD has gradually begun introducing technology into its assessments
(PISA and PIAAC):
1. The Program for International Student Assessment − PISA
PISA is an internationally standardized assessment that was jointly developed by
participating economies and administered to 15-year-olds in schools. PISA focuses on young
people’s capacity to demonstrate their preparedness in the fundamental domains of reading
literacy, mathematical literacy, and scientific literacy. Four PISA cycles have been carried out so
far (in 2000, 2003, 2006 and 2009). Close to 70 countries and large economies participated in the
2009 cycle.
PISA is different from other international surveys, such as TIMSS, which is curriculum-
based, in that it attempts to assess the skills and competencies each learner needs for further
study and success in the future. Although the basic skills assessed in PISA (reading, math,
science) are similar to other assessment programs, their definition is different and broader and
the tasks are put into the context of everyday situations with which most people will have to deal.
Therefore, it is not surprising that the PISA governing board decided to make use of
computer-based assessments, not only to measure ICT literacy skills, but also to allow for the
provision of a wider range of dynamic and interactive tasks and to explore more efficient ways of
carrying out the main tests of student knowledge and skills in reading, mathematics and science.
The initial goal was to begin the digital implementation in 2003. However, it only
materialized in 2006. A list of past, present and future cycles of PISA is presented in Table 1,
which indicates the extended (major) domain assessed in each cycle and the digital assessment
components (added to or replacing paper and pencil).
Table 1 – PISA cycles and the digital components

Cycle      | Reading Literacy | Mathematics Literacy | Science Literacy | Other Domains
PISA 2000  | +*               | +                    | +                |
PISA 2003  | +                | +*                   | +                | Problem Solving (+)
PISA 2006  | +                | +                    | +*, CBAS         |
PISA 2009  | +*, ERA          | +                    | +                |
PISA 2012  | +, ERA           | +*, EM               | +                | Problem Solving, Financial Literacy
PISA 2015  | ERA              | EM                   | CBAS*            | Collaborative Problem Solving

+ = Paper and pencil assessment
CBAS = Computer-Based Assessment of Science
ERA = Electronic Reading Assessment
EM = Electronic Math
* = Extended (major) domain assessed in that cycle; CBAS, ERA and EM denote digital assessment components
Overall, there is a challenge in how to move forward to CBT while keeping the paper and
pencil trends of the scores across the years. In a way, the two goals of preserving trends and
moving to CBT are incompatible. Changing the nature of tests generally breaks the trends, while
keeping assessments unchanged undermines their validity, as the definitions of the skills and
competencies to be measured are continually changing and evolving. This paradoxical situation
should be resolved and, in order to do so, both new technologies and new conceptualizations of
what is being measured must be further explored. The 2006 CBAS study and, even more so, the
2009 ERA study, offer an opportunity to closely scrutinize the results of CBT in comparison to
paper and pencil.
1a. PISA 2006 - Computer-Based Assessment of Science – CBAS (optional component)
The initial goal of extending the PISA 2006 assessment of science to include a computer-
delivered element was to administer questions that would be difficult to deliver in a P&P test. In
particular, the goals were reducing the load of reading and written expression; motivating
students for the assessment task; linking dynamic contexts with data interpretation; enabling
student interaction with the media; and allowing assessment of aspects of science not available in
paper-based forms. The relevant questions included video footage, simulations, and animations.
The computer-based assessment of science was field tested in 13 PISA countries in 2005,
and the main study was conducted in only three of them in 2006: Denmark, Iceland and Korea.
Overall achievement within countries did not change from one test modality to the next, yet there
was a tendency for Denmark’s performance to decrease on the computer-based test. Korean
students outperformed Danish and Icelandic students in the computer-based test just as they did
in the P&P test.
In the computer-based test, male performance increased in Iceland and Korea while
female performance decreased. Males outperformed females on the computer-based test in all
three countries. Females outperformed males on the P&P test of science literacy in Iceland,
whereas there was a gender difference in favor of males in the P&P results for Denmark. The
association between reading literacy and achievement on the science literacy assessment was
weaker for the computer-based items than for the P&P items (Turmo and Svein, 2006; OECD, 2010).
An international workshop was held in Reykjavik, Iceland in the autumn of 2008 during
which the following matters were discussed:
• Comparison between paper-and-pencil tests and computer-based assessment
• Electronic tests and gender differences
• Adaptive vs. linear computer-based assessment.
For more, see: http://crell.jrc.ec.europa.eu/WP/workshoptransition.htm
1b. PISA 2009 and 2012 - Electronic Reading Assessment – ERA (optional component)
The PISA 2009 ERA optional component was implemented in recognition of the
increasing prevalence of digital texts in many parts of our lives: personal, social and economic.
Even though the core principles of writing texts and the core processes of reading and
understanding texts are similar across media, there are reasons to believe that the specific
features of digital texts call for specific text-processing skills.
The new demands on reading proficiency created by the digital world have led, in PISA
2009, to the inclusion of electronic reading in the reading framework, an inclusion that has, in
turn, resulted in some redefinition both of texts and of the mental processes that readers use to
comprehend the texts. ERA 2009 was designed to investigate students’ proficiency at tasks that
require the access, comprehension, evaluation and integration of digital texts across a wide range
of reading contexts and tasks among today’s 15-year-olds.
Nineteen countries participated in ERA. The ERA used a test administration system
(TAO) developed through the University of Luxembourg. TAO can deliver tests over the
Internet, across a network or (as was the case with ERA) on a standalone computer, with student
responses collected on an external memory device (USB).
ERA will be offered again as an optional component in PISA 2012 (a pilot study was
conducted in 2011 in around 30 countries).
1c. PISA 2012/2015 – Computer-Based Problem Solving Assessment (compulsory component)
The development of competency in problem solving (PS) is a central objective within the
educational programs of many countries. The acquisition of increased levels of competency in
problem solving provides a basis for future learning, for effective participation in society and for
conducting personal activities. Students need to be able to apply what they have learned to new
situations.
What distinguishes the 2012 assessment of PS from the 2003 PS assessment is not so
much the definition of the competency of problem solving, or the focus on problems that only
require low levels of discipline-based knowledge for their solution, but the mode of delivery
(computer-based) of the 2012 assessment and the inclusion of problems that cannot be solved
without the respondent interacting with the problem online. A pilot study was conducted in 2011
in around 40 countries.
In PISA 2015 a new Collaborative Problem Solving assessment will be added, which will
incorporate online assessment of the skills required to solve problems as a member of a group.
1d. PISA 2012 – Electronic Mathematical Assessment (optional component)
The definition of mathematical literacy in PISA 2012 explicitly calls for the use of
mathematical tools, including technological tools, for judgments and decision making.
Computer-based tools are in common use in workplaces of the 21st century and will be
increasingly more prevalent as the century progresses. The nature of work-related problems and
logical reasoning has expanded with these new opportunities, creating new expectations for
individuals.
Because PISA items reflect problems that arise in personal, occupational, social, and
scientific contexts, a calculator is part of many PISA items. Computers and calculators relieve
the burden of computation so that individual respondents' attention can be focused on strategies,
concepts and structures rather than on mechanical procedures. A computer-based assessment will
provide the opportunity to extend the integration of technologies − such as statistical tools,
geometric construction and visualization utilities, and virtual measuring instruments − into the
test items.
A pilot study was conducted in 2011 in around 30 countries.
1e. PISA 2015 – Recent Call for Tender
The plans for PISA 2015 are proceeding on the assumption that computer delivery will be
a significant aspect of PISA 2015. The full extent of computer delivery has yet to be established.
The PISA 2015 call for tenders requires the contractor to develop an electronic platform
that is suitable for the assessment of all PISA domains and is capable of all functions from item
development to delivery of the assessment, i.e., item development, item review, test compilation,
test delivery and administration. It should be capable of being operationalized through an
extensive range of operating systems, including delivery over the Internet, in order to maximize
country participation, and should exploit the possibilities that arise from the use of new
technologies to assess students' knowledge and skills in everyday tasks and challenges, in
keeping with PISA's definition of literacy. It should also be adaptable to allow for evolution over
the PISA cycles, e.g., to assess new domains and to cope with new test designs.
Overall, a challenge remains to be resolved: how to move to digital assessments
while maintaining the trends established by previous P&P PISA assessments.
2. Program for the International Assessment of Adult Competencies (PIAAC)
Over the past two decades, national governments and other stakeholders have shown a
growing interest in an international assessment of adult skills that allows them to monitor how
well prepared populations are for the challenges of a knowledge-based society. The OECD’s
PIAAC will be the largest and most innovative international assessment of adult skills ever
conducted (Schleicher, 2008).
The primary objectives of PIAAC are to: 1) identify and measure cognitive competencies
believed to underlie both personal and societal success; 2) assess the impact of these
competencies on social and economic outcomes at individual and aggregate levels; 3) gauge the
performance of education and training systems in generating required competencies; and 4) help
to clarify the policy levers that could contribute to enhancing competencies.
At the core of PIAAC is an assessment of the literacy skills of adult populations,
understood as the interest, attitude and ability of individuals to appropriately use
sociocultural tools, including digital technology and communication tools, to access, manage,
integrate, and evaluate information, construct new knowledge, and communicate with others. In
addition, PIAAC collects information from respondents concerning their use of key work skills
in their jobs – a first for an international study.
The skills assessed by PIAAC (literacy, numeracy, and problem solving in technology-
rich environments) represent cognitive skills that provide a foundation for effective and
successful participation in modern societies and economies. Levy (2010) argues that a
technology-rich workplace requires foundational skills, including numeracy and literacy (both to
be tested in PIAAC); advanced problem-solving skills, or Expert Thinking (similar to the
construct of Problem Solving in Technology-Rich Environments to be tested in PIAAC); and
advanced communication skills, or Complex Communication (not tested in PIAAC).
PIAAC will offer a far more complete and nuanced picture of the “human capital” on
which countries can count as they compete in today’s global economy. It will help policymakers
assess the effectiveness of education and training systems, both for recent entrants into the labor
market and for older people who may need to continue learning new skills throughout their
lifetimes.
Twenty-six countries are currently implementing PIAAC. A field test was successfully
conducted in 2010, with the main assessment scheduled for 2011-12. A second round of PIAAC
is planned to allow additional countries to participate in, and benefit from, the assessment. The
assessments will be available in paper- and computer-based formats.
The International Association for the Evaluation of Educational Achievement (IEA)
The IEA has conducted various international surveys regarding the implementation of
ICT in education around the globe. Twenty-two countries participated in the first stage of the
Computers in Education Study, which in 1989 conducted school surveys in elementary, lower
secondary, and upper secondary schools. In 1992 the second stage of the study repeated the
surveys of the first stage and added an assessment of students. The rapid diffusion of the Internet
and multimedia technology during the mid-1990s generated an interest in a new study that,
among other things, could investigate the changes in the curricula and classrooms since IEA's
earlier study.
The Second Information Technology in Education Study (SITES) was initiated in 1996
by the IEA, and school surveys were conducted in 1998. The SITES study consists of three
modules. The survey data of module 1 were collected in 1998. The module 2 case studies that
involved visits to school sites were conducted during 2000 and 2001, and the reports were
released in 2002 and 2003. Module 3 was launched in 2001, but the data for the surveys and
student assessments were collected during 2004, with the results released in 2005 and 2006.
A new IEA computerized international large-scale assessment for students – ICILS – will
be administered in 2013.
International Computer and Information Literacy Study (ICILS) – 2013
ICILS will examine the outcomes of student computer and information literacy (CIL)
across countries.
Computer and information literacy refers to an individual’s ability to use computers to
investigate, create, and communicate in order to participate effectively at home, at school, in the
workplace, and in the community. Twenty countries have registered so far to participate in this
study.
The assessment of CIL will be authentic and computer-based. It will incorporate
multiple-choice and constructed-response items based on realistic stimulus material; software
simulations of generic applications so that students are required to complete an action in
response to an instruction; and authentic tasks that require students to modify and create
information products using “live” computer software applications.
ICILS 2013 will be the first international comparative study of student preparedness to
use computers to investigate, create and communicate at home, at school and in the broader
community. ICILS will assess students’ capacity to:
• Use technology to search for, retrieve and make effective judgments about the
quality and usefulness of information from a range of sources (such as the
Internet)
• Use technology to transform and create information
• Use technology to communicate information to others
• Recognize the ethical and legal obligations, responsibilities and potential dangers
associated with digital communication.
ICILS 2013 will provide policymakers with results that will enhance understanding of
factors responsible for achievement in computer-based tasks. It will also inform policy on the
possible contribution of educational systems to the use of computers for digital communication
and information literacy as an essential skill in the 21st century.
A pilot study will be conducted in 2012.
For more, see: http://www.acer.edu.au/icils/
Mega Technology Companies Collaborate to Integrate Technology in ILSAs
Three leading technology companies − Cisco, Intel and Microsoft − began collaborating in 2009
with the University of Melbourne on a project aimed at transforming global educational assessment
and improving learning outcomes: Assessment and Teaching of 21st Century Skills (ATC21S).
The goals of the project are to mobilize international educational, political and business
communities to make the transformation of educational assessment and, hence, instructional
practice a global priority; to specify in measurable terms high-priority understanding and skills
needed by productive and creative workers and citizens of the 21st century; to identify
methodological and technological barriers to ICT-based assessment; to develop and pilot new
assessment methodologies; to examine and recommend innovative ICT-enabled, classroom-
based learning environments and formative assessments that support the development of 21st
century skills.
In the first phase of this development project, the collaboration produced five important
white papers (Griffin, McGaw & Care, 2011):
• Defining 21st century skills
• Perspectives on methodological issues
• Technological issues for computer-based assessment
• New assessments and environments for knowledge building
• Policy frameworks for new assessments
Through 2010-2012, the project is focusing on the development of assessment
methodologies and dynamic tasks. ATC21S has received the support of major international
assessment organizations, as well as participating governments’ departments of education,
through representation on the ATC21S Advisory Panel.
For more, see: http://atc21s.org/
Conclusion
Neither assessments nor technologies are goals in and of themselves. Both have merit
only if they make a significant impact on education by improving instruction and increasing
(directly or indirectly) the opportunity for each pupil to learn and progress. However, the more
technology becomes integrated into instruction, the greater the need to make assessment
digital and aligned with the new instructional environment.
Do all the above new and innovative digital assessments indicate that the educational
world is approaching a turning point regarding the incorporation of technology into large-scale
assessments? Are schools pedagogically, technologically, logistically, and socially prepared for
this development? What are the implications for educators, and for policymakers? What will
make this mega-investment worthwhile?
Thoughtful integration of technology into assessment may meet several complementary
goals:
• Supporting and enhancing the integration of technology into learning.
• Allowing for an assessment of complex cognitive skills (e.g., Diagnoser, SimScientists).
• Designing a new accountability paradigm that fully integrates sequential formative
assessments and periodic summative assessments via computers (e.g., CBAL).
The present structure and development of electronic ILSAs certainly support the first two
goals – enhancing the integration of technology into learning, and providing an appropriate
response for the measurement of complex cognitive abilities, thus increasing test validity. Also,
beyond measuring 21st century skills and broadening the measured construct, computerized
international assessments can provide a more useful profile of countries whose students are
clustered in particularly high or low levels of performance.
However, new accountability paradigm approaches, proposed by CBAL and the Race to
the Top consortia, according to which formative and summative assessment are fully aligned and
integrated via technology, may not be easily applicable to ILSAs. That is, the ability to integrate
continual formative assessments with periodic summative assessments entails complete
alignment of the curricula, the content and standards with the external and internal assessment
tasks. It is not clear whether such an alignment is possible, or even desired, with regard to the
ILSAs.
Additionally, ILSAs take place on a cycle that is well suited for international
comparisons (three, four or five years). However, as ILSAs have become high-stakes and
excelling on them has become a national goal, this cycle causes certain local irregularities in the
organization of learning and NLSAs in years that coincide with ILSA cycles. Thus, there is a
need to better align the national and international assessments in countries where both systems
exist.
In summary, several issues have yet to be resolved before moving NLSAs and ILSAs to
full blown digital and online systems: adaptive versus linear administration of the assessments;
ensuring gender equality; narrowing the digital divide; school system readiness (personnel,
hardware, connectivity); infrastructure adaptation to various languages (including right-to-left
languages) and more.
Also, there is a challenge in equating P&P and digital assessments with the goal of
maintaining long-term trends, as these two aims are somewhat incompatible. Changing the nature
of tests generally breaks trends, while keeping tests unchanged is unsatisfactory because the
definitions of the skills and competencies to be measured are continually changing. This
paradoxical situation has to be resolved, and to do so both new technologies and new
conceptualizations of what is being measured have to be explored.
Nevertheless, it is clear that technology will continue to advance and improve both
NLSAs and ILSAs in an evolutionary manner. Technology even has the potential to
revolutionize NLSAs and their alignment with learning and teaching, as proposed by the CBAL
model. However, the extent to which ILSAs, based on a common international framework that is
often not fully aligned with national curricula, can join this revolution remains a pedagogical
and strategic challenge.
References
Almond, P., Winter, P., Cameto, R., Russell, M., Sato, E., Clarke-Midura, J., Torres, C., Haertel,
G., Dolan, R., Beddow, P., and Lazarus, S. (2010). Technology-Enabled and Universally
Designed Assessment: Considering Access in Measuring the Achievement of Students
with Disabilities – A Foundation for Research. Journal of Technology, Learning, and
Assessment, 10(5). Retrieved May 2011 from: http://www.jtla.org
Bejar, I., and Graf, E. A. (2010). Updating the Duplex Design for Test-Based Accountability in
the Twenty-First Century. Measurement: Interdisciplinary Research and Perspectives,
8(2-3), 110-129.
Bennett, R. E. (2001). How the Internet will help large-scale assessment reinvent itself.
Education Policy Analysis Archives, 9 (5).
Bennett, R. E. (2002). Inexorable and inevitable: the continuing story of technology and
assessment. The Journal of Technology, Learning and Assessment. Retrieved May 2011
from: http://escholarship.bc.edu/jtla/
Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): a
preliminary theory of action for summative and formative assessment. Measurement:
Interdisciplinary Research and Perspectives, 8, 70–91.
Bennett, R.E., Persky, H., Weiss, A.R., and Jenkins, F. (2007). Problem Solving in Technology-
Rich Environments: A Report From the NAEP Technology-Based Assessment Project
(NCES 2007–466). U.S. Department of Education. Washington, DC: National Center for
Education Statistics. Retrieved May 2011 from:
http://nces.ed.gov/nationsreportcard/pdf/studies/2007466.pdf
Bennett, R.E., Persky, H., Weiss, A., and Jenkins, F. (2010). Measuring Problem Solving with
Technology: A Demonstration Study for NAEP. Journal of Technology, Learning, and
Assessment, 8(8). Retrieved May 2011 from: http://www.jtla.org
Bennett, R. E., and Gitomer, D. H. (2009). Transforming K–12 assessment. In C. Wyatt-Smith
and J. Cumming (Eds.), Educational assessment in the 21st century. New York: Springer.
Chudowsky, N., and Pellegrino, J. W. (2003). Large-scale assessments that support learning:
what will it take? Theory Into Practice, 42(1), 75-83.
Johnstone, C., Thurlow, M., Altman, J., Timmons, J., and Keto, K. (2009). Assistive Technology
Approaches for Large-Scale Assessment: Perceptions of Teachers of Students with
Visual Impairments. Exceptionality, 17(2), 66-75.
Koretz, D., and Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan
(Ed.), Educational Measurement (4th edition, pp. 531-578). Westport, CT: American
Council on Education/Praeger.
Kozma, R. (2009). Assessing and teaching 21st century skills: A call to action. In F.
Scheuermann and J. Bjornsson (Eds.), The transition to computer-based assessment: New
approaches to skills assessment and implications for large scale assessment (pp. 13-23).
Brussels: European Communities. Retrieved May 2011 from:
http://www.worldclassarena.net/doc/file17.pdf
Griffin, P., McGaw, B., & Care, E. (Eds.) (2011). Assessment and Teaching of 21st Century
Skills. Dordrecht: Springer.
Herman, J., Dai, Y., Htut, A. M., Martinez, M., & Rivera, N. (2011). Evaluation of the enhanced
assessment grants (EAGs) SimScientists program: Site visit findings (CRESST Report
791). Los Angeles, CA: University of California, National Center for Research on
Evaluation, Standards, and Student Testing (CRESST). Retrieved November 2011 from:
http://www.cse.ucla.edu/products/reports/R791.pdf
Levy, F. (2010). How Technology Changes Demands for Human Skills. Retrieved May 2011
from OECD iLibrary: http://www.oecd-ilibrary.org/education/how-technology-changes-
demands-for-human-skills_5kmhds6czqzq-en
Linn, R. (2010). A New Era of Test-Based Educational Accountability. Measurement:
Interdisciplinary Research and Perspectives, 8(2-3), 145-149. Retrieved May 2011
from: http://pdfserve.informaworld.com/373233__926993282.pdf
Nichols, S. N., and Berliner, D. C. (2007). Collateral Damage: The effects of high-stakes testing
on America’s schools. Cambridge, MA: Harvard Education Press.
OECD. (2010). PISA Computer-Based Assessment of Student Skills in Science. OECD
Publishing.
Quellmalz, E. S., and Pellegrino, J. W. (2009). Technology and Testing. Science, 323, 75-79.
Quellmalz, E. S., Timms, M. J., and Buckley, B. (2009). Using Science Simulations to Support
Powerful Formative Assessments of Complex Science Learning. WestEd. Retrieved May
2011 from:
http://www.simscientists.org/downloads/Quellmalz_Formative_Assessment.pdf
Quellmalz, E. S., Silberglitt, M. D., and Timms, M. J. (2011). How Can Simulations be
Components of Balanced State Science Assessment Systems? Policy Brief. WestEd.
Retrieved November 2011 from:
http://simscientist.org/downloads/SimScientistsPolicyBrief.pdf
Sandene, B., Horkay, N., Bennett, R. E., Allen, N., Braswell, J., Kaplan, B., and Oranje, A.
(2005). Online Assessment in Mathematics and Writing: Reports From the NAEP
Technology-Based Assessment Project, Research and Development Series. Retrieved
May 2011 from: http://nces.ed.gov/nationsreportcard/pubs/studies/2005457.asp
Schleicher, A. (2008). PIAAC: A New Strategy for Assessing Adult Competencies. International
Review of Education, 54(5-6), 627-650.
The Assessment and Teaching of 21st-Century Skills (ATC21S). http://atc21s.org/
Thompson, S., Johnstone, C. J., and Thurlow, M. L. (2002). Universal Design Applied to Large
Scale Assessments. Retrieved May 2011 from:
http://www.cehd.umn.edu/NCEO/onlinepubs/synthesis44.html
Turmo, A., and Lie, S. (2006). PISA's Computer-based Assessment of Science – A gender
equity perspective. AEA-E Annual Conference 2006, Assessment and Equity. Naples,
Italy. Retrieved May 2011 from:
http://www.aea-europe.net/userfiles/D1%20Are%20Turmo%20&%20Svein%20Lie.pdf
U.S. Department of Education. (2010). U.S. Secretary of Education Duncan Announces Winners
of Competition to Improve Student Assessments. Retrieved May 2011 from:
http://www.ed.gov/news/press-releases/us-secretary-education-duncan-announces-
winners-competition-improve-student-asse
Appendix 1
Kozma (2009) provided an extensive list of the potential advantages and challenges of incorporating ICT into large-scale assessments (exact quote):
Advantages:
• Reduced costs of data entry, collection, aggregation, verification, and analysis.
• The ability to adapt tests to individual students, so that the level of difficulty can be
adjusted as the student progresses through the assessment and a more refined profile of
skill can be obtained for each student.
• The ability to efficiently collect and score responses, including the collection and
automated or semi-automated scoring of more sophisticated responses, such as extended,
open-ended text responses.
• The ability to collect data on students’ intermediate products, strategies and indicators of
thought processes during an assessment task, in addition to the student’s final answer.
• The ability to take advantage of ICT tools that are now integral to the practice and
understanding of subject domains, such as the use of idea organizers for writing, data
analysis tools in social science, and visualization and modeling tools in natural science.
• The ability to provide curriculum developers, researchers, teachers, and even students with
detailed information that can be used to improve future learning.
Technological challenges:
Among the technological challenges that might inhibit the use of ICT-based assessments are:
• Significant startup costs for assessment systems that have previously implemented only
paper-and-pencil assessments. These costs would include hardware, software, and network
purchases; software development related to localization; and technical support and
maintenance.
• The need to choose between the use of “native” applications that would not allow for
standardization but would allow students to use the applications with which they are most
familiar; the use of standardized off-the-shelf applications that would provide
standardization but may disadvantage some students who regularly use a different
application; or the use of specially developed “generic” applications that provide
standardization but disadvantage everyone equally.
• The need to integrate applications and systems so that standardized information can be
collected and aggregated.
• The need to choose between standalone implementation versus Internet-based
implementation. If standalone, the costs of assuring standardization and reliable operation,
as well as the costs of aggregating data. If Internet-based, the need to choose between
running applications locally or having everything browser-based.
• If the assessment is Internet-based, issues of scale need to be addressed, such as the
potentially disabling congestion for both local networks and back-end servers as large
numbers of students take the assessment simultaneously.
• Issues of security are also significant with Internet-based assessments.
• The need to handle a wide variety of languages, orthographies, and symbol systems for
both the delivery of the task material and for collection and scoring of open-ended
responses.
• The need to keep up with rapidly changing technologies while maintaining comparability
of results over time.
• The need for tools to make the design of assessment tasks easy and efficient.
• The lack of knowledge of technological innovators about assessment, and the
corresponding paucity of examples of educational software that incorporates high-
quality assessments.
Significant methodological challenges include:
• The need to determine the extent to which ICT-based items that measure subject
knowledge should be equivalent to legacy paper-and-pencil-based results.
• The need to detail the wider range of skills that can only be assessed with ICT.
• The need to determine the age-level appropriateness of various 21st century skills.
• The need to design complex, compound tasks in a way such that failure on one task
component does not cascade through the remaining components of the task or result in
student termination.
• The need to integrate foundational ideas of subject knowledge along with 21st century
skills in the assessments. At the same time, there is a need to determine the extent to
which subject knowledge should be distinguished from 21st century skills in assessment
results.
• The need to incorporate qualities of high-level professional judgments about student
performances into ICT assessments, as well as support the efficiency and reliability of
these judgments.
• The need to develop new theories and models of scoring the students’ processes and
strategies during assessments, as well as outcomes.
• The need to establish the predictive ability of these judgments on the quality of subsequent
performance in advanced study and work.
• The need to distinguish individual contributions and skills on tasks that are done
collaboratively.
For more, see: http://www.worldclassarena.net/doc/file17.pdf