TRANSCRIPT
Technologies in Large-Scale Assessments: New Directions, Challenges, and Opportunities
Michal Beller
Director-General RAMA – The National Authority for Measurement and Evaluation in Education
Israel
Introduction
Assessment serves a critical role in education. It holds education systems accountable
and, at the same time, serves as a gateway to systemic change in education. Assessment signals
priorities for curriculum and instruction, and in some jurisdictions, teachers tend to model their
pedagogical approach on standardized large-scale assessments. Curriculum developers and
designers of educational materials respond to high-stakes tests by modifying existing textbooks
and other instructional materials and by developing and marketing new ones suited to test
expectations. Ultimately, assessment drives change at all levels, from classroom to legislature,
though its effects range from the positive to the negative (e.g., Koretz and Hamilton, 2006;
Nichols and Berliner, 2007).
Therefore, the contents and the mode of assessment should be designed carefully and
thoughtfully to ensure the impact on education is positive. For this to happen, assessments should
be more aligned with the intended curriculum, embedded in authentic contexts, and fully
integrated into the learning and teaching process. Effective integration of assessment in learning
and teaching is a challenge, particularly due to the need to balance between formative and
summative assessments. Technology can assist in achieving such a challenging goal (e.g.,
Bennett, 2001, 2002).
It is clear that existing innovative technologies, such as smartphones and tablets, as well
as future technologies, hold the potential to dramatically change the way assessment will be
implemented in the future. However, this chapter deals neither with the pure technological
opportunities ahead of us, nor with the technological barriers existing today (e.g., access to
computers in schools and bidirectional adaptation of the test platform to various languages such
as Arabic and Hebrew).
This paper describes the recent developments of technology-based national large-scale
assessments (NLSAs) and international large-scale assessments (ILSAs) and addresses whether
the digital revolution of ILSAs will be merely a technological step forward or serve as a catalyst
for a more profound pedagogical change in the way instruction, learning and assessments will be
conducted in the future. We look at whether it will foster the integration of 21st century
competencies and expertise into teaching and instruction of all content areas, and whether it will
facilitate the creation of new methodologies for a better use of technology and assessment in the
service of learning.
The Role of Technology in Assessment
Information and communication technologies (ICT) assist in automating various phases
of testing processes such as designing, presenting, recording, and distributing test materials.
However, one main goal of integrating ICT is to enable assessment of those aspects of cognition
and performance that are complex and dynamic and have been impossible to assess directly via
paper and pencil (P&P). New capabilities afforded by technology include directly assessing
problem-solving skills, exposing the sequences of actions taken by learners in solving problems,
and modeling complex reasoning tasks. Technology also makes possible the collection of data
regarding students' concept organization and other aspects of students' knowledge structures, as
well as representations of their participation in socially interactive discussions and projects
(Chudowsky and Pellegrino, 2003).
By enriching assessment situations through the use of multimedia, interactivity and
control over the stimulus display, it is possible to assess a much wider array of constructs than
was previously possible. For example, SimScientists uses science simulations to support
powerful formative assessments of complex science learning (Quellmalz et al., 2009). The
simulation-based assessment tasks are dynamic, engaging and interactive. They are capable of
assessing complex science knowledge and inquiry skills, which go well beyond the capabilities
of printed tests.
Advances in measurement, technology and in the understanding of cognition hold
promise for creating assessments that are more useful and valid indicators of what students know
and can do. Such work involves reconceptualizing assessment design and use and tying
assessment more directly to the processes and contexts of learning and instruction (Chudowsky
and Pellegrino, 2003; Quellmalz and Pellegrino, 2009; Bejar and Graf, 2010). Success in tying
assessment more closely to contexts and specific learning processes is what will make these new
developments more effective.
Universal Design of Assessment
There is a push in several countries (the United States in particular) to expand national
and state testing and to require that assessment systems encompass all students – including those
with disabilities, learning disorders and those with limited proficiency in the language of
instruction – many of whom have not been included in these systems in the past.
Technology allows for the development of assessments using universal design principles
that make assessments more accessible, effective, and valid for students with greater diversity in
terms of disabilities and limited language proficiency (mainly non-native), thus allowing for
wider student inclusion (e.g., Johnstone et al., 2009; Thompson et al., 2002). Technology allows
for presentation and assessment using different representations of the same concept or skill and
can accommodate various student disabilities and strengths. Therefore, presenting information
through multiple modalities enlarges the proportion of the population that can be assessed fairly.
Similarly, assistive technology can make it possible for students with disabilities and those who
require special interfaces to interact with digital resources, allowing them to demonstrate what
they know and can do in ways that would be impossible with standard print-based assessments.
Almond et al. (2010) proposed stimulating research into technology-enabled assessments
that incorporate conditions designed to make tests appropriate for the full range of the student
population by enhancing accessibility. The prospect is that rather than having to retrofit existing
assessments to include these students (through the use of large numbers of accommodations or a
variety of alternative assessments), new assessments can be designed and developed from the
outset to allow participation of the widest possible range of students, in a way that results in valid
inferences about performance for all students who participate in the assessment.
The Role of Technology in National Large-Scale Assessments
Efforts are being made for system-wide implementation of technology-based assessments
in order to extend, improve or replace existing assessment systems, or to create entirely new
assessment systems. Kozma (2009) provided a comprehensive list of the potential advantages
and challenges of incorporating ICT into large-scale assessments (See exact quote in Appendix
1).
Delivery of national assessments via computer is becoming more prevalent as changes
are made in assessment methodologies that reflect practical changes in pedagogical methods.
Introducing computer-based testing (CBT) is often viewed as a necessary and positive change in
large-scale educational assessment – not only because it reflects changes in classroom
instruction, which is becoming more and more computer-mediated, but also because it can
provide a number of assessment and administrative advantages (Bennett, 2010).
A systemic integration of digital components in NLSA began in the United States with
the National Assessment of Educational Progress (NAEP) (e.g., Sandene et al., 2005; Bennett et
al., 2007), and is being implemented today in a number of NLSA systems around the globe as
part of a process of integrating technology into learning and instruction (e.g., Iceland, Denmark,
Hungary, Israel and Australia). However, results from the NAEP Problem Solving in
Technology-Rich Environments study (Bennett et al., 2007, 2010) made it clear that going beyond
traditional testing is still extremely challenging. Recent progress on SimScientists is encouraging
– findings from an evaluation study based on six states (Herman, Dai, Htut, Martinez & Rivera,
2011) provide recommendations for incorporating these simulation-based assessments into state
science assessment systems in addition to their formative classroom use (Quellmalz, Silberglitt &
Timms, 2011).
An elaboration on the main relevant developments in Israel and the United States follows
below.
Israel
Instruction, Learning and Assessment via Computers
The Israeli education system went through several phases of integrating technologies into
instruction, learning and assessment (not yet with complete school coverage):
I. 1975-1990: The Infrastructure Phase I – Setting up computer labs in schools,
mainly for drill and practice.
II. 1990-1999: The Infrastructure Phase II – Personal computer labs featuring
multimedia programs.
III. 1999-2005: The Internet Phase – Proliferation of school websites. Sporadic web-
based activities aimed at information literacy and project-based learning.
IV. 2006–present: A shift to web-based content and tools (e.g., CET, Time-to-Know,
Galim) aligned with the national curricula (links below).
CET develops both digital content and advanced technology for instruction, learning and
assessment (ILA) by various means. The ILA system provides (in both Hebrew and Arabic):
• Optimal integration of the latest technologies available
• Integrated learning units with assessment tasks
• Classroom–home continuum
• Use of a learning management system for managing learning and assessment
• Customization for students with learning disabilities
The ILA system takes full advantage of computer capabilities for instruction, learning
and assessment, such as simulations and animations; virtual labs; video clips; maps; narration;
information literacy tasks on the web, and interactive feedback for the student. For example, the
online elementary school system – Ofek (horizon, in Hebrew), a bilingual (Hebrew and Arabic)
teaching, learning and assessment environment, contains a learning management system and a
large collection of curriculum-based learning activities and assessment assignments in most
subjects (Hebrew, math, science, social studies, Bible, current events, etc.). Ofek also offers
innovative generators, such as an online comic strip generator, math exercises generator, writing
generator and more, all allowing teachers to create their own teaching material in line with the
specific needs of their students.
For more on CET, see: http://cet.org.il/pages/Home.aspx and http://ofek.cet.ac.il
For more on Time-To-Know, see: http://www.timetoknow.com/
For more on Galim, see: http://www.galim.org.il/fields/english.html
The United States
The U.S. education system went through phases similar to the Israeli ones with regard to
integrating technologies. However, the U.S. system was one of the first in the world, if not the
first, to conduct a large scale online standardized assessment via its National Assessment of
Educational Progress (NAEP).
The National Assessment of Educational Progress (NAEP)
NAEP conducted three field investigations as part of the Technology-Based Assessment
Project, which explored the use of new technology in administering NAEP.
The first field investigation was the Math Online (MOL) study. It addressed issues related
to measurement, equity, efficiency, and operations in online mathematics assessment. In the
MOL study, data were collected from more than 100 schools at each of two grade levels in
spring 2001 (Sandene et al., 2005).
The 2002 Writing Online (WOL) study was the second of the three field investigations in the
Technology-Based Assessment Project and examined the delivery of NAEP writing assessments
on computer.
The 2003 Problem Solving in Technology-Rich Environments study (TRE) was the third
study and it investigated how computers might be used to measure skills that cannot be measured
on a paper and pencil test (Bennett et al., 2007, 2010).
In continuation of the previous studies, NAEP recently completed a multistate field test
of online writing assessment for grades 8 and 12, to become operational in 2011. The design of
the 2011 NAEP writing assessment reflects the way today’s students compose written texts − and
are expected to compose texts − particularly as they move into postsecondary settings. The
assessment is designed to measure the ability of students in grades 8 and 12 to write using word
processing software with commonly available tools.
For more, see: http://nces.ed.gov/nationsreportcard/writing/cba.asp
U.S. State Assessments
West Virginia and Oregon are among the leading states in moving forward on the
integration of technology into their state assessments.
1) West Virginia
West Virginia’s techSteps program is a literacy framework based on the National
Education Technology Standards for Students (NETS*S) and is aligned with state and Common
Core State Standards.
TechSteps is a personalized, project-based technology literacy curriculum that infuses
technology skills into core instruction, promoting core subject area outcomes while also teaching
skills for the 21st century. A Technology Literacy Assessment Profile is built for the student and
updated as each new activity is performed and recorded. This approach allows educators to teach
and assess technology literacy in an integrated and systematic manner, using authentic
performance-based assessment to provide meaningful feedback to students and generate data for
reporting purposes. This system provides West Virginia with statewide student data
on technology proficiencies at each grade level.
For more, see: http://www.techsteps.com/public/home/
2) Oregon
The Oregon Assessment of Knowledge and Skills (OAKS) is the name of the larger
Oregon statewide assessment system. This new online testing system assesses students' mastery
of Oregon content standards. The OAKS Online operational test is now available for reading,
mathematics, science, and social sciences.
For more, see: http://www.oaks.k12.or.us/
Recent Developments in the United States
Cognitively Based Assessment of, for, and as Learning (CBAL) – A Research Project
The improvements in large-scale assessments, important as they may be, will not resolve
the growing tension between internal and external school assessment. An innovative research
project proposed by Educational Testing Service (ETS) is CBAL (Bennett and Gitomer, 2009;
Bennett, 2010), which attempts to resolve this tension.
Technology plays a key role in the CBAL project. The central goal of CBAL is to create
a comprehensive system of assessment that documents what students have achieved ("of
learning"), helps identify how to plan and adjust instruction ("for learning"), and is considered by
students and teachers to be a worthwhile educational experience in and of itself ("as learning").
The computerized system, when completed, will attempt to unify and create synergy among
accountability testing, formative assessment and professional support.
Accountability tests, formative assessment and professional support will be derived from
the same conceptual base. This base will rest upon rigorous cognitive research, common core or
state standards, and curricular considerations. CBAL assessments will consist largely of
engaging, extended, constructed-response tasks that are delivered primarily by computer and, as
much as possible, automatically scored. CBAL assessments are designed to help students take an
active role in the assessment of their own learning.
It should be noted, however, that for such an integrated accountability system to be
relevant and effective, the participating educational bodies should be in full agreement as to
curricula and standards.
For more, see: http://www.ets.org/research/topics/cbal/initiative
U.S. Department of Education (DOE)
I'm calling on our nation's governors and state education chiefs to develop
standards and assessments that don't simply measure whether students can fill in
a bubble on a test, but whether they possess 21st century skills like problem
solving and critical thinking and entrepreneurship and creativity.
– President Barack Obama, address to the Hispanic Chamber of Commerce,
March 10, 2009
In an effort to provide ongoing feedback to teachers during the course of the school year,
measure annual student growth, and move beyond narrowly focused bubble tests, the U.S.
Department of Education has awarded grants (in September 2010) to two groups of states to
develop a new generation of tests. The consortia – the Partnership for Assessment of Readiness
for Colleges and Careers (PARCC), and the Smarter Balanced Assessment Consortium (SBAC)
– were awarded $170 million and $160 million, respectively, to design assessments that evaluate
students based on common-core standards by the 2014-15 school year. The tests will assess
students' knowledge of mathematics and English language arts from third grade through high
school.
As I travel around the country the number one complaint I hear from teachers is
that state bubble tests pressure teachers to teach to a test that doesn't measure
what really matters. Both of these winning applicants are planning to develop
assessments that will move us far beyond this and measure real student
knowledge and skills.
– U.S. Education Secretary Arne Duncan
(http://www.ed.gov/news/press-releases/us-secretary-education-duncan-announces-
winners-competition-improve-student-asse)
The Smarter coalition will test students using computer adaptive technology that will
tailor questions to students based on their answers to previous questions. Smarter will continue to
use one test at the end of the year for accountability purposes but will create a series of interim
tests used to inform students, parents, and teachers about whether students are on track.
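The adaptive mechanism described above – selecting each question based on the answers given so far – can be illustrated with a minimal sketch of maximum-information item selection under a Rasch (1PL) item response model. This is a common textbook approach to computerized adaptive testing, not the Smarter Balanced consortium's actual algorithm; the function names, the item bank, and the crude fixed-step ability update are all illustrative assumptions.

```python
import math

def rasch_prob(theta, b):
    # Probability of a correct answer under the Rasch (1PL) model,
    # where theta is the examinee's ability and b is the item difficulty.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    # Fisher information of a Rasch item at ability theta;
    # it peaks when the item difficulty matches the ability estimate.
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def next_item(theta, difficulties, administered):
    # Choose the not-yet-administered item with maximum information
    # at the current ability estimate.
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))

def update_theta(theta, b, correct, step=0.5):
    # Crude fixed-step ability update; operational systems use
    # maximum-likelihood or Bayesian estimation instead.
    return theta + step if correct else theta - step
```

For an examinee at an estimated ability of 0 facing a hypothetical bank of items with difficulties [-2, -1, 0, 1, 2], `next_item` selects the middle item first; a correct answer nudges the estimate upward, steering subsequent selections toward harder items.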
PARCC’s next-generation assessment system will provide students, educators,
policymakers and the public with the tools needed to identify whether students – from grade 3
through high school – are on track for postsecondary success, and, critically, where gaps may
exist and how they can be addressed well before students enter college or the workforce.
With such new technology-based assessments, schools may no longer have to interrupt
their routine instructional processes at various times during the year to administer external tests
to students, not to mention the time saved on preparing them for the tests. Such a scenario would
be made possible when schools implement assessments for both formative and summative
purposes in a manner developed now in CBAL.
The Cognitively Based Assessment of, for, and as Learning (CBAL) system has many of
the features that are envisioned in the PARCC and SBAC consortia. The CBAL summative
assessments are not one-time events, but rather are spread over several occasions throughout the
school year. This is a feature that has attracted a good deal of attention in discussions of the
consortia plans. CBAL also emphasizes the use of formative assessments by teachers and in
teacher professional development. This emphasis on formative assessment and teacher
professional development can also be found in the broad outlines of the consortia plans (Linn,
2010).
The Role of Technology in International Large-Scale Assessments (ILSAs)
Among the more familiar large-scale assessment systems are the International Large-Scale Assessments
(ILSAs), which are designed to enrich the knowledge and understanding of decision-makers in
education systems in different countries through international comparisons and comparative
studies in central areas of education. The major players in this arena are the International
Association for the Evaluation of Educational Achievement (IEA), with the Trends in
International Mathematics and Science Study (TIMSS), the Progress in International Reading
Literacy Study (PIRLS), the International Computer and Information Literacy Study (ICILS),
and others; and the Organization for Economic Cooperation and Development (OECD), with the
Program for International Student Assessment (PISA), the Program for the International
Assessment of Adult Competencies (PIAAC), and the Teaching and Learning International
Survey (TALIS). These international assessments usually involve nationally representative
samples composed of thousands of students.
The growing popularity of ILSAs has been accompanied by several phenomena. In many
countries today ILSAs constitute de facto standards for learning goals in the subjects assessed,
and as a result the curricula in different countries have been aligned with the theoretical
framework of these international assessments. In a number of countries the results and rankings
of ILSAs have become politically high stakes: ministers of education and governments perceive
them as being an indicator of the success of their local policy (e.g., Poland, Germany, and Israel).
As such, international assessments intensify the negative consequences that often accompany
high-stakes tests in different countries ("teaching to the test," diverting resources, etc.).
In international assessment research programs, as in national and local assessment
programs, three different themes are evident in the application of ICT. One is the use of ICT to
better assess the domains that have traditionally been the focus of assessment in schools: reading,
mathematics and science. Note that the use of ICT to create richer and more interactive
assessment materials increases validity and enables the assessment of aspects within the domains
that, up to now, have been difficult to assess. A second theme is the use of technology to assess
more generic competencies, such as ICT skills and a broad set of generalizable and transferable
knowledge, skills and understandings that are relevant to managing and communicating
information. A third theme is the use of technology to assess more complex constructs, which are
less well understood and characterize much of the thinking about 21st century skills. Such
constructs include creativity and collaborative problem solving (as typified by the Assessment
and Teaching of 21st Century Skills (ATC21S) Project).
In 2006, for the first time and on an experimental basis, PISA included an optional
computer-based component in the assessment of student achievements in science (CBAS). In
PISA 2009, in addition to the P&P assessments, countries were offered the opportunity to
participate in a computerized assessment of electronic reading of texts (ERA). In 2011, 15-year-
old students around the world took part in a field trial of three computerized assessments
(problem solving, mathematical literacy, and reading of electronic texts) in preparation for
PISA 2012. Also, adults (16 to 64 years old) around the world were assessed digitally on
PIAAC. In 2012, eighth graders around the world will participate in a pilot study in preparation
for ICILS 2013. For 2015, PISA plans are to proceed on the assumption that computer delivery
will be a significant aspect of the overall assessment. The full extent of computer delivery has
yet to be established.
A brief elaboration on all these new assessments follows below.
Organization for Economic Cooperation and Development (OECD)
Since 2006, the OECD has gradually begun introducing technology into its assessments
(PISA and PIAAC):
1. The Program for International Student Assessment − PISA
PISA is an internationally standardized assessment that was jointly developed by
participating economies and administered to 15-year-olds in schools. PISA focuses on young
people’s capacity to demonstrate their preparedness in the fundamental domains of reading
literacy, mathematical literacy, and scientific literacy. Four PISA cycles have been carried out so
far (in 2000, 2003, 2006 and 2009). Close to 70 countries and large economies participated in the
2009 cycle.
PISA is different from other international surveys, such as TIMSS, which is curriculum-
based, in that it attempts to assess the skills and competencies each learner needs for further
study and success in the future. Although the basic skills assessed in PISA (reading, math,
science) are similar to other assessment programs, their definition is different and broader and
the tasks are put into the context of everyday situations with which most people will have to deal.
Therefore, it is not surprising that the PISA governing board decided to make use of
computer-based assessments, not only to measure ICT literacy skills, but also to allow for the
provision of a wider range of dynamic and interactive tasks and to explore more efficient ways of
carrying out the main tests of student knowledge and skills in reading, mathematics and science.
The initial goal was to begin the digital implementation in 2003. However, it only
materialized in 2006. A list of past, present and future cycles of PISA is presented in Table 1,
which indicates the extended (major) domain assessed in each cycle and the digital assessment
components (added to or replacing paper and pencil).
Table 1 – PISA cycles and the digital components

Cycle      | Reading Literacy | Mathematics Literacy | Science Literacy | Other Domains
PISA 2000  | +*               | +                    | +                |
PISA 2003  | +                | +*                   | +                | Problem Solving (+)
PISA 2006  | +                | +                    | +*, CBAS         |
PISA 2009  | +*, ERA          | +                    | +                |
PISA 2012  | +, ERA           | +*, EM               | +                | Problem Solving, Financial Literacy
PISA 2015  | ERA              | EM                   | CBAS*            | Collaborative Problem Solving

+ = Paper and pencil assessment
CBAS = Computer-Based Assessment of Science
ERA = Electronic Reading Assessment
EM = Electronic Math
* = Extended (major) domain assessed in that cycle; CBAS, ERA and EM denote digital assessment components
Overall, there is a challenge in how to move forward to CBT while keeping the paper and
pencil trends of the scores across the years. In a way, the two goals of preserving trends and
moving to CBT are incompatible. Changing the nature of tests generally breaks the trends, while
keeping assessments unchanged undermines their validity, as the definitions of the skills and
competencies to be measured are continually changing and evolving. This paradoxical situation
should be resolved and, in order to do so, both new technologies and new conceptualizations of
what is being measured must be further explored. The 2006 CBAS study and, even more so, the
2009 ERA study, offer an opportunity to closely scrutinize the results of CBT in comparison to
paper and pencil.
1a. PISA 2006 - Computer-Based Assessment of Science – CBAS (optional component)
The initial goal of extending the PISA 2006 assessment of science to include a computer-
delivered element was to administer questions that would be difficult to deliver in a P&P test. In
particular, the goals were reducing the load of reading and written expression; motivating
students for the assessment task; linking dynamic contexts with data interpretation; enabling
student interaction with the media; and allowing assessment of aspects of science not available in
paper-based forms. The relevant questions included video footage, simulations, and animations.
The computer-based assessment of science was field tested in 13 PISA countries in 2005,
and the main study was conducted in only three of them in 2006: Denmark, Iceland and Korea.
Overall achievement within countries did not change from one test modality to the next, yet there
was a tendency for Denmark’s performance to decrease on the computer-based test. Korean
students outperformed Danish and Icelandic students in the computer-based test just as they did
in the P&P test.
In the computer-based test, male performance increased in Iceland and Korea while
female performance decreased. Males outperformed females on the computer-based test in all
three countries. Females outperformed males on the P&P test of science literacy in Iceland,
whereas there was a gender difference in favor of males in the P&P results for Denmark. The
association between reading literacy and achievement on the science literacy assessment was
weaker for the computer-based items than for the P&P items (Turmo and Svein, 2006; OECD, 2010).
An international workshop was held in Reykjavik, Iceland in the autumn of 2008 during
which the following matters were discussed:
• Comparison between paper-and-pencil tests and computer-based assessment
• Electronic tests and gender differences
• Adaptive vs. linear computer-based assessment.
For more, see: http://crell.jrc.ec.europa.eu/WP/workshoptransition.htm
1b. PISA 2009 and 2012 - Electronic Reading Assessment – ERA (optional component)
The PISA 2009 ERA optional component was implemented in recognition of the
increasing prevalence of digital texts in many parts of our lives: personal, social and economic.
Even though the core principles of writing texts and the core processes of reading and
understanding texts are similar across media, there are reasons to believe that the specific
features of digital texts call for specific text-processing skills.
The new demands on reading proficiency created by the digital world have led, in PISA
2009, to the inclusion of electronic reading in the reading framework, an inclusion that has, in
turn, resulted in some redefinition both of texts and of the mental processes that readers use to
comprehend the texts. ERA 2009 was designed to investigate students’ proficiency at tasks that
require the access, comprehension, evaluation and integration of digital texts across a wide range
of reading contexts and tasks among today’s 15-year-olds.
Nineteen countries participated in ERA. The ERA used a test administration system
(TAO) developed through the University of Luxembourg. TAO can deliver tests over the
Internet, across a network or (as was the case with ERA) on a standalone computer, with student
responses collected on an external memory device (USB).
ERA will be offered again as an optional component in PISA 2012 (a pilot study was
conducted in 2011 in around 30 countries).
1c. PISA 2012/2015 – Computer-Based Problem Solving Assessment (compulsory component)
The development of competency in problem solving (PS) is a central objective within the
educational programs of many countries. The acquisition of increased levels of competency in
problem solving provides a basis for future learning, for effective participation in society and for
conducting personal activities. Students need to be able to apply what they have learned to new
situations.
What distinguishes the 2012 assessment of PS from the 2003 PS assessment is not so
much the definition of the competency of problem solving, or the focus on problems that only
require low levels of discipline-based knowledge for their solution, but the mode of delivery
(computer-based) of the 2012 assessment and the inclusion of problems that cannot be solved
without the respondent interacting with the problem online. A pilot study was conducted in 2011
in around 40 countries.
In PISA 2015 a new Collaborative Problem Solving assessment will be added, which will
incorporate online assessment of the skills required to solve problems as a member of a group.
1d. PISA 2012 – Electronic Mathematical Assessment (optional component)
The definition of mathematical literacy in PISA 2012 explicitly calls for the use of
mathematical tools, including technological tools, for judgments and decision making.
Computer-based tools are in common use in workplaces of the 21st century and will be
increasingly more prevalent as the century progresses. The nature of work-related problems and
logical reasoning has expanded with these new opportunities, creating new expectations for
individuals.
Because PISA items reflect problems that arise in personal, occupational, social, and
scientific contexts, a calculator is part of many PISA items. Computers and calculators relieve
the burden of computation so that individual respondents' attention can be focused on strategies,
concepts and structures rather than on mechanical procedures. A computer-based assessment will
provide the opportunity to extend the integration of technologies − such as statistical tools,
geometric construction and visualization utilities, and virtual measuring instruments − into the
test items.
A pilot study was conducted in 2011 in around 30 countries.
1e. PISA 2015 – Recent Call for Tender
The plans for PISA 2015 are proceeding on the assumption that computer delivery will be
a significant aspect of PISA 2015. The full extent of computer delivery has yet to be established.
The PISA 2015 call for tenders requires the contractor to develop an electronic platform
that is suitable for the assessment of all PISA domains and is capable of all functions from item
development to delivery of the assessment, i.e., item development, item review, test compilation,
test delivery and administration. It should be capable of being operationalized through an
extensive range of operating systems, including delivery over the Internet, in order to maximize
country participation, and should exploit the possibilities that arise from the use of new
technologies to assess students' knowledge and skills in everyday tasks and challenges, in
keeping with PISA's definition of literacy. It should also be adaptable to allow for evolution over
the PISA cycles, e.g., to assess new domains and to cope with new test designs.
Overall, a challenge remains to be resolved: how to move to digital assessments
while maintaining the trends established by previous P&P PISA assessments.
2. Program for the International Assessment of Adult Competencies (PIAAC)
Over the past two decades, national governments and other stakeholders have shown a
growing interest in an international assessment of adult skills that allows them to monitor how
well prepared populations are for the challenges of a knowledge-based society. The OECD’s
PIAAC will be the largest and most innovative international assessment of adult skills ever
conducted (Schleicher, 2008).
The primary objectives of PIAAC are to: 1) identify and measure cognitive competencies
believed to underlie both personal and societal success; 2) assess the impact of these
competencies on social and economic outcomes at individual and aggregate levels; 3) gauge the
performance of education and training systems in generating required competencies; and 4) help
to clarify the policy levers that could contribute to enhancing competencies.
At the core of PIAAC is an assessment of the literacy skills of adult populations,
understood as the interest, attitude and ability of individuals to appropriately use
sociocultural tools, including digital technology and communication tools, to access, manage,
integrate, and evaluate information, construct new knowledge, and communicate with others. In
addition, PIAAC collects information from respondents concerning their use of key work skills
in their jobs – a first for an international study.
The skills assessed by PIAAC (literacy, numeracy, and problem solving in technology-
rich environments) represent cognitive skills that provide a foundation for effective and
successful participation in modern societies and economies. Levy (2010) argues that a
technology-rich workplace requires foundational skills, including numeracy and literacy (both to
be tested in PIAAC); advanced problem-solving skills, or Expert Thinking (similar to the
construct of Problem Solving in Technology-Rich Environments to be tested in PIAAC); and
advanced communication skills, or Complex Communication (not tested in PIAAC).
PIAAC will offer a far more complete and nuanced picture of the “human capital” on
which countries can count as they compete in today’s global economy. It will help policymakers
assess the effectiveness of education and training systems, both for recent entrants into the labor
market and for older people who may need to continue learning new skills throughout their
lifetimes.
Twenty-six countries are currently implementing PIAAC. A field test was successfully
conducted in 2010, with the main assessment scheduled for 2011-12. A second round of PIAAC
is planned to allow additional countries to participate in, and benefit from, the assessment. The
assessments will be available in paper- and computer-based formats.
The International Association for the Evaluation of Educational Achievement (IEA)
The IEA has conducted various international surveys regarding the implementation of
ICT in education around the globe. Twenty-two countries participated in the first stage of the
Computers in Education Study, which in 1989 conducted school surveys in elementary, lower
secondary, and upper secondary schools. In 1992 the second stage of the study repeated the
surveys of the first stage and added an assessment of students. The rapid diffusion of the Internet
and multimedia technology during the mid-1990s generated an interest in a new study that,
among other things, could investigate the changes in the curricula and classrooms since IEA's
earlier study.
The Second Information Technology in Education Study (SITES) was initiated in 1996
by the IEA, and school surveys were conducted in 1998. The SITES study consists of three
modules. The survey data of module 1 were collected in 1998. The module 2 case studies that
involved visits to school sites were conducted during 2000 and 2001, and the reports were
released in 2002 and 2003. Module 3 was launched in 2001, but the data for the surveys and
student assessments were collected during 2004, with the results released in 2005 and 2006.
A new IEA computerized international large-scale assessment for students – ICILS – will
be administered in 2013.
International Computer and Information Literacy Study (ICILS) – 2013
ICILS will examine the outcomes of student computer and information literacy (CIL)
across countries.
Computer and information literacy refers to an individual’s ability to use computers to
investigate, create, and communicate in order to participate effectively at home, at school, in the
workplace, and in the community. Twenty countries have registered so far to participate in this
study.
The assessment of CIL will be authentic and computer-based. It will incorporate
multiple-choice and constructed-response items based on realistic stimulus material; software
simulations of generic applications so that students are required to complete an action in
response to an instruction; and authentic tasks that require students to modify and create
information products using “live” computer software applications.
ICILS 2013 will be the first international comparative study of student preparedness to
use computers to investigate, create and communicate at home, at school and in the broader
community. ICILS will assess students’ capacity to:
• Use technology to search for, retrieve and make effective judgments about the
quality and usefulness of information from a range of sources (such as the
Internet)
• Use technology to transform and create information
• Use technology to communicate information to others
• Recognize the ethical and legal obligations, responsibilities and potential dangers
associated with digital communication.
ICILS 2013 will provide policymakers with results that will enhance understanding of
factors responsible for achievement in computer-based tasks. It will also inform policy on the
possible contribution of educational systems to the use of computers for digital communication
and information literacy as an essential skill in the 21st century.
A pilot study will be conducted in 2012.
For more, see: http://www.acer.edu.au/icils/
Mega Technology Companies Collaborate to Integrate Technology in ILSAs
Three leading technology companies − Cisco, Intel and Microsoft − began collaborating in 2009
with the University of Melbourne on a project aimed at transforming global educational assessment
and improving learning outcomes: Assessment and Teaching of 21st Century Skills (ATC21S).
The goals of the project are to mobilize international educational, political and business
communities to make the transformation of educational assessment and, hence, instructional
practice a global priority; to specify in measurable terms high-priority understanding and skills
needed by productive and creative workers and citizens of the 21st century; to identify
methodological and technological barriers to ICT-based assessment; to develop and pilot new
assessment methodologies; to examine and recommend innovative ICT-enabled, classroom-
based learning environments and formative assessments that support the development of 21st
century skills.
In the first phase of this development project, the collaboration produced five important
white papers (Griffin, McGaw & Care, 2011):
• Defining 21st century skills
• Perspectives on methodological issues
• Technological issues for computer-based assessment
• New assessments and environments for knowledge building
• Policy frameworks for new assessments
Through 2010-2012, the project is focusing on the development of assessment
methodologies and dynamic tasks. ATC21S has received the support of major international
assessment organizations, as well as participating governments’ departments of education,
through representation on the ATC21S Advisory Panel.
For more, see: http://atc21s.org/
Conclusion
Neither assessments nor technologies are goals in and of themselves. Both have merit
only if they make a significant impact on education by improving instruction and increasing
(directly or indirectly) the opportunity for each pupil to learn and progress. However, the more
technology becomes integrated into instruction, the greater the need to make assessment
digital and aligned with the new instructional environment.
Do all the above new and innovative digital assessments indicate that the educational
world is approaching a turning point regarding the incorporation of technology into large-scale
assessments? Are schools pedagogically, technologically, logistically, and socially prepared for
this development? What are the implications for educators, and for policymakers? What will
make this mega-investment worthwhile?
Thoughtful integration of technology into assessment may meet several complementary
goals:
• Supporting and enhancing the integration of technology into learning.
• Allowing for an assessment of complex cognitive skills (e.g., Diagnoser, SimScientists).
• Designing a new accountability paradigm that fully integrates sequential formative
assessments and periodic summative assessments via computers (e.g., CBAL).
The present structure and development of electronic ILSAs certainly support the first two
goals – enhancing the integration of technology into learning, and providing an appropriate
response for the measurement of complex cognitive abilities, thus increasing test validity. Also,
beyond measuring 21st century skills and broadening the measured construct, computerized
international assessments can provide a more useful profile of countries whose students are
clustered in particularly high or low levels of performance.
However, new accountability paradigm approaches, proposed by CBAL and the Race to
the Top consortia, according to which formative and summative assessment are fully aligned and
integrated via technology, may not be easily applicable to ILSAs. That is, the ability to integrate
continual formative assessments with periodic summative assessments entails complete
alignment of the curricula, the content and standards with the external and internal assessment
tasks. It is not clear whether such an alignment is possible, or even desired, with regard to the
ILSAs.
Additionally, ILSAs take place on a cycle that is well suited for international
comparisons (three, four or five years). However, as ILSAs have become high-stakes and
excelling on them has become a national goal, this cycle causes certain local irregularities in the
organization of learning and NLSAs in years that coincide with ILSA cycles. Thus, there is a
need to better align the national and international assessments in countries where both systems
exist.
In summary, several issues have yet to be resolved before moving NLSAs and ILSAs to
full blown digital and online systems: adaptive versus linear administration of the assessments;
ensuring gender equality; narrowing the digital divide; school system readiness (personnel,
hardware, connectivity); infrastructure adaptation to various languages (including right-to-left
languages) and more.
Also, there is a challenge in equating P&P and digital assessments with the goal of
maintaining long-term trends, as these two aims are somewhat incompatible. Changing the nature
of tests generally breaks trends, while keeping tests unchanged is unsatisfactory because the
definitions of the skills and competencies to be measured are continually changing. This
paradoxical situation has to be resolved, and to do so both new technologies and new
conceptualizations of what is being measured have to be explored.
Nevertheless, it is clear that technology will continue to advance and improve both
NLSAs and ILSAs in an evolutionary manner. Technology even has the potential to
revolutionize NLSAs and their alignment with learning and teaching, as proposed by the CBAL
model. However, the extent to which ILSAs, based on a common international framework that is
often not fully aligned with national curricula, can join this revolution remains a pedagogical
and strategic challenge.
References
Almond, P., Winter, P., Cameto, R., Russell, M., Sato, E., Clarke-Midura, J., Torres, C., Haertel,
G., Dolan, R., Beddow, P., and Lazarus, S. (2010). Technology-Enabled and Universally
Designed Assessment: Considering Access in Measuring the Achievement of Students
with Disabilities – A Foundation for Research. Journal of Technology, Learning, and
Assessment, 10(5). Retrieved May 2011 from: http://www.jtla.org
Bejar, I., and Graf, E. A. (2010). Updating the Duplex Design for Test-Based Accountability in
the Twenty-First Century. Measurement: Interdisciplinary Research and Perspectives,
8(2-3), 110-129.
Bennett, R. E. (2001). How the Internet will help large-scale assessment reinvent itself.
Education Policy Analysis Archives, 9 (5).
Bennett, R. E. (2002). Inexorable and inevitable: the continuing story of technology and
assessment. The Journal of Technology, Learning and Assessment. Retrieved May 2011
from: http://escholarship.bc.edu/jtla/
Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): a
preliminary theory of action for summative and formative assessment. Measurement:
Interdisciplinary Research and Perspectives, 8, 70–91.
Bennett, R.E., Persky, H., Weiss, A.R., and Jenkins, F. (2007). Problem Solving in Technology-
Rich Environments: A Report From the NAEP Technology-Based Assessment Project
(NCES 2007–466). U.S. Department of Education. Washington, DC: National Center for
Education Statistics. Retrieved May 2011 from:
http://nces.ed.gov/nationsreportcard/pdf/studies/2007466.pdf
Bennett, R.E., Persky, H., Weiss, A., and Jenkins, F. (2010). Measuring Problem Solving with
Technology: A Demonstration Study for NAEP. Journal of Technology, Learning, and
Assessment, 8(8). Retrieved May 2011 from: http://www.jtla.org
Bennett, R. E., and Gitomer, D. H. (2009). Transforming K–12 assessment. In C. Wyatt-Smith
and J. Cumming (Eds.), Educational assessment in the 21st century. New York: Springer.
Chudowsky, N., and Pellegrino, J. W. (2003). Large-scale assessments that support learning:
what will it take? Theory Into Practice, 42(1), 75-83.
Johnstone, C., Thurlow, M., Altman, J., Timmons, J., and Keto, K. (2009). Assistive Technology
Approaches for Large-Scale Assessment: Perceptions of Teachers of Students with
Visual Impairments. Exceptionality, 17(2), 66-75.
Koretz, D., and Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan
(Ed.), Educational Measurement (4th edition, pp. 531-578). Westport, CT: American
Council on Education/Praeger.
Kozma, R. (2009). Assessing and teaching 21st century skills: A call to action. In F.
Scheuermann and J. Bjornsson (Eds.), The transition to computer-based assessment: New
approaches to skills assessment and implications for large scale assessment (pp. 13-23).
Brussels: European Communities. Retrieved May 2011 from:
http://www.worldclassarena.net/doc/file17.pdf
Griffin, P., McGaw, B., & Care, E. (Eds.) (2011). Assessment and Teaching of 21st Century
Skills. Dordrecht: Springer.
Herman, J., Dai, Y., Htut, A. M., Martinez, M., & Rivera, N. (2011). Evaluation of the enhanced
assessment grants (EAGs) SimScientists program: Site visit findings (CRESST Report
791). Los Angeles, CA: University of California, National Center for Research on
Evaluation, Standards, and Student Testing (CRESST). Retrieved November 2011 from:
http://www.cse.ucla.edu/products/reports/R791.pdf
Levy, F. (2010). How Technology Changes Demands for Human Skills. Retrieved May 2011
from OECD iLibrary: http://www.oecd-ilibrary.org/education/how-technology-changes-
demands-for-human-skills_5kmhds6czqzq-en
Linn, R. (2010). A New Era of Test-Based Educational Accountability. Measurement:
Interdisciplinary Research and Perspectives, 8(2-3), 145-149. Retrieved May 2011
from: http://pdfserve.informaworld.com/373233__926993282.pdf
Nichols, S. N., and Berliner, D. C. (2007). Collateral Damage: The effects of high-stakes testing
on America’s schools. Cambridge, MA: Harvard Education Press.
OECD. (2010). PISA Computer-Based Assessment of Student Skills in Science. OECD
Publishing.
Quellmalz, E. S., and Pellegrino, J. W. (2009). Technology and Testing. Science, 323, 75-79.
Quellmalz, E. S., Timms, M. J., and Buckley, B. (2009). Using Science Simulations to Support
Powerful Formative Assessments of Complex Science Learning. WestEd. Retrieved May
2011 from:
http://www.simscientists.org/downloads/Quellmalz_Formative_Assessment.pdf
Quellmalz, E. S., Silberglitt, M. D., and Timms, M. J. (2011). How Can Simulations be
Components of Balanced State Science Assessment Systems? Policy Brief. WestEd.
Retrieved November 2011 from:
http://simscientist.org/downloads/SimScientistsPolicyBrief.pdf
Sandene, B., Horkay, N., Bennett, R. E., Allen, N., Braswell, J., Kaplan, B., and Oranje, A.
(2005). Online Assessment in Mathematics and Writing: Reports From the NAEP
Technology-Based Assessment Project, Research and Development Series. Retrieved
May 2011 from: http://nces.ed.gov/nationsreportcard/pubs/studies/2005457.asp
Schleicher, A. (2008). PIAAC: A New Strategy for Assessing Adult Competencies. International
Review of Education, 54(5-6), 627-650.
The Assessment and Teaching of 21st-Century Skills (ATC21S). http://atc21s.org/
Thompson, S., Johnstone, C. J., and Thurlow, M. L. (2002). Universal Design Applied to Large
Scale Assessments. Retrieved May 2011 from:
http://www.cehd.umn.edu/NCEO/onlinepubs/synthesis44.html
Turmo, A., and Lie, S. (2006). PISA's Computer-based Assessment of Science – A gender
equity perspective. AEA-E Annual Conference 2006, Assessment and Equity. Naples,
Italy. Retrieved May 2011 from:
http://www.aea-europe.net/userfiles/D1%20Are%20Turmo%20&%20Svein%20Lie.pdf
U.S. Department of Education. (2010). U.S. Secretary of Education Duncan Announces Winners
of Competition to Improve Student Assessments. Retrieved May 2011 from:
http://www.ed.gov/news/press-releases/us-secretary-education-duncan-announces-
winners-competition-improve-student-asse
Appendix 1
Kozma (2009) provided an extensive list of the potential advantages and challenges of incorporating ICT into large-scale assessments (exact quote):
Advantages:
• Reduced costs of data entry, collection, aggregation, verification, and analysis.
• The ability to adapt tests to individual students, so that the level of difficulty can be
adjusted as the student progresses through the assessment and a more refined profile of
skill can be obtained for each student.
• The ability to efficiently collect and score responses, including the collection and
automated or semi-automated scoring of more sophisticated responses, such as extended,
open-ended text responses.
• The ability to collect data on students’ intermediate products, strategies and indicators of
thought processes during an assessment task, in addition to the student’s final answer.
• The ability to take advantage of ICT tools that are now integral to the practice and
understanding of subject domains, such as the use of idea organizers for writing, data
analysis tools in social science, and visualization and modeling tools in natural science.
• The ability to provide curriculum developers, researchers, teachers, and even students with
detailed information that can be used to improve future learning.
Technological challenges:
Among the technological challenges that might inhibit the use of ICT-based assessments are:
• Significant startup costs for assessment systems that have previously implemented only
paper-and-pencil assessments. These costs would include hardware, software, and network
purchases; software development related to localization; and technical support and
maintenance.
• The need to choose between the use of “native” applications that would not allow for
standardization but would allow students to use the applications with which they are most
familiar; the use of standardized off-the-shelf applications that would provide
standardization but may disadvantage some students who regularly use a different
application; or the use of specially developed “generic” applications that provide
standardization but disadvantage everyone equally.
• The need to integrate applications and systems so that standardized information can be
collected and aggregated.
• The need to choose between standalone implementation versus Internet-based
implementation. If standalone, the costs of assuring standardization and reliable operation,
as well as the costs of aggregating data. If Internet-based, the need to choose between
running applications locally or having everything browser-based.
• If the assessment is Internet-based, issues of scale need to be addressed, such as the
potentially disabling congestion for both local networks and back-end servers as large
numbers of students take the assessment simultaneously.
• Issues of security are also significant with Internet-based assessments.
• The need to handle a wide variety of languages, orthographies, and symbol systems for
both the delivery of the task material and for collection and scoring of open-ended
responses.
• The need to keep up with rapidly changing technologies while maintaining comparability
of results over time.
• The need for tools to make the design of assessment tasks easy and efficient.
• The lack of knowledge of technological innovators about assessment, and the
corresponding paucity of examples of educational software that incorporates high-
quality assessments.
Significant methodological challenges include:
• The need to determine the extent to which ICT-based items that measure subject
knowledge should be equivalent to legacy paper-and-pencil-based results.
• The need to detail the wider range of skills that can only be assessed with ICT.
• The need to determine the age-level appropriateness of various 21st century skills.
• The need to design complex, compound tasks in a way such that failure on one task
component does not cascade through the remaining components of the task or result in
student termination.
• The need to integrate foundational ideas of subject knowledge along with 21st century
skills in the assessments. At the same time, there is a need to determine the extent to
which subject knowledge should be distinguished from 21st century skills in assessment
results.
• The need to incorporate qualities of high-level professional judgments about student
performances into ICT assessments, as well as support the efficiency and reliability of
these judgments.
• The need to develop new theories and models of scoring the students’ processes and
strategies during assessments, as well as outcomes.
• The need to establish the predictive ability of these judgments on the quality of subsequent
performance in advanced study and work.
• The need to distinguish individual contributions and skills on tasks that are done
collaboratively.
For more, see: http://www.worldclassarena.net/doc/file17.pdf