Comparison of the Performance of Ontario Students on the OSSLT/TPCL and the PISA 2009 Reading Assessment
Nizam Radwan, Ph.D., and Yunmei Xu, Ph.D., for the Education Quality and Accountability Office
July 2012
About the Education Quality and Accountability Office
The Education Quality and Accountability Office (EQAO) is an independent provincial agency funded by the Government of Ontario. EQAO’s mandate is to conduct province-wide tests at key points in every student’s primary, junior and secondary education and report the results to educators, parents and the public.
EQAO acts as a catalyst for increasing the success of Ontario students by measuring their achievement in reading, writing and mathematics in relation to Ontario Curriculum expectations. The resulting data provide a gauge of quality and accountability in the Ontario education system.
The objective and reliable assessment results are evidence that adds to current knowledge about student learning and serves as an important tool for improvement at all levels: for individual students, schools, boards and the province.
About EQAO Research
EQAO undertakes research for two main purposes:
• to maintain best-of-class practices and to ensure that the agency remains at the forefront of large-scale assessment and
• to promote the use of EQAO data for improved student achievement through the investigation of means to inform policy directions and decisions made by educators, parents and the government.
EQAO research projects delve into the factors that influence student achievement and education quality, and examine the statistical and psychometric processes that result in high-quality assessment data.
Education Quality and Accountability Office, 2 Carlton Street, Suite 1200, Toronto ON M5B 2M9, 1-888-327-7377, www.eqao.com
© 2012 Queen’s Printer for Ontario
Introduction
With Ontario students’ continued participation in various provincial, national and
international assessments, there is an interest in comparing the performance of students on the
EQAO assessments with their performance on national and international assessments. Parents,
educators and policy makers want to know how well students who were successful on provincial
tests performed on national or international assessments. However, even though these
assessments measure the same content area (e.g., reading), comparisons are not straightforward
because results of different assessments are not reported on a common scale.
The purpose of this study is to explore the possibility of linking the Ontario Secondary
School Literacy Test/Test provincial de compétences linguistiques (OSSLT/TPCL) and the
Programme for International Student Assessment (PISA) 2009 reading assessment. In 2009,
PISA and the OSSLT/TPCL were administered between April and May. In the case of the
OSSLT/TPCL, all eligible students in the province were assessed, and among these students, a
probability sample of 15-year-olds took the PISA reading assessment. The fact that a large
number of students took both tests in such a short time frame provided a strong basis for linking
the scores on the two tests. This linking could provide valuable information for education policy
makers and stakeholders about the Ontario provincial standard relative to international
benchmarks. This study used two different methods to link the two tests’ score distributions: the
Fixed Item Parameter (FIP) linking procedure implemented through an Item Response Theory
(IRT) model and the equipercentile with pre-smoothing procedure (Kolen & Brennan, 2004).
Successful linking will not only give useful information about educational policy, but will also
add to the psychometric literature about scaling two tests that are constructed from two different
tables of specifications but within the same substantive area.
Background
OSSLT/TPCL
The OSSLT/TPCL is a provincial test of minimum-competency literacy skills, where
literacy is defined in terms of reading and writing expectations in The Ontario Curriculum up to
the end of Grade 9. There are two language forms, one for English-language students and one for
French-language students. Both forms are administered annually to Grade 10 students attending
public and private schools in Ontario. A successful outcome on the OSSLT/TPCL is one of the
provincial high school graduation requirements. Students who write the OSSLT/TPCL once and
are unsuccessful may retake the test in a subsequent year or enroll in and successfully complete
the Ontario Secondary School Literacy Course (OSSLC).
Student responses to the OSSLT/TPCL are analyzed using a modified one-parameter IRT
model with a fixed a- and c-parameter for multiple-choice items and the generalized partial
credit model for open-response items. The results from one year to the next are equated using the
Fixed Common Item Parameter approach.
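The multiple-choice part of this model can be sketched as a three-parameter logistic form in which only the difficulty b varies by item. The sketch below is illustrative, not EQAO's implementation: the fixed values a = 0.588 and c = 0.20 are the ones this report gives for the FIP calibration, while the scaling constant D = 1.7 and the function name are assumptions.

```python
import math

D = 1.7  # assumed logistic scaling constant; the report does not state it


def p_correct(theta, b, a=0.588, c=0.20):
    """Probability of a correct multiple-choice response with a and c held fixed.

    theta: student proficiency; b: item difficulty (the only free item parameter).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# A student whose proficiency equals the item difficulty sits halfway
# between the guessing floor c and 1.
print(p_correct(0.0, 0.0))
```

Because a and c are the same for every item, items differ only in location, which is what makes the model "one-parameter" despite its three-parameter form.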
PISA
The Organization for Economic Co-operation and Development (OECD) initiated the
PISA assessments in response to a strong international interest in having a comparative
international assessment to measure how well students at age 15 are prepared academically to
meet the challenges of the future. This age group was selected because students are approaching
the end of compulsory education. The assessment is administered every three years and assesses
three subjects: reading, mathematics and science. In every administration, one subject area is
emphasized: two-thirds of the testing time is devoted to that subject and the remaining third is
devoted to the other two subjects. The 2000 administration emphasized reading literacy, 2003
emphasized mathematics, and 2006 emphasized science. In 2009, PISA once again emphasized
reading literacy, but the assessment was expanded to include reading and understanding
electronic texts in order to reflect the importance of computer technology in education. Ontario
did not administer this component of the assessment. Between 4 500 and 10 000 students in each
of more than 60 OECD member and partner countries participated.
PISA is constructed using a common, internationally-agreed-upon framework (OECD,
2009). PISA also employs a matrix sampling approach to achieve larger content coverage
without requiring students to write very long tests. All test items are divided into a set of item
blocks. These blocks are carefully arranged into booklets, each having an equal number of items
(and requiring equal testing time), with balanced content coverage and item format. Each student
is then randomly assigned one of the booklets. As a result, individual students answer a
manageable number of items and the curriculum is broadly covered at the aggregate level (Childs
& Jaciw, 2003).
Like the OSSLT/TPCL, PISA uses a one-parameter IRT model, which produces
comparable results for all participating countries. However, PISA uses the plausible-value
method (Mislevy, 1991; Mislevy, Johnson & Muraki, 1992) to calculate student proficiency
estimates. Plausible values are random draws from the posterior distribution of the latent
proficiency variable and are estimated using item and background information. This approach
allows improved estimates at the group level (Mislevy, 1991; Mislevy, Johnson & Muraki,
1992). Group-level results that are reported include the mean, standard deviation, and various
percentiles at the aggregated level. PISA does not report individual students’ results.
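The idea of a plausible value as a random draw from a posterior distribution can be illustrated with a deliberately simplified sketch. It assumes a Rasch likelihood, a standard normal prior and a grid approximation, and it ignores the background variables that PISA's conditioning model uses; the function name and the item difficulties are hypothetical.

```python
import numpy as np


def draw_plausible_values(responses, difficulties, n_draws=5, seed=0):
    """Draw values from a grid approximation of the posterior of theta.

    Assumptions (not from the report): Rasch likelihood, N(0, 1) prior,
    no background variables.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(-4.0, 4.0, 161)
    x = np.asarray(responses, dtype=float)
    b = np.asarray(difficulties, dtype=float)
    # P(correct) for every grid point (rows) and item (columns)
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))
    log_lik = (x * np.log(p) + (1.0 - x) * np.log(1.0 - p)).sum(axis=1)
    log_post = log_lik - 0.5 * grid**2          # add the log of the normal prior
    post = np.exp(log_post - log_post.max())    # subtract the max for stability
    post /= post.sum()
    return rng.choice(grid, size=n_draws, p=post)


# Five draws for one student who answered three of five hypothetical items correctly.
pvs = draw_plausible_values([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0])
print(pvs)
```

The spread of the five draws reflects how uncertain a single student's proficiency is when only a few items are answered, which is why the values are meant for group-level summaries.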
Both the OSSLT/TPCL and PISA meet high standards with respect to test design, test
content and psychometric quality. They also test a similar grade/age of students and content area.
However, the two tests differ in three ways. First, the two assessments were constructed using
different test frameworks. Second, the intended uses of the two assessments are different. PISA
is a low-stakes test for students: it has no direct consequences for them because results are not
reported to students. The OSSLT/TPCL is a high-stakes test, because results are reported to
students and the results affect their graduation status. Third, the two assessments differ in their
design and the reporting of results. PISA uses a matrix sampling design while OSSLT/TPCL has
a single test form, and the results for the two tests are reported on different scales. These
differences between PISA and the OSSLT/TPCL may pose significant challenges to linking
them.
Linking approaches
Test linking is the process by which the results from one test are used to predict the
results of another test (Linn, McLaughlin & Thissen, 2009). Linn (1993) and Mislevy (1992)
identified four types of linking; these types (ordered from the strongest to the weakest) are:
equating, calibration, projection and moderation. Equating is the strongest form of linking.
Equating is possible if the tests to be linked are equivalent in content, format, purpose,
administration, item difficulty and populations (Linn, 1993; Mislevy, 1992). In calibration,
tests that are constructed for different purposes and from different content specifications, but
cover the same substantive area (e.g., reading), are linked by placing the scores from the two
tests on the same scale. In projection, scores from one test are used to predict scores on another
test that may cover different content. Moderation is the weakest form of linking. It is used on
tests developed from different blueprints and administered to nonequivalent populations. Clearly,
the type of linking used depends upon the degree of comparability of the tests’ content and
psychometric properties.
Earlier research has empirically demonstrated procedures for linking assessments. In the
United States, many studies have linked scores from state-level assessments to those from
national and international assessments. Linear regression analyses or equipercentile procedures
were commonly used in these studies. Pashley and Phillips (1993) investigated linking between
the International Assessment of Educational Progress (IAEP) and the National Assessment of
Educational Progress (NAEP), which were built to different content specifications. They worked
with sample data from 1609 students who took both assessments. They determined a linear
relationship between IAEP and NAEP proficiency estimates and used it to estimate the
percentages of students from the IAEP who could perform at or above the three performance
levels established for NAEP. Pashley and Phillips concluded that it is possible to establish an
accurate statistical link between the IAEP and NAEP, but they warned that the linking results
should be interpreted with caution because it is difficult to measure the effects of unexplored
sources of non-statistical error on score results, such as different motivation levels of students,
which could affect test scores. Linn and Kiplinger (1994) used equipercentile procedures to link
data from state tests and NAEP. They found that the linking function could estimate average
state performance on NAEP, but was not accurate for scores at the top or bottom of the scale.
Also, the linking function was different for male and female subgroups, which indicated that the
linking function is not invariant across different subgroups.
Waltman (1997) used equipercentile procedures to link the Iowa Tests of Basic Skills
(ITBS) and NAEP. The two assessments were also linked using a social moderation approach in
which the achievement level descriptions used for NAEP were used by judges to set performance
standards (basic, proficient and advanced) on the ITBS. The results of the study showed that for
students who took both assessments, the corresponding achievement regions on the NAEP and
ITBS scales produced low to moderate percentages of agreement in student classification.
Agreement was particularly low for students at the advanced level; two-thirds or more were
classified differently.
In Canada, Cartwright (2003) investigated four different linking procedures
(equipercentile with the 4-parameter beta distribution, Gaussian kernel smoothing, and finite and
variable Gaussian mixture models) to link the reading component of the Foundation Skills
Assessment (FSA) administered to Grade 10 students in British Columbia with the PISA reading
literacy assessment, both administered in 2000.¹ Cartwright examined the bias and variability of
the linked scores and the accuracy of the estimated standard errors of linking. He concluded that
the equipercentile method was the most appropriate method for linking the two tests. The results
of this study indicated that it is feasible to provide valid linking results between a regional test
(FSA) and an international test (PISA).²
Purpose
The purpose of this study was to investigate whether linking the OSSLT/TPCL with the 2009
PISA reading assessment is feasible. Two linking procedures were explored: the Fixed Item
Parameter (FIP) procedure, which is similar to the equating procedure currently used by EQAO
to equate its operational assessments from year to year, and the equipercentile procedure. The
study addressed the following five research questions:
1. How do successful/unsuccessful students on the OSSLT/TPCL tend to be classified on PISA?
2. What is the strength of the relationship between student performance on PISA and on the
entire OSSLT/TPCL test, the OSSLT/TPCL reading sub-test and the OSSLT/TPCL writing
sub-test?
3. How comparable are the results obtained using the equipercentile procedure and the results
obtained using the FIP procedure?
4. How accurate is the linkage between the two tests? Is the linking function similar for both
male and female students?
5. How well does the performance of students on the OSSLT/TPCL predict their performance on
PISA?
¹ The FSA is currently an annual provincial assessment program at Grades 4 and 7; at the time of the study, it also included assessments at Grade 10. It measures students’ academic skills in reading, writing and numeracy.
² A second, more recent study (2008) was completed in British Columbia, in which student performance on the 2008 Grade 4 FSA reading test was linked to student performance on the 2006 Progress in International Reading Literacy Study (PIRLS) test, but very little information is available on this study. Linking procedures similar to those used by Cartwright (2003) were employed and the linking was considered to be successful.
Method
Data
A common student identification number across the two assessments was used to create a
combined data set for this study, which contained students’ responses to the 2009 PISA reading
items and the 2009 OSSLT/TPCL items. The combined data set has 3726 Ontario students, with
2450 students from English-language schools and 1276 students from French-language schools.
The data set also contained outcome variables; various background variables such as gender,
language, English language learner status and special education status; and a variable indicating
which PISA booklet the student used.
Test items
For PISA, each student responded to about 28 items, which, due to PISA’s use of matrix
sampling, is a portion of the 101 test items—47 multiple-choice and 54 open-response items. For
the OSSLT/TPCL, students responded to all of the 47 test items—39 multiple-choice and 8
open-response items (Table 1).
Table 1 Number of Test Items in PISA and the OSSLT/TPCL

Assessment                                 Multiple-choice items        Open-response items
PISA reading 2009 (across all booklets)    47                           54
OSSLT/TPCL (reading and writing)           31 (reading), 8 (writing)    4 (reading), 2 (short writing)*, 2 (long writing)*

*Each writing prompt was scored for topic development and conventions, which resulted in 8 scores.
Outcome variables
For the OSSLT/TPCL, the outcome variable for each student is a score reported on a
scale from 200 to 400. A score of 300 is the minimum required for a successful result. For PISA,
the outcome variable for each student is indicated by five plausible values reported on a scale
from 200 to 800, with a mean of 500 and a standard deviation of 100. PISA calculates five
plausible values for each student because, as mentioned previously, PISA uses a matrix sampling
design. Since proficiency is measured with a subset of the total item pool, each individual
proficiency estimate has a substantial amount of measurement error. Using multiple values
accounts for the uncertainty associated with a student’s estimated score because multiple values
represent the likely distribution of a student’s proficiency (OECD, 2005).
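In practice, a statistic is computed once per plausible value and the five results are then combined. The sketch below uses the usual multiple-imputation combining rule, which is assumed here rather than stated in this report; the function name and the sampling variances are hypothetical, while the five means are the English-language values from Table 3.

```python
import numpy as np


def combine_plausible_values(estimates, sampling_variances):
    """Combine one statistic computed separately on each plausible value."""
    est = np.asarray(estimates, dtype=float)
    m = est.size
    point = est.mean()                            # final estimate: average over plausible values
    within = float(np.mean(sampling_variances))   # average sampling variance
    between = est.var(ddof=1)                     # variance between plausible values
    total_variance = within + (1.0 + 1.0 / m) * between
    return point, total_variance


# English-language PISA means for PV1 to PV5 as reported in Table 3;
# the sampling variances (4.0) are hypothetical placeholders.
point, total_variance = combine_plausible_values([535, 536, 535, 535, 536], [4.0] * 5)
print(point, total_variance)
```

The between-value term is what carries the measurement uncertainty that a single proficiency estimate would hide.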
Analysis
Content alignment
To strengthen the interpretability of the linking study results, content experts at EQAO
reviewed the OSSLT/TPCL 2009 and the PISA 2009 reading assessment to see how similar the
content of the two assessments was. If they were highly similar, then the scores from the linking
process could be treated as equated scale scores. If the two tests were only moderately similar,
then the scores from the linking process could be treated as comparable scale scores (American
Educational Research Association, American Psychological Association, & National Council on
Measurement in Education, 1999).
Descriptive statistics
The following descriptive statistics were generated:
1. the mean and standard deviation of the PISA reading and the OSSLT/TPCL scale scores;
2. the Pearson product moment correlation coefficients between the OSSLT/TPCL total test,
OSSLT/TPCL reading and OSSLT/TPCL writing and PISA reading; and
3. the distribution of successful and unsuccessful students on the OSSLT/TPCL across
PISA proficiency levels. For the OSSLT/TPCL, students with scale scores of 300 and
above were classified as successful and those below 300 were classified as unsuccessful.
For PISA, students were classified into five proficiency levels based on the cut scores
published for the 2009 PISA reading assessment (OECD, 2010): Below Level 2, Level 2,
Level 3, Level 4 and Level 5 and Above.
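The third statistic amounts to a cross-tabulation with percentages computed within each OSSLT/TPCL outcome. A minimal sketch follows; the cut points are read from the level ranges this report shows for the PISA reading scale (407-479, 480-552, 553-625, above 625), and the function name and sample scores are hypothetical.

```python
import numpy as np

# Lower bounds of Levels 2, 3, 4 and 5+ on the PISA reading scale.
PISA_CUTS = [407, 480, 553, 626]
LEVELS = ["Below Level 2", "Level 2", "Level 3", "Level 4", "Level 5 and Above"]


def level_distribution(osslt, pisa):
    """Percentage of students at each PISA level within each OSSLT/TPCL outcome.

    Assumes both outcome groups contain at least one student.
    """
    osslt = np.asarray(osslt, dtype=float)
    levels = np.digitize(pisa, PISA_CUTS)  # 0 = Below Level 2, ..., 4 = Level 5 and Above
    table = {}
    for label, mask in (("Successful", osslt >= 300), ("Unsuccessful", osslt < 300)):
        counts = np.bincount(levels[mask], minlength=len(LEVELS))
        table[label] = dict(zip(LEVELS, 100.0 * counts / counts.sum()))
    return table


# Four hypothetical students: (OSSLT/TPCL score, one PISA plausible value).
table = level_distribution([310, 320, 290, 280], [500, 400, 410, 390])
print(table)
```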
Discriminant function analysis
Discriminant function analysis was conducted on each of the five PISA plausible values
to examine how well successful and unsuccessful students on the OSSLT/TPCL tend to be
classified by PISA scores. A cross tabulation of the original OSSLT/TPCL outcome and the
outcome based on the discriminant analysis was determined. The percentage of misclassified
students and the total percentage of error were calculated.
Linking procedures
The equipercentile procedure and the fixed item parameter (FIP) procedure were used to
link the OSSLT/TPCL and PISA assessments. OECD (2005) suggests conducting an analysis
using all five plausible values to increase the accuracy of results; therefore, the following linking
analyses used each of the five plausible values for both the English- and French-language groups.
Equipercentile procedure
When the equipercentile procedure is used, the distributions to be linked are often
irregular, with sharp “mountains” and “valleys.” The reason for the irregular shapes is that while
the underlying construct being measured is continuous, the scores reflecting possession of the
construct are discrete. Also, not all score points exist, leading to gaps in the distributions.
Therefore, to improve the accuracy of equipercentile linking and to reduce the linking error, a
pre-smoothing method is often used. Among the several pre-smoothing methods, Cope and
Kolen (1990) and Hanson (1990) suggested that the 4-parameter beta binomial model and the
log-linear model could be used to pre-smooth the score distributions to be linked. As mentioned
previously, Cartwright (2003) found that the 4-parameter beta binomial model provided the best
fit among the four different pre-smoothing methods he investigated. The following steps were
followed to link the OSSLT/TPCL and PISA for each of the five plausible values and for each of
the two language groups:
1. The 4-parameter beta model was used to pre-smooth both the PISA and OSSLT/TPCL
distributions.
2. Equipercentile linking was conducted to link the OSSLT/TPCL and PISA.
3. The OSSLT/TPCL cut scores were converted to PISA scale scores and the performance
standards for PISA and the OSSLT/TPCL were compared by identifying the scale score
on the PISA scale that corresponded to the cut point for a successful outcome on the
OSSLT/TPCL.
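Steps 2 and 3 can be sketched as follows. This is a simplified illustration, not the study's procedure: it skips the pre-smoothing in Step 1 and works directly on raw score distributions, and the function names and synthetic scores are assumptions.

```python
import numpy as np


def percentile_rank(scores, x):
    """Percentile rank of x (0-100): percent below x plus half the percent exactly at x."""
    scores = np.asarray(scores, dtype=float)
    return 100.0 * (np.mean(scores < x) + 0.5 * np.mean(scores == x))


def equipercentile_equivalent(from_scores, to_scores, x):
    """Score on the 'to' scale that has the same percentile rank as x on the 'from' scale."""
    return float(np.percentile(to_scores, percentile_rank(from_scores, x)))


# Synthetic, illustrative scores only: normal draws using the English-language
# means and standard deviations from Table 3 (normality is an assumption).
rng = np.random.default_rng(0)
osslt = rng.normal(330, 25, 5000)
pisa = rng.normal(535, 85, 5000)

# PISA-scale equivalent of the OSSLT/TPCL cut score of 300.
print(equipercentile_equivalent(osslt, pisa, 300))
```

Only the two marginal distributions enter the calculation, so the result says where the cut score falls in the group, not how well one test predicts the other for an individual student.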
FIP Procedure
The FIP linking procedure involves fixing the parameter estimates for the items on one
test and then calibrating the second test with the first test. This will place the second test onto the
scale of the first test. The FIP linking procedure was carried out to place the OSSLT/TPCL and
PISA on the same scale. The following steps were conducted:
1. An IRT calibration was conducted on the combined data set with PISA item parameters
fixed.³ This step produced a set of OSSLT/TPCL item parameter estimates (including
both reading and writing) on the PISA scale. The IRT model used in this step was the
modified one-parameter model (the a-parameter was fixed to 0.588 and the c-parameter
was fixed to 0.20) used by EQAO to calibrate the OSSLT/TPCL. The generalized partial
credit model was used for the polytomous items.
2. The rescaled OSSLT/TPCL item parameter estimates from Step 1 were used to score the
OSSLT/TPCL data. This step produced a rescaled OSSLT/TPCL θ-value on the PISA
scale for each student.
3. The θ-distributions and the IRT test information functions (TIFs) of the rescaled
OSSLT/TPCL and the PISA tests were compared. Similarity/dissimilarity of the θ-
distributions and TIFs provided information on the accuracy of the linking function.
4. The PISA scale scores equivalent to the OSSLT/TPCL cut scores were identified by
identifying scores on the rescaled OSSLT/TPCL θ-values that had the same percentile
ranks as the cut score on the original OSSLT/TPCL scale. Then, each student was
classified into OSSLT/TPCL categories by applying the new cut score to the PISA scale
score. The agreements between the percentages of students assigned to proficiency
categories were determined.
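The test information functions compared in Step 3 can be sketched for the multiple-choice items using the standard three-parameter logistic information formula with a and c fixed at the values given in Step 1. The scaling constant D = 1.7, the function names and the item difficulties are assumptions, and open-response items scored with the generalized partial credit model are not covered.

```python
import numpy as np

D = 1.7  # assumed logistic scaling constant; the report does not state it


def item_information(theta, b, a=0.588, c=0.20):
    """Fisher information of one multiple-choice item at proficiency theta."""
    theta = np.asarray(theta, dtype=float)
    p = c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def total_information(theta, difficulties):
    """Test information: the sum of the item information functions."""
    return sum(item_information(theta, b) for b in difficulties)


# Hypothetical difficulties for a three-item test, evaluated on a theta grid.
thetas = np.linspace(-3.0, 3.0, 7)
print(total_information(thetas, [-1.0, 0.0, 1.0]))
```

Plotting such curves for the two tests on the common scale shows where along the proficiency range each test measures most precisely.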
³ PISA publishes item parameters in its technical reports. Because the PISA 2009 technical report had not yet been released, OECD shared the PISA 2009 item parameters with EQAO by email before they became publicly available.
Regression
For each of the plausible values, a simple regression was conducted to predict the PISA
scale scores from the OSSLT/TPCL scale scores. The variance in the dependent variable
explained by the independent variable was computed. The predicted PISA scale scores were
transformed to PISA levels using the original PISA cut scores. The predicted PISA levels were
then cross-tabulated with the original PISA scores and the percentage of agreement was
calculated.
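The regression analysis described above can be sketched as follows for one plausible value. The data are synthetic and the function name is hypothetical; the cut points are read from the level ranges this report shows for the PISA reading scale.

```python
import numpy as np

# Lower bounds of Levels 2, 3, 4 and 5+ (ranges 407-479, 480-552, 553-625, above 625).
PISA_CUTS = [407, 480, 553, 626]


def regress_and_classify(osslt, pisa):
    """Fit pisa = intercept + slope * osslt; return the fit, R^2 and % level agreement."""
    osslt = np.asarray(osslt, dtype=float)
    pisa = np.asarray(pisa, dtype=float)
    slope, intercept = np.polyfit(osslt, pisa, 1)
    predicted = intercept + slope * osslt
    # In a simple regression, R^2 equals the squared Pearson correlation.
    r_squared = np.corrcoef(osslt, pisa)[0, 1] ** 2
    same_level = np.digitize(predicted, PISA_CUTS) == np.digitize(pisa, PISA_CUTS)
    return slope, intercept, r_squared, 100.0 * np.mean(same_level)


# Synthetic, illustrative data only (not the study's data).
rng = np.random.default_rng(0)
osslt = rng.normal(330, 25, 2000)
noise = rng.normal(0, 85 * np.sqrt(1 - 0.7**2), 2000)
pisa = 535 + 0.7 * 85 * (osslt - 330) / 25 + noise
slope, intercept, r2, agreement = regress_and_classify(osslt, pisa)
print(f"R^2 = {r2:.2f}, level agreement = {agreement:.1f}%")
```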
Results
Content Alignment
Content experts at EQAO examined the content frameworks and reading items for PISA
and the OSSLT/TPCL. Since not all of the PISA items were published, the alignment study was
based on the sample of PISA items available to the public. The results of the content alignment
review are summarized in Table 2. While both the PISA reading assessment and OSSLT/TPCL
are literacy tests, they are not identical. However, the two assessments are strongly similar in
purpose, types of items, text forms used, types of tasks and cognitive skills assessed.
Table 2 Content Comparison between PISA Reading Assessment and the OSSLT/TPCL

Scope
  PISA reading assessment: Reading
  OSSLT/TPCL: Reading and writing

Method
  PISA reading assessment: 120-min paper-and-pencil test [Canada chose not to take the additional 40-min assessment of understanding electronic texts]; 47 MC items (46.5%), 54 OR items (53.5%)
  OSSLT/TPCL: 150-min paper-and-pencil test [approximately 75 min devoted to reading components; 75 min to writing components]; 31 MC reading items (71.6%), 4 OR reading items (28.3%)

Approximate tasks per text breakdown
  PISA reading assessment: 70% devoted to continuous texts; 30% devoted to non-continuous texts
  OSSLT/TPCL: 92% devoted to continuous texts; 8% devoted to graphic text

Reading processes (aspects/skills) assessed and approximate distribution of tasks by aspect/skill
  PISA reading assessment (task distribution per “aspect” assessed):
    25% on explicitly stated information and ideas (access and retrieve)
    50% on implicitly stated information and ideas (integrate & interpret)
    25% on making connections between information and ideas in reading selections and personal knowledge and experience (reflect & evaluate)
  OSSLT/TPCL (task distribution per “skills” assessed):
    20% on understanding explicitly stated information and ideas (access and retrieve)
    60% on understanding implicitly stated information and ideas (interpret)
    20% on making connections between information and ideas in reading selections and personal knowledge and experience (interpret & integrate)
Descriptive Statistics
The mean and standard deviation of the PISA reading and the OSSLT/TPCL scale scores
for the Ontario students who participated in both tests are summarized in Table 3 by language.
As noted previously, PISA and the OSSLT/TPCL were reported on different scales: PISA was
reported on a 200 to 800 scale with a mean of 500 and a standard deviation of 100 whereas
OSSLT/TPCL was reported on a 200 to 400 scale. The means of the OSSLT/TPCL scores are
similar for the two language groups, but the mean of the PISA scores for the English-language
students is 58 points higher than that for the French-language students.
Table 3 Mean and Standard Deviation of the Five PISA Plausible Values and OSSLT/TPCL Scores for Students Who Completed Both Tests

               English                  French
Scores         N       Mean    S.D.     N       Mean    S.D.
OSSLT/TPCL     2450    330     25       1276    327     27
PV1*                   535     85               477     86
PV2                    536     85               476     86
PV3                    535     84               476     87
PV4                    535     85               477     86
PV5                    536     86               477     85

*PV = plausible value
The Pearson product-moment correlation coefficients between the PISA reading scores
and the OSSLT/TPCL scores ranged from 0.68 to 0.70 for the total test. The coefficients for the
OSSLT/TPCL reading scores (0.63 to 0.66) were slightly higher than those for the OSSLT/TPCL
writing scores (0.60 to 0.63) (Table 4).
Table 4 Correlation Coefficients between the OSSLT/TPCL and PISA Reading

                                    OSSLT/TPCL
Language    PISA plausible value    Total test    Reading    Writing
English     1                       0.70          0.66       0.61
            2                       0.69          0.65       0.60
            3                       0.69          0.64       0.61
            4                       0.70          0.65       0.61
            5                       0.70          0.66       0.60
French      1                       0.69          0.65       0.61
            2                       0.69          0.64       0.63
            3                       0.70          0.65       0.63
            4                       0.70          0.65       0.63
            5                       0.68          0.63       0.61
Figure 1 shows the percentage of Ontario students in each of the PISA performance
categories for those students who wrote both tests. A higher percentage of English-language
students than French-language students achieved Level 2 and above.
Figure 1. Percentage of Ontario French- and English-language students in the PISA 2009
proficiency levels
[Bar chart by language; axis: % of students at each level (PISA 2009), Below Level 2 to Level 6, 0% to 100%.
French (n=1,276): Below Level 2 21%, Level 2 30%, Level 3 31%, Level 4 15%, Level 5 and above 3%.
English (n=2,450): Below Level 2 6%, Level 2 18%, Level 3 31%, Level 4 30%, Level 5 and above 14%.]
Most of the successful students on the OSSLT/TPCL in both languages were classified at
Level 2 and above in the PISA proficiency levels (Figure 2).
Figure 2. Distribution of students across the PISA proficiency levels for students who were
successful on the OSSLT/TPCL
[Bar chart; x-axis: PISA levels; y-axis: percentage, 0 to 40.
English (successful, n=2,214): Below Level 2 3%, Level 2 15%, Level 3 33%, Level 4 34%, Level 5 and above 15%.
French (successful, n=1,099): Below Level 2 13%, Level 2 31%, Level 3 36%, Level 4 18%, Level 5 and above 3%.]
Most of the students who were successful on the OSSLT/TPCL were classified at Level 2 or
above on PISA (97% for English-language students and 87% for French-language students)
(Table 5). Of the unsuccessful students, 65% of the English-language students and 30% of the
French-language students were classified at Level 2 or above on PISA.

Table 5 Distribution of Students across PISA Levels within OSSLT/TPCL Outcome Categories

                     Percentage of students within OSSLT/TPCL outcome
                     English language               French language
PISA levels          Successful    Unsuccessful     Successful    Unsuccessful
Below Level 2        3             35               13            70
Level 2              15            48               31            26
Level 3 and above    82            17               56            4
Total                100           100              100           100
Discriminant Function Analysis
A discriminant function analysis was conducted to find out how well PISA scores
classify successful and unsuccessful students on the OSSLT/TPCL. Table 6 shows the percentage of
students who were misclassified. The average error percentage is the sum of the two
misclassification percentages, each weighted by its prior probability. In this case, since there
are only two outcomes (successful and unsuccessful), the prior probability was set at 50%. Since the
average error percentage is about 20%, about 80% of the OSSLT/TPCL students
were classified correctly by PISA scores.
Table 6 Percentage of Students Misclassified on the OSSLT/TPCL and Average Error of Misclassification

                                Percentage of misclassified students
Language    Plausible value     Misclassified as successful    Misclassified as unsuccessful    Average error
English     1                   17.4                           21.6                             19.5
            2                   16.5                           21.8                             19.2
            3                   15.7                           21.9                             18.8
            4                   14.4                           21.5                             18.0
            5                   17.0                           22.2                             19.6
French      1                   22.0                           22.7                             22.4
            2                   19.2                           22.6                             20.9
            3                   17.5                           22.4                             20.0
            4                   18.1                           21.7                             19.9
            5                   19.2                           23.0                             21.1
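With equal priors, the average error reduces to the simple mean of the two misclassification percentages, which can be checked against the English-language rows of Table 6. The function name is hypothetical, and treating each percentage as a rate within its own outcome group is an assumption.

```python
# (misclassified as successful, misclassified as unsuccessful, reported average error)
# for the English-language rows of Table 6, plausible values 1 to 5.
TABLE6_ENGLISH = [
    (17.4, 21.6, 19.5),
    (16.5, 21.8, 19.2),
    (15.7, 21.9, 18.8),
    (14.4, 21.5, 18.0),
    (17.0, 22.2, 19.6),
]


def average_error(as_successful, as_unsuccessful, prior=0.5):
    """Prior-weighted sum of the two misclassification percentages.

    prior is the assumed prior probability of the successful group; it weights
    the percentage of students misclassified as unsuccessful.
    """
    return (1.0 - prior) * as_successful + prior * as_unsuccessful


for pv, (succ, unsucc, reported) in enumerate(TABLE6_ENGLISH, start=1):
    print(pv, round(average_error(succ, unsucc), 2), reported)
```

The computed values agree with the reported averages to within rounding to one decimal place.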
A discriminant function analysis was also conducted to classify students into the PISA
proficiency levels based on their OSSLT/TPCL scores. Approximately 45% of students were
correctly classified into their observed PISA level based on their OSSLT/TPCL score, with the
percentages being considerably higher for Levels 1 and 6. More than three-quarters of students
were correctly classified as being at or above Level 2.
Linking Results
Equipercentile procedure
After linking the OSSLT/TPCL to PISA using the equipercentile procedure, the
OSSLT/TPCL cut score was converted to a PISA scale score. The cut-score equivalents are
reported in Table 7 for each of the five plausible values, by language and gender.
Within each language group and gender, the cut scores were consistent across the five
plausible values. For the English-language test, the OSSLT cut score of 300 is equivalent to 433
on PISA, whereas for the French-language test, the TPCL cut score of 300 is equivalent to 388
on PISA, which is 45 points below that for the English-language test.
For the English-language test, the OSSLT cut score of 300 is equivalent to 422 and 446
on the PISA scale for males and females, respectively. For the French-language test, the TPCL cut
score of 300 is equivalent to 387 and 389 on the PISA scale for males and females, respectively. The
average cut score for French-language males is 35 points below that for English-language males
and the average cut score for French-language females is 57 points below that for
English-language females.
Table 7 PISA Reading Scale Score Equivalent to OSSLT/TPCL Cut Score

                        Plausible value
Language    Students    1      2      3      4      5
English     All         433    433    434    433    433
            Male        421    423    423    422    422
            Female      446    446    447    446    445
French      All         388    388    388    388    390
            Male        388    388    386    386    390
            Female      387    388    391    390    389
Comparing the OSSLT/TPCL and PISA standards using the PISA reading scale, the
OSSLT cut score is slightly above the PISA cut score for Level 2, and the TPCL cut score is
slightly below the PISA cut score for Level 2 (Figure 3). The results for English- and French-language
students suggest that students classified as meeting the provincial standard on the
OSSLT/TPCL are roughly equivalent to students at PISA Level 2 and above. This means that students
who met the Ontario provincial literacy standards would likely have demonstrated, according to
OECD (2010), “the reading literacy competencies that will enable them to participate effectively
and productively in life.”
Figure 3. Comparison of OSSLT/TPCL and PISA standards for both languages: Equipercentile
equating
[Figure 3 shows the PISA reading scale (200 to 700) alongside the PISA reading proficiency
levels: Below Level 2 (below 407), Level 2 (407-479), Level 3 (480-552), Level 4 (553-625) and
Level 5 and above (above 625). The TPCL cut score is marked at 388 and the OSSLT cut score at 433.]
Descriptions of the OSSLT/TPCL standard for success and the PISA standard for
Level 2 are quite similar. On the OSSLT/TPCL, the standard for success is defined as
students having the minimum “reading and writing skills required to understand reading
selections and communicate through a variety of written forms as expected in The
Ontario Curriculum across all subjects up to the end of Grade 9” (Education Quality and
Accountability Office, 2007 (December), p. 10). The PISA Level 2 standard states, “Level
2 is considered a baseline level of proficiency, at which students begin to demonstrate the
reading literacy competencies that will enable them to participate effectively and
productively in life” (OECD, 2010). This similarity is consistent with the results shown
in Figure 3.
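The comparison in Figure 3 amounts to locating a linked cut score within the PISA proficiency bands. The following is a minimal sketch, assuming the rounded band boundaries shown in Figure 3 (407, 480, 553 and "above 625") and whole-number scale scores; the names `LEVEL_LOWER_BOUNDS`, `LEVEL_LABELS` and `pisa_level` are illustrative and do not come from the study.

```python
from bisect import bisect_right

# Lower bounds of Level 2, Level 3, Level 4 and "Level 5 and above",
# taken from the rounded values in Figure 3 (assumption: integer scores,
# so "above 625" is treated as 626 and up).
LEVEL_LOWER_BOUNDS = [407, 480, 553, 626]
LEVEL_LABELS = [
    "Below Level 2",
    "Level 2",
    "Level 3",
    "Level 4",
    "Level 5 and above",
]


def pisa_level(score):
    """Return the PISA reading proficiency band containing `score`."""
    # bisect_right counts how many lower bounds are <= score,
    # which is exactly the index of the band label.
    return LEVEL_LABELS[bisect_right(LEVEL_LOWER_BOUNDS, score)]


# Equipercentile-linked cut scores reported for all students.
osslt_band = pisa_level(433)  # English-language OSSLT cut score
tpcl_band = pisa_level(388)   # French-language TPCL cut score
print(osslt_band, "|", tpcl_band)
```

Under these assumptions the OSSLT cut score sits 26 points above the lower boundary of Level 2 and the TPCL cut score sits 19 points below it.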
FIP procedure
The FIP procedure was conducted to link PISA reading and the OSSLT/TPCL using the
modified one-parameter IRT model for multiple-choice items and the generalized partial credit
model for the open response items. The results show that for English-language students, the
OSSLT cut score is equivalent to 427 on the PISA scale, which is above the lower boundary of
PISA reading proficiency Level 2. For French-language students, the TPCL cut score is
equivalent to 396 on the PISA scale, which is slightly below the lower boundary of PISA reading
proficiency Level 2. The cut score for the French-language TPCL is 31 points below the English-
language test cut score.
The cut score was also determined for males and females. For the English-language test,
the OSSLT cut score of 300 is equivalent to PISA scores 423 and 431 for males and females,
respectively. For the French-language test, the TPCL cut score of 300 is equivalent to PISA
scores 394 and 403 for males and females, respectively. The cut score for the French-language
TPCL for males is 29 points below the English-language test cut score. The cut score for the
French-language TPCL for females is 28 points below that for the English-language test. These
differences between male and female cut scores are due to the different formulas used for each
gender to transform the θ-values to PISA scale scores.
A comparison of the equipercentile and FIP procedures reveals differences in their results
(Table 8). For the equipercentile procedure, the PISA scale score equivalent to the OSSLT/TPCL
cut score for all French-language students is 45 points below that for all English-language
students; for the FIP procedure, this difference is only 31 points. The PISA scale score
equivalent to the OSSLT/TPCL cut score for French-language male and female students is 35
and 57 points, respectively, below that of the English-language students in the equipercentile
procedure, but these differences are only 29 and 28 points, respectively, in the FIP procedure.
Given the smaller differences between the cut scores for males and females in the FIP procedure,
the FIP procedure seems more effective than the equipercentile procedure for these two
assessments.
Table 8 PISA Reading Scale Score Equivalent to OSSLT/TPCL Cut Score across Language and Gender for the Equipercentile and FIP Linking
             English language          French language
Students     Equipercentile   FIP      Equipercentile   FIP
All          433              427      388              396
Males        422              423      387              394
Females      446              431      389              403
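The English-French differences quoted in the text can be reproduced directly from Table 8. The sketch below is only a worked check of that arithmetic; the dictionary layout and the function name `language_gap` are illustrative choices, not part of the study.

```python
# PISA scale scores equivalent to the OSSLT/TPCL cut score of 300 (Table 8).
CUT_SCORES = {
    "Equipercentile": {
        "English": {"All": 433, "Males": 422, "Females": 446},
        "French": {"All": 388, "Males": 387, "Females": 389},
    },
    "FIP": {
        "English": {"All": 427, "Males": 423, "Females": 431},
        "French": {"All": 396, "Males": 394, "Females": 403},
    },
}


def language_gap(method, group):
    """English-language cut score minus French-language cut score."""
    scores = CUT_SCORES[method]
    return scores["English"][group] - scores["French"][group]


for method in CUT_SCORES:
    gaps = {g: language_gap(method, g) for g in ("All", "Males", "Females")}
    print(method, gaps)
```

The gaps are 45, 35 and 57 points for the equipercentile procedure and 31, 29 and 28 points for the FIP procedure, matching the values given in the text.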
Regression
A simple regression was conducted to predict the PISA scale scores from the
OSSLT/TPCL scale score for each of the five plausible values in the sample. Values deemed to
be outliers were removed from the regression analysis. The agreement between the original
classification and the predicted classification in the sample was tabulated for each of the
plausible values. Tables 9 and 10 show the cross-tabulation of the observed and predicted PISA
proficiency levels for plausible value 1 for the English- and French-language samples,
respectively. The agreement between the observed and predicted levels was not high; the
observed and predicted PISA levels were the same for only about half of the students. Also, a
higher degree of agreement tended to be obtained for the middle levels, highlighting the
tendency for regression to underestimate the number of students at the extremes of the scale. The
correlation coefficients between the observed and the predicted PISA plausible values for the
English- and French-language samples ranged from 0.71 to 0.72 and 0.69 to 0.72, respectively.
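The tendency of regression to underestimate the number of students at the extremes follows from a property of least-squares prediction: the standard deviation of the predicted scores equals the standard deviation of the observed scores multiplied by the absolute correlation. The sketch below illustrates this with a small set of invented score pairs; the data and the helper names `fit_line` and `pearson_r` are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (len(x) * pstdev(x) * pstdev(y))


# Hypothetical OSSLT/TPCL scale scores and PISA plausible values.
osslt = [250, 280, 300, 310, 330, 350, 370, 400]
pisa = [380, 470, 400, 520, 450, 560, 500, 610]

slope, intercept = fit_line(osslt, pisa)
predicted = [intercept + slope * s for s in osslt]

r = pearson_r(osslt, pisa)
# Ratio of predicted spread to observed spread; equals |r| for simple regression.
shrinkage = pstdev(predicted) / pstdev(pisa)
print(round(r, 3), round(shrinkage, 3))
```

With correlations of about 0.7, as reported here, predicted scores would have roughly 70% of the spread of the observed scores, so fewer predictions fall in the lowest and highest proficiency levels.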
Table 9 Agreement between the Observed and Predicted PISA Levels for the English-Language Sample Using PISA Plausible Value 1
                              Predicted levels (PISA plausible value 1)
Observed levels               Below                                   Level 5
                              Level 2   Level 2   Level 3   Level 4   and above   Total
Below Level 2          N         28        91        50         2          0        171
                       %       16.4      53.2      29.2       1.2        0.0      100.0
Level 2                N          8       188       209        31          0        436
                       %        1.8      43.1      47.9       7.1        0.0      100.0
Level 3                N          3       115       465       171          8        762
                       %        0.4      15.1      61.0      22.4        1.0      100.0
Level 4                N          0        18       289       346         47        700
                       %        0.0       2.6      41.3      49.4        6.7      100.0
Level 5 and above      N          0         0        55       211         94        360
                       %        0.0       0.0      15.3      58.6       26.1      100.0
Total                  N         39       412      1068       761        149       2429
                       %        1.6      17.0      44.0      31.3        6.1      100.0
Table 10 Agreement between the Observed and Predicted PISA Levels for the French-Language Sample Using PISA Plausible Value 1
                              Predicted levels (PISA plausible value 1)
Observed levels               Below                                   Level 5
                              Level 2   Level 2   Level 3   Level 4   and above   Total
Below Level 2          N        119       121        20         1          0        261
                       %       45.6      46.4       7.7       0.4        0.0      100.0
Level 2                N         49       213       109        10          0        381
                       %       12.9      55.9      28.6       2.6        0.0      100.0
Level 3                N          9       117       221        32          2        381
                       %        2.4      30.7      58.0       8.4        0.5      100.0
Level 4                N          0        27       125        48          6        206
                       %        0.0      13.1      60.7      23.3        2.9      100.0
Level 5 and above      N          0         0        17        20          4         41
                       %        0.0       0.0      41.5      48.8        9.8      100.0
Total                  N        177       478       492       111         12       1270
                       %       13.9      37.6      38.7       8.7        0.9      100.0
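The agreement rates implied by Tables 9 and 10 can be computed from the cell counts. The sketch below does this for plausible value 1 only, using the counts as printed in the two tables; the function names `exact_agreement` and `baseline_agreement` are illustrative. Exact agreement is the share of students on the diagonal, and baseline agreement is the share whose observed and predicted levels fall on the same side of the Level 2 boundary.

```python
# Rows are observed levels, columns are predicted levels, both ordered:
# Below Level 2, Level 2, Level 3, Level 4, Level 5 and above.
ENGLISH = [
    [28, 91, 50, 2, 0],
    [8, 188, 209, 31, 0],
    [3, 115, 465, 171, 8],
    [0, 18, 289, 346, 47],
    [0, 0, 55, 211, 94],
]
FRENCH = [
    [119, 121, 20, 1, 0],
    [49, 213, 109, 10, 0],
    [9, 117, 221, 32, 2],
    [0, 27, 125, 48, 6],
    [0, 0, 17, 20, 4],
]


def exact_agreement(table):
    """Proportion of students whose observed and predicted levels match."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(len(table)))
    return diagonal / total


def baseline_agreement(table):
    """Proportion classified on the same side of the Level 2 boundary."""
    total = sum(sum(row) for row in table)
    both_below = table[0][0]
    both_at_or_above = sum(sum(row[1:]) for row in table[1:])
    return (both_below + both_at_or_above) / total


for name, table in (("English", ENGLISH), ("French", FRENCH)):
    print(name, round(exact_agreement(table), 3), round(baseline_agreement(table), 3))
```

For plausible value 1, exact agreement is about 46% for the English-language sample and about 48% for the French-language sample, while agreement on the Level 2 boundary is about 94% and 84%, respectively.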
Discussion
The results showed that most of the Ontario students who met the provincial standard on
the OSSLT/TPCL (successful students) were classified at PISA Level 2 or above when the two
tests were linked using the equipercentile and FIP procedures. This finding is consistent with the
similarity in definitions of literacy for the two assessments at these cut points. As indicated
earlier, PISA Level 2 is considered a baseline level of proficiency that is similar to the literacy
standard expected for a successful outcome on the OSSLT/TPCL. The correlation coefficients
between PISA scores and the total-test scores for OSSLT/TPCL were slightly higher than those
between PISA and the reading component of the OSSLT/TPCL, and the correlation coefficients
for the reading component were slightly higher than those for the writing component. This
finding supports the decision to conduct linking analyses between PISA and the total
OSSLT/TPCL.
Although the equipercentile and the FIP procedures have the same purpose (to determine
the relationship between the scores on two tests that measure the same construct and to identify
the scores on one test that are equivalent to the scores on the other test), each method achieves
this goal in a different way. The equipercentile procedure identifies scores on one test that have
the same percentile ranks as the scores on another test. The equated distribution of scores is
based on equal percentile ranks from the two tests. The FIP procedure places the parameter
estimates for items on the two tests on the same scale. The results for these two procedures
were similar but not identical. If the purpose is to predict students’ scores on PISA from their
OSSLT/TPCL scores, then the FIP procedure provides more accurate interpretations than the
equipercentile procedure because in the FIP, the OSSLT/TPCL and PISA scores are placed on
the same scale. However, if the purpose is to estimate the proportion of students in a school at
each of the PISA levels from their OSSLT/TPCL scores, the equipercentile method would be
appropriate.
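The equipercentile idea described above can be sketched in a few lines: find the percentile rank of a score on one test, then find the score on the other test with the same percentile rank. The example below is a simplified illustration using invented scores and no smoothing of the score distributions; the function names and the mid-percentile-rank convention are assumptions for this sketch, not the procedure implemented in the study.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, x):
    """Percent of scores below x plus half the percent of scores equal to x."""
    below = bisect_left(sorted_scores, x)
    equal = bisect_right(sorted_scores, x) - below
    return 100.0 * (below + 0.5 * equal) / len(sorted_scores)


def score_at_rank(sorted_scores, rank):
    """Score with the given percentile rank, by linear interpolation."""
    n = len(sorted_scores)
    # The i-th sorted score (0-based) has mid-percentile rank 100 * (i + 0.5) / n.
    position = min(max(rank / 100.0 * n - 0.5, 0.0), n - 1.0)
    lower = int(position)
    upper = min(lower + 1, n - 1)
    fraction = position - lower
    return sorted_scores[lower] + fraction * (sorted_scores[upper] - sorted_scores[lower])


def equipercentile_equivalent(x, scores_x, scores_y):
    """Score on test Y that has the same percentile rank as x has on test X."""
    rank = percentile_rank(sorted(scores_x), x)
    return score_at_rank(sorted(scores_y), rank)


# Hypothetical scores for the same five students on two tests.
test_x = [280, 290, 300, 310, 320]
test_y = [380, 410, 430, 450, 500]

print(equipercentile_equivalent(300, test_x, test_y))
print(equipercentile_equivalent(295, test_x, test_y))
```

In this toy data set a score of 300 on the first test has a percentile rank of 50 and maps to 430 on the second test. Because the mapping is driven entirely by the two score distributions, it preserves the proportion of students above any cut point.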
The equivalent cut scores for the OSSLT/TPCL on the PISA scale differed between the two
linking methods: for all English-language students, the FIP cut score was lower than the
equipercentile cut score (427 versus 433), whereas for all French-language students it was
higher (396 versus 388) (Table 8). The different cut
scores across the sample and for each of the genders provide evidence that the constructs
measured by the two tests are not identical. Literacy is complex and requires a set of skills,
knowledge of language conventions and language processes, in addition to reading.
The linking procedures applied in this study cannot be considered true equating for a
number of reasons. One property of equating is group invariance, which states that the equating
relationship remains the same for all groups of examinees used to conduct equating (Kolen &
Brennan, 2004). In this study, group invariance was not met because the cut scores for males and
females were different, largely because the formula used to transform the theta values to PISA
scale scores was not the same for both genders. Also, although the constructs measured by the
two tests are very similar, the test specifications indicate that the tests measured some different
aspects of literacy. Furthermore, the calibration models for the two assessments are not identical.
For example, a strict Rasch model is used in PISA whereas a modified one-parameter model is
used for the OSSLT/TPCL. Therefore, a perfect equating was not expected.
The discriminant function analysis showed that successful and unsuccessful students on
the OSSLT/TPCL can be classified correctly 80% of the time using the PISA reading scores. Only 45% of
students were correctly classified into their observed PISA level based on their OSSLT/TPCL
score, but more than three-quarters of students were classified correctly as being at or above
PISA Level 2. This level of accuracy is too low to enable a reliable prediction of PISA scores
from OSSLT/TPCL scores.
In conclusion, this study provides evidence that the provincial standard on the
OSSLT/TPCL is comparable to the PISA standard for minimum competency in literacy.
However, linking scores of the two assessments was somewhat challenging, partly because the
tests measure some different aspects of literacy and because of the different methods used to
arrive at scale scores. The two methods in this study did not produce identical results, which
highlights the challenges in linking provincial and international assessments. Since the linking
procedures used in this study do not represent true equating, care must be taken when
interpreting the results.
This linking study provides useful information that adds another dimension to the
interpretation of the results of both assessments, making both sets of results more meaningful.
References
American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education. (1999). Standards for Educational and
Psychological Testing. Washington, DC: American Educational Research Association.
Cartwright, F. (2003). Equipercentile methods of linking inter-regional and regional
assessments. Unpublished master’s thesis. University of Alberta, Alberta, Canada.
Cope, R. T., & Kolen, M. J. (1990). A study of methods for estimating distributions of test scores.
(American College Testing Research Report 90-5.) Iowa City, IA: American College
Testing.
Education Quality and Accountability Office. (2007). Framework, Ontario Secondary Literacy
Test. Toronto, ON: Author.
Hanson, B. A. (1990). An investigation of methods for improving estimation of test score
distributions. (American College Testing Research Report 90-4). Iowa City, IA:
American College Testing.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and
practices (2nd ed.). New York, NY: Springer-Verlag.
Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education,
6(1), 83–102.
Linn, R. L., & Kiplinger, V. L. (1994). Linking statewide tests to the National Assessment of
Educational Progress: Stability of results. Applied Measurement in Education, 8, 135–156.
Linn, R. L., McLaughlin, D., & Thissen, D. (2009). Utility and validity of NAEP linking efforts.
Washington, DC: American Institutes for Research, NAEP Validity Studies Panel.
Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex
samples. Psychometrika, 56, 177–196.
Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of
Educational Statistics, 17, 131–154.
Mislevy, R. J. (1992). Linking Educational Assessments: Concepts, Issues, Methods, and
Prospects. Princeton, NJ: Educational Testing Service.
OECD (2005). PISA 2003 technical report. Paris: Author.
OECD (2009). PISA 2009 Assessment framework: Key competencies in reading, mathematics and
science. Paris: Author.
OECD (2010). PISA 2009 Results: What Students Know and Can Do: Student Performance in
Reading, Mathematics and Science (Volume I).
doi: 10.1787/9789264091450-en
Pashley, P. J., & Phillips, G. W. (1993). Toward World-Class Standards: A Research Study
Linking International and National Assessments. Center for Educational Progress.
Princeton, NJ: Educational Testing Service.
Waltman, K. K. (1997). Using performance standards to link statewide achievement results to
NAEP. Journal of Educational Measurement, 34(2), 101–121.