
Comparison of the Performance of Ontario Students on the OSSLT/TPCL and the PISA 2009 Reading Assessment

Nizam Radwan, Ph.D., and Yunmei Xu, Ph.D., for the Education Quality and Accountability Office

July 2012

About the Education Quality and Accountability Office

The Education Quality and Accountability Office (EQAO) is an independent provincial agency funded by the Government of Ontario. EQAO’s mandate is to conduct province-wide tests at key points in every student’s primary, junior and secondary education and report the results to educators, parents and the public.

EQAO acts as a catalyst for increasing the success of Ontario students by measuring their achievement in reading, writing and mathematics in relation to Ontario Curriculum expectations. The resulting data provide a gauge of quality and accountability in the Ontario education system.

The objective and reliable assessment results are evidence that adds to current knowledge about student learning and serves as an important tool for improvement at all levels: for individual students, schools, boards and the province.

About EQAO Research

EQAO undertakes research for two main purposes:

• to maintain best-of-class practices and to ensure that the agency remains at the forefront of large-scale assessment and

• to promote the use of EQAO data for improved student achievement through the investigation of means to inform policy directions and decisions made by educators, parents and the government.

EQAO research projects delve into the factors that influence student achievement and education quality, and examine the statistical and psychometric processes that result in high-quality assessment data.

Education Quality and Accountability Office, 2 Carlton Street, Suite 1200, Toronto ON M5B 2M9, 1-888-327-7377, www.eqao.com

© 2012 Queen’s Printer for Ontario

Introduction

With Ontario students’ continued participation in various provincial, national and

international assessments, there is an interest in comparing the performance of students on the

EQAO assessments with their performance on national and international assessments. Parents,

educators and policy makers want to know how well students who were successful on provincial

tests performed on national or international assessments. However, even though these

assessments measure the same content area (e.g., reading), comparisons are not straightforward

because results of different assessments are not reported on a common scale.

The purpose of this study is to explore the possibility of linking the Ontario Secondary

School Literacy Test/Test provincial de compétences linguistiques (OSSLT/TPCL) and the

Programme for International Student Assessment (PISA) 2009 reading assessment. In 2009,

PISA and the OSSLT/TPCL were administered between April and May. In the case of the

OSSLT/TPCL, all eligible students in the province were assessed, and among these students, a

probability sample of 15-year-olds took the PISA reading assessment. The fact that a large

number of students took both tests in such a short time frame provided a strong basis for linking

the scores on the two tests. This linking could provide valuable information for education policy

makers and stakeholders about the Ontario provincial standard relative to international

benchmarks. This study used two different methods to link the two tests’ score distributions: the

Fixed Item Parameter (FIP) linking procedure implemented through an Item Response Theory

(IRT) model and the equipercentile with pre-smoothing procedure (Kolen & Brennan, 2004).

Successful linking will not only give useful information about educational policy, but will also

add to the psychometric literature about scaling two tests that are constructed from two different

tables of specifications but within the same substantive area.

Background

OSSLT/TPCL

The OSSLT/TPCL is a provincial test of minimum-competency literacy skills, where

literacy is defined in terms of reading and writing expectations in The Ontario Curriculum up to

the end of Grade 9. There are two language forms, one for English-language students and one for

French-language students. Both forms are administered annually to Grade 10 students attending

public and private schools in Ontario. A successful outcome on the OSSLT/TPCL is one of the

provincial high school graduation requirements. Students who write the OSSLT/TPCL once and

are unsuccessful may retake the test in a subsequent year or enroll in and successfully complete

the Ontario Secondary School Literacy Course (OSSLC).

Student responses to the OSSLT/TPCL are analyzed using a modified one-parameter IRT

model with a fixed a- and c-parameter for multiple-choice items and the generalized partial

credit model for open-response items. The results from one year to the next are equated using the

Fixed Common Item Parameter approach.

PISA

The Organization for Economic Co-operation and Development (OECD) initiated the

PISA assessments in response to a strong international interest in having a comparative

international assessment to measure how well students at age 15 are prepared academically to

meet the challenges of the future. This age group was selected because students are approaching

the end of compulsory education. The assessment is administered every three years and assesses

three subjects: reading, mathematics and science. In every administration, one subject area is

emphasized: two-thirds of the testing time is devoted to that subject and the remaining third is

devoted to the other two subjects. The 2000 administration emphasized reading literacy, 2003

emphasized mathematics, and 2006 emphasized science. In 2009, PISA once again emphasized

reading literacy, but the assessment was expanded to include reading and understanding

electronic texts in order to reflect the importance of computer technology in education. Ontario

did not administer this component of the assessment. Between 4 500 and 10 000 students in each

of more than 60 OECD member and partner countries participated.

PISA is constructed using a common, internationally-agreed-upon framework (OECD,

2009). PISA also employs a matrix sampling approach to achieve larger content coverage

without requiring students to write very long tests. All test items are divided into a set of item

blocks. These blocks are carefully arranged into booklets, each having an equal number of items

(and requiring equal testing time), with balanced content coverage and item format. Each student

is then randomly assigned one of the booklets. As a result, individual students answer a

manageable number of items and the curriculum is broadly covered at the aggregate level (Childs

& Jaciw, 2003).

Like the OSSLT/TPCL, PISA uses a one-parameter IRT model, which produces

comparable results for all participating countries. However, PISA uses the plausible-value

method (Mislevy, 1991; Mislevy, Johnson & Muraki, 1992) to calculate student proficiency

estimates. Plausible values are random draws from the posterior distribution of the latent

proficiency variable and are estimated using item and background information. This approach

allows improved estimates at the group level (Mislevy, 1991; Mislevy, Johnson & Muraki,

1992). Group-level results that are reported include the mean, standard deviation, and various

percentiles at the aggregated level. PISA does not report individual students’ results.

Both the OSSLT/TPCL and PISA meet high standards with respect to test design, test

content and psychometric quality. They also test a similar grade/age of students and content area.

However, the two tests differ in three ways. First, the two assessments were constructed using

different test frameworks. Second, the intended uses of the two assessments are different. PISA

is a low-stakes test for students: it has no direct consequences for them because results are not

reported to students. The OSSLT/TPCL is a high-stakes test, because results are reported to

students and the results affect their graduation status. Third, the two assessments differ in their

design and the reporting of results. PISA uses a matrix sampling design while OSSLT/TPCL has

a single test form, and the results for the two tests are reported on different scales. These

differences between PISA and the OSSLT/TPCL may pose significant challenges to linking

them.

Linking approaches

Test linking is the process by which the results from one test are used to predict the

results of another test (Linn, McLaughlin & Thissen, 2009). Linn (1993) and Mislevy (1992)

identified four types of linking; these types (ordered from the strongest to the weakest) are:

equating, calibration, projection and moderation. Equating is the strongest form of linking.

Equating is possible if the tests to be linked are equivalent in content, format, purpose,

administration, item difficulty and populations (Linn, 1993; Mislevy, 1992). In calibration,

tests that are constructed for different purposes and from different content specifications, but

cover the same substantive area (e.g., reading), are linked by placing the scores from the two

tests on the same scale. In projection, scores from one test are used to predict scores on another

test that may cover different content. Moderation is the weakest form of linking. It is used on

tests developed from different blueprints and administered to nonequivalent populations. Clearly,

the type of linking used depends upon the degree of comparability of the tests’ content and

psychometric properties.

Earlier research has empirically demonstrated procedures for linking assessments. In the

United States, many studies have linked scores from state-level assessments to those from

national and international assessments. Linear regression analyses or equipercentile procedures

were commonly used in these studies. Pashley and Phillips (1993) investigated linking between

the International Assessment of Educational Progress (IAEP) and the National Assessment of

Educational Progress (NAEP), which were built to different content specifications. They worked

with sample data from 1609 students who took both assessments. They determined a linear

relationship between IAEP and NAEP proficiency estimates and used it to estimate the

percentages of students from the IAEP who could perform at or above the three performance

levels established for NAEP. Pashley and Phillips concluded that it is possible to establish an

accurate statistical link between the IAEP and NAEP, but they warned that the linking results

should be interpreted with caution because it is difficult to measure the effects of unexplored

sources of non-statistical error on score results, such as different motivation levels of students,

which could affect test scores. Linn and Kiplinger (1994) used equipercentile procedures to link

data from state tests and NAEP. They found that the linking function could estimate average

state performance on NAEP, but was not accurate for scores at the top or bottom of the scale.

Also, the linking function was different for male and female subgroups, which indicated that the

linking function is not invariant across different subgroups.

Waltman (1997) used equipercentile procedures to link the Iowa Tests of Basic Skills

(ITBS) and NAEP. The two assessments were also linked using a social moderation approach in

which the achievement level descriptions used for NAEP were used by judges to set performance

standards (basic, proficient and advanced) on the ITBS. The results of the study showed that for

students who took both assessments, the corresponding achievement regions on the NAEP and

ITBS scales produced low to moderate percentages of agreement in student classification.

Agreement was particularly low for students at the advanced level; two-thirds or more were

classified differently.

In Canada, Cartwright (2003) investigated four different linking procedures

(equipercentile with the 4-parameter beta distribution, Gaussian kernel smoothing, and finite and

variable Gaussian mixture models) to link the reading component of the Foundation Skills

Assessment (FSA) administered to Grade 10 students in British Columbia with the PISA reading

literacy assessment, both administered in 2000¹. Cartwright examined the bias and variability of

the linked scores and the accuracy of the estimated standard errors of linking. He concluded that

the equipercentile method was the most appropriate method for linking the two tests. The results

of this study indicated that it is feasible to provide valid linking results between a regional test

(FSA) and an international test (PISA)².

Purpose

The purpose of this study was to investigate whether linking the OSSLT/TPCL with the 2009

PISA reading assessment is feasible. Two linking procedures were explored: the Fixed Item

Parameter (FIP) procedure, which is similar to the equating procedure currently used by EQAO

to equate its operational assessments from year to year, and the equipercentile procedure. The

study addressed the following five research questions:

1. How do successful/unsuccessful students on the OSSLT/TPCL tend to be classified on PISA?

2. What is the strength of the relationship between student performance on PISA and on the

entire OSSLT/TPCL test, the OSSLT/TPCL reading sub-test and the OSSLT/TPCL writing

sub-test?

3. How comparable are the results obtained using the equipercentile procedure and the results

obtained using the FIP procedure?

4. How accurate is the linkage between the two tests? Is the linking function similar for both

male and female students?

5. How well does the performance of students on the OSSLT/TPCL predict their performance on

PISA?

¹ The FSA is currently an annual provincial assessment program at Grades 4 and 7; at the time of the study, it also included assessments at Grade 10. It measures students’ academic skills in reading, writing and numeracy.

² A second, more recent study (2008) was completed in British Columbia, in which student performance on the 2008 Grade 4 FSA reading test was linked to student performance on the 2006 Progress in International Reading Literacy Study (PIRLS) test, but very little information is available on this study. Linking procedures similar to those used by Cartwright (2003) were employed, and the linking was considered to be successful.

Method

Data

A common student identification number across the two assessments was used to create a

combined data set for this study, which contained students’ responses to the 2009 PISA reading

items and the 2009 OSSLT/TPCL items. The combined data set has 3726 Ontario students, with

2450 students from English-language schools and 1276 students from French-language schools.

The data set also contained outcome variables; background variables such as gender,

language, English language learner status and special education status; and a variable indicating

which PISA booklet the student used.

Test items

For PISA, each student responded to about 28 items, which, due to PISA’s use of matrix

sampling, is a portion of the 101 test items—47 multiple-choice and 54 open-response items. For

the OSSLT/TPCL, students responded to all of the 47 test items—39 multiple-choice and 8

open-response items (Table 1).

Table 1 Number of Test Items in PISA and the OSSLT/TPCL

Assessment                          Multiple-choice items    Open-response items
PISA reading 2009                   47                       54
(across all booklets)
OSSLT/TPCL (reading and writing)    31 (reading)             4 (reading)
                                    8 (writing)              2 (short writing)*
                                                             2 (long writing)*

*Each writing prompt was scored for topic development and conventions, which resulted in 8 scores.

Outcome variables

For the OSSLT/TPCL, the outcome variable for each student is a score reported on a

scale from 200 to 400. A score of 300 is the minimum required for a successful result. For PISA,

the outcome variable for each student is indicated by five plausible values reported on a scale

from 200 to 800, with a mean of 500 and a standard deviation of 100. PISA calculates five

plausible values for each student because, as mentioned previously, PISA uses a matrix sampling

design. Since proficiency is measured with a subset of the total item pool, each individual

proficiency estimate has a substantial amount of measurement error. Using multiple values

accounts for the uncertainty associated with a student’s estimated score because multiple values

represent the likely distribution of a student’s proficiency (OECD, 2005).
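
As a minimal sketch of how five plausible values are typically used for a group-level statistic, the Python below computes the statistic once per plausible value and then averages the five results. The student records and the function name `group_mean_from_pvs` are hypothetical illustrations, not PISA data or OECD code.

```python
from statistics import mean

# Hypothetical records: five plausible values per student (not real PISA data).
students = [
    (512.0, 530.5, 498.2, 521.7, 509.9),
    (455.3, 470.1, 462.8, 449.0, 466.4),
    (601.2, 588.9, 610.4, 595.0, 603.3),
]

def group_mean_from_pvs(records):
    """Compute the group mean separately for each plausible value,
    then average those per-value means into one estimate."""
    n_pv = len(records[0])
    per_pv_means = [mean(r[k] for r in records) for k in range(n_pv)]
    return mean(per_pv_means), per_pv_means

estimate, per_pv = group_mean_from_pvs(students)
print("per-value means:", [round(m, 1) for m in per_pv])
print("combined estimate:", round(estimate, 1))
```

For a mean, this gives the same answer as averaging each student's five values first. For statistics such as standard deviations or percentiles it generally does not, which is why the statistic is usually computed per plausible value and combined afterwards.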

Analysis

Content alignment

To strengthen the interpretability of the linking study results, content experts at EQAO

reviewed the OSSLT/TPCL 2009 and the PISA 2009 reading assessment to see how similar the

content of the two assessments was. If they were highly similar, then the scores from the linking

process could be treated as equated scale scores. If the two tests were only moderately similar,

then the scores from the linking process could be treated as comparable scale scores (American

Educational Research Association, American Psychological Association, & National Council on

Measurement in Education, 1999).

Descriptive statistics

The following descriptive statistics were generated:

1. the mean and standard deviation of the PISA reading and the OSSLT/TPCL scale scores;

2. the Pearson product-moment correlation coefficients between the OSSLT/TPCL total test,

OSSLT/TPCL reading and OSSLT/TPCL writing and PISA reading; and

3. the distribution of successful and unsuccessful students on the OSSLT/TPCL across

PISA proficiency levels. For the OSSLT/TPCL, students with scale scores of 300 and

above were classified as successful and those below 300 were classified as unsuccessful.

For PISA, students were classified into five proficiency levels based on the cut scores

published for the 2009 PISA reading assessment (OECD, 2010): Below Level 2, Level 2,

Level 3, Level 4 and Level 5 and Above.
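
The classification described in item 3 can be sketched as follows. This is an illustration only: the level boundaries are the rounded values printed in the report's Figure 3 (407, 480, 553 and 625), the function names are hypothetical, and the sample score pairs are invented.

```python
from collections import Counter

OSSLT_CUT = 300  # minimum OSSLT/TPCL scale score for a successful result

def osslt_outcome(scale_score):
    """Classify an OSSLT/TPCL scale score against the cut score of 300."""
    return "successful" if scale_score >= OSSLT_CUT else "unsuccessful"

def pisa_level(score):
    """Classify a PISA reading score using the rounded boundaries from Figure 3."""
    if score < 407:
        return "Below Level 2"
    if score < 480:
        return "Level 2"
    if score < 553:
        return "Level 3"
    if score <= 625:
        return "Level 4"
    return "Level 5 and above"

def cross_tab(pairs):
    """Count students in each (OSSLT/TPCL outcome, PISA level) cell."""
    return Counter((osslt_outcome(o), pisa_level(p)) for o, p in pairs)

# Invented (OSSLT/TPCL score, PISA score) pairs, for illustration only.
sample = [(335, 560), (310, 450), (295, 430), (280, 390), (350, 640)]
for cell, count in sorted(cross_tab(sample).items()):
    print(cell, count)
```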

Discriminant function analysis

Discriminant function analysis was conducted on each of the five PISA plausible values

to examine how well successful and unsuccessful students on the OSSLT/TPCL tend to be

classified by PISA scores. A cross tabulation of the original OSSLT/TPCL outcome and the

outcome based on the discriminant analysis was determined. The percentage of misclassified

students and the total percentage of error were calculated.
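
A simplified sketch of this kind of analysis is shown below. It assumes a single predictor (one PISA plausible value), equal prior probabilities and a common within-group variance, in which case the linear discriminant boundary reduces to the midpoint of the two group means. The function names and scores are hypothetical; the study's software and exact settings are not described in this report.

```python
def lda_cut_point(successful, unsuccessful):
    """Boundary for one predictor with equal priors and common variance:
    the midpoint between the two group means."""
    mean_s = sum(successful) / len(successful)
    mean_u = sum(unsuccessful) / len(unsuccessful)
    return (mean_s + mean_u) / 2

def misclassification_rates(successful, unsuccessful):
    """Return the cut point and the proportion of each group placed in the
    wrong category (assumes the successful group has the higher mean)."""
    cut = lda_cut_point(successful, unsuccessful)
    as_unsuccessful = sum(s < cut for s in successful) / len(successful)
    as_successful = sum(u >= cut for u in unsuccessful) / len(unsuccessful)
    return cut, as_successful, as_unsuccessful

# Invented PISA scores grouped by OSSLT/TPCL outcome, for illustration only.
pisa_successful = [500, 520, 540, 560]
pisa_unsuccessful = [380, 400, 420, 510]
print(misclassification_rates(pisa_successful, pisa_unsuccessful))
```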

Linking procedures

The equipercentile procedure and the fixed item parameter (FIP) procedure were used to

link the OSSLT/TPCL and PISA assessments. OECD (2005) suggests conducting an analysis

using all five plausible values to increase the accuracy of results; therefore, the following linking

analyses used each of the five plausible values for both the English- and French-language groups.

Equipercentile procedure

When the equipercentile procedure is used, the distributions to be linked are often

irregular, with sharp “mountains” and “valleys.” The reason for the irregular shapes is that while

the underlying construct being measured is continuous, the scores reflecting possession of the

construct are discrete. Also, not all score points exist, leading to gaps in the distributions.

Therefore, to improve the accuracy of equipercentile linking and to reduce the linking error, a

pre-smoothing method is often used. Among the several pre-smoothing methods, Cope and

Kolen (1990) and Hanson (1990) suggested that the 4-parameter beta binomial model and the

log-linear model could be used to pre-smooth the score distributions to be linked. As mentioned

previously, Cartwright (2003) found that the 4-parameter beta binomial model provided the best

fit among the four different pre-smoothing methods he investigated. The following steps were

followed to link the OSSLT/TPCL and PISA for each of the five plausible values and for each of

the two language groups:

1. The 4-parameter beta model was used to pre-smooth both the PISA and OSSLT/TPCL

distributions.

2. Equipercentile linking was conducted to link the OSSLT/TPCL and PISA.

3. The OSSLT/TPCL cut scores were converted to PISA scale scores and the performance

standards for PISA and the OSSLT/TPCL were compared by identifying the scale score

on the PISA scale that corresponded to the cut point for a successful outcome on the

OSSLT/TPCL.
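
The core of steps 2 and 3 can be sketched in a few lines of Python. This is a simplified illustration, not the study's implementation: it omits the 4-parameter beta pre-smoothing from step 1, uses a mid-percentile rank and linear interpolation as assumed conventions, and runs on synthetic scores with hypothetical function names.

```python
import random
from bisect import bisect_left, bisect_right

def percentile_rank(sorted_scores, value):
    """Mid-percentile rank of `value`, as a proportion between 0 and 1."""
    below = bisect_left(sorted_scores, value)
    at_or_below = bisect_right(sorted_scores, value)
    return (below + 0.5 * (at_or_below - below)) / len(sorted_scores)

def quantile(sorted_scores, p):
    """Score at proportion `p`, interpolating linearly between order statistics."""
    pos = p * (len(sorted_scores) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (pos - lo) * (sorted_scores[hi] - sorted_scores[lo])

def equipercentile_link(x_scores, y_scores, x_value):
    """Map `x_value` on test X to the test-Y score with the same percentile rank."""
    rank = percentile_rank(sorted(x_scores), x_value)
    return quantile(sorted(y_scores), rank)

# Synthetic score distributions, for illustration only (not the study data).
rng = random.Random(0)
test_x = [rng.gauss(50, 10) for _ in range(5000)]
test_y = [rng.gauss(500, 100) for _ in range(5000)]
print(round(equipercentile_link(test_x, test_y, 40.0), 1))
```

Applied to a cut score on test X, the returned value is the test-Y score that the same proportion of examinees falls below, which is the sense in which a cut score is "converted" to the other scale.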

FIP Procedure

The FIP linking procedure involves fixing the parameter estimates for the items on one

test and then calibrating the second test with the first test. This will place the second test onto the

scale of the first test. The FIP linking procedure was carried out to place the OSSLT/TPCL and

PISA on the same scale. The following steps were conducted:

1. An IRT calibration was conducted on the combined data set with PISA item parameters

fixed³. This step produced a set of OSSLT/TPCL item parameter estimates (including

both reading and writing) on the PISA scale. The IRT model used in this step was the

modified one-parameter model (the a-parameter was fixed to 0.588 and the c-parameter

was fixed to 0.20) used by EQAO to calibrate the OSSLT/TPCL. The generalized partial

credit model was used for the polytomous items.

2. The rescaled OSSLT/TPCL item parameter estimates from Step 1 were used to score the

OSSLT/TPCL data. This step produced a rescaled OSSLT/TPCL θ-value on the PISA

scale for each student.

3. The θ-distributions and the IRT test information functions (TIFs) of the rescaled

OSSLT/TPCL and the PISA tests were compared. Similarity/dissimilarity of the θ-

distributions and TIFs provided information on the accuracy of the linking function.

4. The PISA scale scores equivalent to the OSSLT/TPCL cut scores were identified by

identifying scores on the rescaled OSSLT/TPCL θ-values that had the same percentile

ranks as the cut score on the original OSSLT/TPCL scale. Then, each student was

classified into OSSLT/TPCL categories by applying the new cut score to the PISA scale

score. The levels of agreement between the percentages of students assigned to proficiency

categories were determined.
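
For reference, the item response function for a multiple-choice item under a model with fixed a- and c-parameters can be sketched as below. The report gives only the fixed values (a = 0.588, c = 0.20); the three-parameter logistic form and the scaling constant D = 1.7 used here are assumptions, chosen because D times a is then approximately 1. The function name is hypothetical.

```python
import math

def p_correct(theta, b, a=0.588, c=0.20, D=1.7):
    """Probability of a correct response at proficiency `theta` for an item
    with difficulty `b`, with the a- and c-parameters held fixed.
    Assumed form: c + (1 - c) / (1 + exp(-D * a * (theta - b)))."""
    z = D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# Only the difficulty b varies from item to item in this sketch.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct(theta, b=0.0), 3))
```

Under this form, fixing PISA's item parameters and estimating only the OSSLT/TPCL difficulties is what places the second set of items on the first test's scale.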

³ PISA publishes item parameters in its technical reports. Because the PISA 2009 technical report had not been released at the time of this study, OECD shared the PISA 2009 item parameters with EQAO by email before they became publicly available.

Regression

For each of the plausible values, a simple regression was conducted to predict the PISA

scale scores from the OSSLT/TPCL scale scores. The variance in the dependent variable

explained by the independent variable was computed. The predicted PISA scale scores were

transformed to PISA levels using the original PISA cut scores. The predicted PISA levels were

then cross-tabulated with the original PISA scores and the percentage of agreement was

calculated.
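
A minimal sketch of such a regression, using ordinary least squares with one predictor, is given below. The function name and data are hypothetical. In a simple regression, the proportion of variance explained equals the square of the Pearson correlation between the two variables.

```python
def simple_regression(x, y):
    """Least-squares fit of y on x. Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

# Invented (OSSLT/TPCL, PISA) scale scores, for illustration only.
osslt = [290, 305, 320, 335, 350, 365]
pisa = [400, 470, 455, 540, 560, 610]
slope, intercept, r2 = simple_regression(osslt, pisa)
predicted = [intercept + slope * s for s in osslt]
print(round(slope, 2), round(intercept, 1), round(r2, 2))
```

The predicted scores can then be converted to PISA levels with the published cut scores and cross-tabulated against the observed levels.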

Results

Content Alignment

Content experts at EQAO examined the content frameworks and reading items for PISA

and the OSSLT/TPCL. Since not all of the PISA items were published, the alignment study was

based on the sample of PISA items available to the public. The results of the content alignment

review are summarized in Table 2. While both the PISA reading assessment and OSSLT/TPCL

are literacy tests, they are not identical. However, the two assessments are strongly similar in

purpose, types of items, text forms used, types of tasks and cognitive skills assessed.

Table 2 Content Comparison between PISA Reading Assessment and the OSSLT/TPCL

Characteristic: Scope
  PISA reading assessment: Reading
  OSSLT/TPCL: Reading and writing

Characteristic: Method
  PISA reading assessment: 120 min paper-&-pencil tests [Canada chose not to take the additional 40-min assessment of understanding electronic texts]; 47 MC items (46.5%); 54 OR items (53.5%)
  OSSLT/TPCL: 150 min paper-&-pencil tests [approximately 75 min devoted to reading components; 75 min to writing components]; 31 MC reading items (71.6%); 4 OR reading items (28.3%)

Characteristic: Approximate tasks per text breakdown
  PISA reading assessment: 70% devoted to continuous texts; 30% devoted to non-continuous texts
  OSSLT/TPCL: 92% devoted to continuous texts; 8% devoted to graphic text

Characteristic: Reading processes (aspect/skills) assessed & approximate distribution of tasks by aspect/skill
  PISA reading assessment, task distribution per “aspect” assessed:
    25% on explicitly stated information & ideas (access and retrieve)
    50% on implicitly stated information and ideas (integrate & interpret)
    25% on making connections between information and ideas in reading selections and personal knowledge and experience (reflect & evaluate)
  OSSLT/TPCL, task distribution per “skills” assessed:
    20% on understanding explicitly stated information & ideas (access and retrieve)
    60% on understanding implicitly stated information and ideas (interpret)
    20% on making connections between information and ideas in reading selections and personal knowledge and experience (interpret & integrate)

Descriptive Statistics

The mean and standard deviation of the PISA reading and the OSSLT/TPCL scale scores

for the Ontario students who participated in both tests are summarized in Table 3 by language.

As noted previously, PISA and the OSSLT/TPCL were reported on different scales: PISA was

reported on a 200 to 800 scale with a mean of 500 and a standard deviation of 100 whereas

OSSLT/TPCL was reported on a 200 to 400 scale. The means of the OSSLT/TPCL scores are

similar for the two language groups, but the mean of the PISA scores for the English-language

students is 58 points higher than that for the French-language students.

Table 3 Mean and Standard Deviation of the Five PISA Plausible Values and OSSLT/TPCL Scores for Students who Completed both Tests

              English (N = 2450)    French (N = 1276)
Scores        Mean     S.D.         Mean     S.D.
OSSLT/TPCL    330      25           327      27
PV1*          535      85           477      86
PV2           536      85           476      86
PV3           535      84           476      87
PV4           535      85           477      86
PV5           536      86           477      85

*PV=plausible value

The Pearson product-moment correlation coefficients between the PISA reading scores

and the OSSLT/TPCL scores ranged from 0.68 to 0.70 for the total test. The coefficients for the

OSSLT/TPCL reading scores (0.63 – 0.66) were slightly higher than those for the OSSLT/TPCL

writing scores (0.60 – 0.63) (Table 4).

Table 4 Correlation Coefficients between the OSSLT/TPCL and the PISA Reading

            PISA plausible    OSSLT/TPCL
Language    value             Total test    Reading    Writing
English     1                 0.70          0.66       0.61
            2                 0.69          0.65       0.60
            3                 0.69          0.64       0.61
            4                 0.70          0.65       0.61
            5                 0.70          0.66       0.60
French      1                 0.69          0.65       0.61
            2                 0.69          0.64       0.63
            3                 0.70          0.65       0.63
            4                 0.70          0.65       0.63
            5                 0.68          0.63       0.61

Figure 1 shows the percentage of Ontario students in each of the PISA performance

categories for those students who wrote both tests. A higher percentage of English-language

students than French-language students achieved Level 2 and above.

Figure 1. Percentage of Ontario French- and English-language students in the PISA 2009

proficiency levels

Most of the successful students on the OSSLT/TPCL in both languages were classified at

Level 2 and above in the PISA proficiency levels (Figure 2).

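[Figure 1: bar chart. Horizontal axis: % of students at each level (PISA 2009), 0% to 100%; vertical axis: language. Values shown on the chart:

French (n=1,276): Below Level 2 21%, Level 2 30%, Level 3 31%, Level 4 15%, Level 5 and above 3%

English (n=2,450): Below Level 2 6%, Level 2 18%, Level 3 31%, Level 4 30%, Level 5 and above 14%]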

Figure 2. Distribution of students across the PISA proficiency levels for students who were

successful on the OSSLT/TPCL

Most of the students who were successful on the OSSLT/TPCL were classified at Level 2 or

above on PISA (97% for English-language students and 87% for French-language students)

(Table 5). Of the unsuccessful students, 65% of the English-language students and 30% of the

French-language students were classified at Level 2 or above on PISA.

Table 5 Distribution of Students across PISA Levels within OSSLT/TPCL Outcome Categories

                     Percentage of students within OSSLT/TPCL outcome
                     English language              French language
PISA levels          Successful    Unsuccessful    Successful    Unsuccessful
Below level 2        3             35              13            70
Level 2              15            48              31            26
Level 3 and above    82            17              56            4
Total                100           100             100           100

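[Figure 2: bar chart. Horizontal axis: PISA levels; vertical axis: percentage (0 to 40). Values shown on the chart:

English (Successful), n=2,214: Below Level 2 3, Level 2 15, Level 3 33, Level 4 34, Level 5 and above 15

French (Successful), n=1,099: Below Level 2 13, Level 2 31, Level 3 36, Level 4 18, Level 5 and above 3]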

Discriminant Function Analysis

A discriminant function analysis was conducted to find out how well PISA scores

classify successful and unsuccessful students on the OSSLT/TPCL. Table 6 shows the percentage of

students who were misclassified. The average error percentage is calculated as the sum of the

two misclassification percentages, each weighted by its prior probability. In this case, since there are only two

outcomes (successful and unsuccessful), the prior probability of each was set at 50%. Since the

average error percentage is about 20%, this means that about 80% of the OSSLT/TPCL students

were classified correctly by PISA scores.
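
The average error calculation can be written out as a short worked example. The function name is hypothetical; the two input percentages are the English-language values for plausible value 1 from Table 6.

```python
def average_error(pct_as_successful, pct_as_unsuccessful, prior=0.5):
    """Weight each misclassification percentage by its group's prior probability.
    With two groups and equal priors, this is the simple mean of the two."""
    return prior * pct_as_successful + (1.0 - prior) * pct_as_unsuccessful

# English, plausible value 1 (Table 6): 17.4% and 21.6% misclassified.
print(round(average_error(17.4, 21.6), 1))
```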

Table 6 Percentage of Students Misclassified on the OSSLT/TPCL and Average Error of Misclassification

                            Percentage of misclassified students
            Plausible       Misclassified    Misclassified      Average
Language    value           as successful    as unsuccessful    error
English     1               17.4             21.6               19.5
            2               16.5             21.8               19.2
            3               15.7             21.9               18.8
            4               14.4             21.5               18.0
            5               17.0             22.2               19.6
French      1               22.0             22.7               22.4
            2               19.2             22.6               20.9
            3               17.5             22.4               20.0
            4               18.1             21.7               19.9
            5               19.2             23.0               21.1

A discriminant function analysis was also conducted to classify students into the PISA

proficiency levels based on their OSSLT/TPCL scores. Approximately 45% of students were

correctly classified into their observed PISA level based on their OSSLT/TPCL score, with the

percentages being considerably higher for Levels 1 and 6. More than three-quarters of students

were correctly classified as being at or above Level 2.

Linking Results

Equipercentile procedure

After linking the OSSLT/TPCL to the PISA using the equipercentile procedure, the

OSSLT/TPCL cut score was converted to a PISA scale score. The cut-score equivalents are

reported in Table 7 for each of the five plausible values, for both languages and by gender.

Within each language group and gender, consistent cut scores were identified over the five

plausible values. For the English-language test, the OSSLT cut score of 300 is equivalent to 433

on PISA; whereas, for the French-language test, the TPCL cut score of 300 is equivalent to 388

on PISA, which is 45 points below that for the English-language test.

For the English-language test, the OSSLT cut score of 300 is equivalent to 422 and 446

on the PISA scale for males and females, respectively. For the French-language test, the TPCL cut

score of 300 is equivalent to 387 and 389 on the PISA scale for males and females, respectively. The

average cut score for French-language males is 35 points below that for English-language males

and the average cut score for the French-language females is 57 points below that for the

English-language females.

Table 7 PISA Reading Scale Score Equivalent to OSSLT/TPCL Cut Score

                        Plausible    Plausible    Plausible    Plausible    Plausible
Language    Students    value 1      value 2      value 3      value 4      value 5
English     All         433          433          434          433          433
            Male        421          423          423          422          422
            Female      446          446          447          446          445
French      All         388          388          388          388          390
            Male        388          388          386          386          390
            Female      387          388          391          390          389

Comparing the OSSLT/TPCL and PISA standards using the PISA reading scale, the

OSSLT cut score is slightly above the PISA cut score for Level 2, and the TPCL cut point is

slightly below the PISA cut point for Level 2 (Figure 3). The results for English- and French-

language students suggest that students classified as meeting the provincial standard on the

OSSLT/TPCL are somewhat equivalent to PISA Level 2 and above. This means that students

who met the Ontario provincial literacy standards would likely have demonstrated, according to

OECD (2010), “the reading literacy competencies that will enable them to participate effectively

and productively in life.”

Figure 3. Comparison of OSSLT/TPCL and PISA standards for both languages: Equipercentile equating

[Figure 3: PISA reading scale (200 to 700) shown against the PISA reading proficiency levels: Below Level 2 (below 407), Level 2 (407-479), Level 3 (480-552), Level 4 (553-625) and Level 5 and above (above 625), with the TPCL (388) and OSSLT (433) cut scores marked on the scale.]

Descriptions of the OSSLT/TPCL standard for success and the PISA standard for

Level 2 are quite similar. On the OSSLT/TPCL, the standard for success is defined as

students having the minimum “reading and writing skills required to understand reading

selections and communicate through a variety of written forms as expected in The

Ontario Curriculum across all subjects up to the end of Grade 9” (Education Quality and

Accountability Office, 2007 (December), p. 10). The PISA Level 2 standard states, “Level

2 is considered a baseline level of proficiency, at which students begin to demonstrate the

reading literacy competencies that will enable them to participate effectively and

productively in life” (OECD, 2010). This similarity is consistent with the results shown

in Figure 3.
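The placement of the two cut scores relative to the PISA proficiency bands can be checked with a small lookup. The following is only an illustrative sketch: the band boundaries are those labelled in Figure 3, the function name `pisa_level` is hypothetical, and treating "above 625" as starting at 626 is an assumption made for whole-number scores.

```python
from bisect import bisect_right

# Lower bounds of Levels 2, 3, 4 and "5 and above", as labelled in Figure 3.
# Assumption: "above 625" is taken to begin at 626 for whole-number scores.
LOWER_BOUNDS = [407, 480, 553, 626]
LABELS = [
    "Below Level 2",
    "Level 2",
    "Level 3",
    "Level 4",
    "Level 5 and above",
]


def pisa_level(score):
    """Return the PISA reading proficiency band containing a scale score."""
    # bisect_right counts how many lower bounds are <= score,
    # which is exactly the index of the band label.
    return LABELS[bisect_right(LOWER_BOUNDS, score)]


# Equipercentile cut-score equivalents reported for all students.
for test, cut in [("OSSLT", 433), ("TPCL", 388)]:
    print(test, cut, pisa_level(cut))
```

Under these assumptions the OSSLT equivalent (433) falls in Level 2 and the TPCL equivalent (388) falls below Level 2, which matches the description of Figure 3 in the text.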

FIP procedure

The FIP procedure was conducted to link PISA reading and the OSSLT/TPCL using the

modified one-parameter IRT model for multiple-choice items and the generalized partial credit

model for the open response items. The results show that for English-language students, the

OSSLT cut score is equivalent to 427 on the PISA scale, which is above the lower boundary of

PISA reading proficiency Level 2. For French-language students, the TPCL cut score is

equivalent to 396 on the PISA scale, which is slightly below the lower boundary of PISA reading

proficiency Level 2. The cut score for the French-language TPCL is 31 points below the English-

language test cut score.

The cut score was also determined for males and females. For the English-language test,

the OSSLT cut score of 300 is equivalent to PISA scores 423 and 431 for males and females,

respectively. For the French-language test, the TPCL cut score of 300 is equivalent to PISA

scores 394 and 403 for males and females, respectively. The cut score for the French-language

TPCL for males is 29 points below the English-language test cut score. The cut score for the

French-language TPCL for females is 28 points below that for the English-language test. These

differences between male and female cut scores are due to the different formulas used for each

gender to transform the θ-values to PISA scale scores.

A comparison of the equipercentile and FIP procedures reveals differences in their results

(Table 8). The PISA scale score equivalent to the OSSLT/TPCL cut score for all French-

language students is 45 points below that for all English-language students for the equipercentile

procedure. But this difference is only 31 points for the FIP procedure. The PISA scale score

equivalent to the OSSLT/TPCL cut score for French-language male and female students is 35

and 57 points below that of the English-language students, respectively, in the equipercentile

procedure, but these differences are only 29 and 28 points, respectively, in the FIP procedure.

Given the smaller differences between the cut scores for males and females in the FIP procedure,

the FIP procedure seems more effective than the equipercentile procedure for these two

assessments.

Table 8 PISA Reading Scale Score Equivalent to OSSLT/TPCL Cut Score across Language and Gender for the Equipercentile and FIP Linking

Students  English language       French language
          Equipercentile  FIP    Equipercentile  FIP
All       433             427    388             396
Males     422             423    387             394
Females   446             431    389             403
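The English-French gaps quoted in the text follow directly from Table 8. As a quick arithmetic check (a sketch only; the dictionary layout and the helper name `language_gaps` are illustrative choices, not part of the study), the differences can be recomputed as follows:

```python
# PISA scale-score equivalents of the OSSLT/TPCL cut score, copied from Table 8.
cut_scores = {
    "equipercentile": {
        "English": {"All": 433, "Males": 422, "Females": 446},
        "French": {"All": 388, "Males": 387, "Females": 389},
    },
    "FIP": {
        "English": {"All": 427, "Males": 423, "Females": 431},
        "French": {"All": 396, "Males": 394, "Females": 403},
    },
}


def language_gaps(method):
    """English minus French cut-score equivalent for each student group."""
    english = cut_scores[method]["English"]
    french = cut_scores[method]["French"]
    return {group: english[group] - french[group] for group in english}


for method in cut_scores:
    print(method, language_gaps(method))
```

This reproduces the gaps of 45, 35 and 57 points for the equipercentile procedure and 31, 29 and 28 points for the FIP procedure.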

Regression

A simple regression was conducted to predict the PISA scale scores from the

OSSLT/TPCL scale score for each of the five plausible values in the sample. Values deemed to

be outliers were removed from the regression analysis. The agreement between the original

classification and the predicted classification in the sample was tabulated for each of the

plausible values. Tables 9 and 10 show the cross-tabulation of the observed and predicted PISA

proficiency levels for plausible value 1 for the English- and French-language samples,

respectively. The agreement between the observed and predicted levels was not high; the

observed and predicted PISA levels were the same for only about half of the students. Also, a

higher degree of agreement tended to be obtained for the middle levels, highlighting the

tendency for regression to underestimate the number of students at the extremes of the scale. The

correlation coefficients between the observed and the predicted PISA plausible values for the

English- and French-language samples ranged from 0.71 to 0.72 and 0.69 to 0.72, respectively.

Table 9 Agreement between the Observed and Predicted PISA Levels for the English-Language Sample Using PISA Plausible Value 1

                         Predicted levels (PISA plausible value 1)
Observed levels          Below     Level 2   Level 3   Level 4   Level 5     Total
                         level 2                                 and above
Below level 2       N    28        91        50        2         0           171
                    %    16.4      53.2      29.2      1.2       0.0         100.0
Level 2             N    8         188       209       31        0           436
                    %    1.8       43.1      47.9      7.1       0.0         100.0
Level 3             N    3         115       465       171       8           762
                    %    0.4       15.1      61.0      22.4      1.0         100.0
Level 4             N    0         18        289       346       47          700
                    %    0.0       2.6       41.3      49.4      6.7         100.0
Level 5 and above   N    0         0         55        211       94          360
                    %    0.0       0.0       15.3      58.6      26.1        100.0
Total               N    39        412       1068      761       149         2429
                    %    1.6       17.0      44.0      31.3      6.1         100.0

Table 10 Agreement between the Observed and Predicted PISA Levels for the French-Language Sample Using PISA Plausible Value 1

                         Predicted levels (PISA plausible value 1)
Observed levels          Below     Level 2   Level 3   Level 4   Level 5     Total
                         level 2                                 and above
Below level 2       N    119       121       20        1         0           261
                    %    45.6      46.4      7.7       0.4       0.0         100.0
Level 2             N    49        213       109       10        0           381
                    %    12.9      55.9      28.6      2.6       0.0         100.0
Level 3             N    9         117       221       32        2           381
                    %    2.4       30.7      58.0      8.4       0.5         100.0
Level 4             N    0         27        125       48        6           206
                    %    0.0       13.1      60.7      23.3      2.9         100.0
Level 5 and above   N    0         0         17        20        4           41
                    %    0.0       0.0       41.5      48.8      9.8         100.0
Total               N    177       478       492       111       12          1270
                    %    13.9      37.6      38.7      8.7       0.9         100.0
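The statement that observed and predicted levels matched for only about half of the students can be verified from the counts in Tables 9 and 10. The sketch below recomputes the overall exact-agreement rate (the diagonal of each cross-tabulation divided by the total) and the agreement within each observed level. The function names are illustrative, and the counts are transcribed from the two tables for plausible value 1 only.

```python
# Rows = observed levels, columns = predicted levels, both ordered
# Below level 2, Level 2, Level 3, Level 4, Level 5 and above.
english = [  # counts from Table 9
    [28, 91, 50, 2, 0],
    [8, 188, 209, 31, 0],
    [3, 115, 465, 171, 8],
    [0, 18, 289, 346, 47],
    [0, 0, 55, 211, 94],
]
french = [  # counts from Table 10
    [119, 121, 20, 1, 0],
    [49, 213, 109, 10, 0],
    [9, 117, 221, 32, 2],
    [0, 27, 125, 48, 6],
    [0, 0, 17, 20, 4],
]


def exact_agreement(table):
    """Proportion of students whose predicted level equals the observed level."""
    total = sum(sum(row) for row in table)
    matches = sum(table[i][i] for i in range(len(table)))
    return matches / total


def per_level_agreement(table):
    """Percentage of each observed level that was predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(table)]


for name, table in [("English", english), ("French", french)]:
    print(name, round(100 * exact_agreement(table), 1),
          [round(p, 1) for p in per_level_agreement(table)])
```

For plausible value 1 this gives exact agreement of roughly 46% for the English-language sample and 48% for the French-language sample. The per-level percentages are highest at Level 3 in both samples and low at Level 5 and above. For the English-language sample they are also low below Level 2, whereas for the French-language sample that lowest band is predicted correctly about as often as the middle levels.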

Discussion

The results showed that most of the Ontario students who met the provincial standard on

the OSSLT/TPCL (successful students) were classified at PISA Level 2 or above when the two

tests were linked using the equipercentile and FIP procedures. This finding is consistent with the

similarity in definitions of literacy for the two assessments at these cut points. As indicated

earlier, PISA Level 2 is considered a baseline level of proficiency that is similar to the literacy

standard expected for a successful outcome on the OSSLT/TPCL. The correlation coefficients

between PISA scores and the total-test scores for OSSLT/TPCL were slightly higher than those

between PISA and the reading component of the OSSLT/TPCL, and the correlation coefficients

for the reading component were slightly higher than those for the writing component. This

finding supports the decision to conduct linking analyses between PISA and the total

OSSLT/TPCL.

Although the equipercentile and the FIP procedures have the same purpose (to determine

the relationship between the scores on two tests that measure the same construct and to identify

the scores on one test that are equivalent to the scores on the other test), each method achieves

this goal in a different way. The equipercentile procedure identifies scores on one test that have

the same percentile ranks as the scores on another test. The equated distribution of scores is

based on equal percentile ranks from the two tests. The FIP procedure places the parameter

estimates for items on the two tests on the same scale. The results for these two procedures

were similar but not identical. If the purpose is to predict students’ scores on PISA from their

OSSLT/TPCL scores, then the FIP procedure provides more accurate interpretations than the

equipercentile procedure because in the FIP, the OSSLT/TPCL and PISA scores are placed on

the same scale. However, if the purpose is to estimate the proportion of students in a school at

each of the PISA levels from their OSSLT/TPCL scores, the equipercentile method would be

appropriate.
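The equipercentile idea described above can be sketched in a few lines. This is a simplified, unsmoothed illustration and not the procedure used in the study: the mid-percentile-rank definition, the linear interpolation between sorted scores, the function names and the toy data are all assumptions chosen for brevity.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, x):
    """Mid-percentile rank: % of scores below x plus half of those equal to x."""
    below = bisect_left(sorted_scores, x)
    equal = bisect_right(sorted_scores, x) - below
    return 100.0 * (below + 0.5 * equal) / len(sorted_scores)


def score_at_percentile(sorted_scores, p):
    """Score at percentile p (0-100), interpolating linearly between neighbours."""
    position = p / 100.0 * (len(sorted_scores) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_scores) - 1)
    fraction = position - lower
    return sorted_scores[lower] + fraction * (
        sorted_scores[upper] - sorted_scores[lower]
    )


def equipercentile_link(x, scores_x, scores_y):
    """Map score x on test X to the test-Y score with the same percentile rank."""
    xs, ys = sorted(scores_x), sorted(scores_y)
    return score_at_percentile(ys, percentile_rank(xs, x))


# Toy data (hypothetical): five scores on each test.
test_x = [1, 2, 3, 4, 5]
test_y = [10, 20, 30, 40, 50]
print(equipercentile_link(3, test_x, test_y))  # median of X maps to median of Y
```

An operational linking study would typically smooth the score distributions before matching percentile ranks and would handle sampling weights and plausible values, none of which is attempted here.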

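The equivalent cut scores for the OSSLT/TPCL on the PISA scale produced by the FIP

linking method differed from those obtained by the equipercentile method (Table 8): they were

lower for English-language students overall and for English-language females, but higher for

English-language males and for all French-language groups. The different cut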

scores across the sample and for each of the genders provide evidence that the constructs

measured by the two tests are not identical. Literacy is complex and requires a set of skills,

knowledge of language conventions and language processes, in addition to reading.

The linking procedures applied in this study cannot be considered true equating for a

number of reasons. One property of equating is group invariance, which states that the equating

relationship remains the same for all groups of examinees used to conduct equating (Kolen &

Brennan, 2004). In this study, group invariance was not met because the cut scores for males and

females were different, largely because the formula used to transform the theta values to PISA

scale scores was not the same for both genders. Also, although the constructs measured by the

two tests are very similar, the test specifications indicate that the tests measured some different

aspects of literacy. Furthermore, the calibration models for the two assessments are not identical.

For example, a strict Rasch model is used in PISA whereas a modified one-parameter model is

used for the OSSLT/TPCL. Therefore, a perfect equating was not expected.

The discriminant function analysis showed that successful and unsuccessful students on

the OSSLT/TPCL can be classified 80% correctly using the PISA reading scores. Only 45% of

students were correctly classified into their observed PISA level based on their OSSLT/TPCL

score, but more than three-quarters of students were classified correctly as being at or above

PISA Level 2. This level of accuracy is too low to enable a reliable prediction of PISA scores

from OSSLT/TPCL scores.

In conclusion, this study provides evidence that the provincial standard on the

OSSLT/TPCL is comparable to the PISA standard for minimum competency in literacy.

However, linking scores of the two assessments was somewhat challenging, partly because the

tests measure some different aspects of literacy and because of the different methods used to

arrive at scale scores. The two methods in this study did not produce identical results, which

highlights the challenges in linking provincial and international assessments. Since the linking

procedures used in this study do not represent true equating, care must be taken when

interpreting the results.

This linking study provides useful information that adds another dimension to the

interpretation of the results of both assessments, making both sets of results more meaningful.

References

American Educational Research Association, American Psychological Association, & National

Council on Measurement in Education. (1999). Standards for Educational and

Psychological Testing. Washington, DC: American Educational Research Association.

Cartwright, F. (2003). Equipercentile methods of linking inter-regional and regional

assessments. Unpublished master’s thesis. University of Alberta, Alberta, Canada.

Cope, R. T., & Kolen, M. J. (1990). A study of methods for estimating distributions of test scores

(American College Testing Research Report 90-5). Iowa City, IA: American College

Testing.

Education Quality and Accountability Office. (2007). Framework, Ontario Secondary Literacy

Test. Toronto, ON: Author.

Hanson, B. A. (1990). An investigation of methods for improving estimation of test score

distributions. (American College Testing Research Report 90-4). Iowa City, IA:

American College Testing.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and

practices (2nd ed.). New York, NY: Springer-Verlag.

Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education,

6(1), 83-102.

Linn, R. L., & Kiplinger, V. L. (1994). Linking statewide tests to the National Assessment of

Educational Progress: Stability of results. Applied Measurement in Education, 8, 135-156.

Linn, R. L., McLaughlin, D., & Thissen, D. (2009). Utility and validity of NAEP linking efforts.

Washington, DC: American Institutes for Research, NAEP Validity Studies Panel.

Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex

samples. Psychometrika, 56, 177–196.

Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of

Educational Statistics, 17, 131–154.

Mislevy, R. J. (1992). Linking Educational Assessments: Concepts, Issues, Methods, and

Prospects. Princeton, NJ: Educational Testing Service.

OECD (2005). PISA 2003 technical report. Paris: Author.

OECD (2009). PISA 2009 Assessment framework: Key competencies in reading, mathematics and

science. Paris: Author.

OECD (2010). PISA 2009 Results: What Students Know and Can Do: Student Performance in

Reading, Mathematics and Science (Volume I).

doi: 10.1787/9789264091450-en

Pashley, P. J., & Phillips, G. W. (1993). Toward World-Class Standards: A Research Study

Linking International and National Assessments. Center for Educational Progress.

Princeton, NJ: Educational Testing Service.

Waltman, K. K. (1997). Using performance standards to link statewide achievement results to

NAEP. Journal of Educational Measurement, 34(2), 101-121.
