Large-scale diagnostic assessment: Mathematics performance in two educational systems


Large-Scale Diagnostic Assessment: Mathematics performance in two educational systems

Menucha Birenbaum (a), Fadia Nasser (a), and Curtis Tatsuoka (b)

(a) Tel Aviv University, Israel; (b) George Washington University, Washington, DC, USA

(Received 27 September 2004; accepted 7 April 2005)

A diagnostic methodology for large-scale assessment was employed to compare performance on a

national test in mathematics of representative samples of Jewish and Arab 8th graders in Israel in

order to shed light on a previously identified large achievement gap between these 2 populations.

The results revealed significant differences between the 2 groups in patterns of strengths and

weaknesses with respect to content, process, and skill/item-type attributes, indicating different paths

for remedial interventions.

Introduction

Research has pointed out a substantial discrepancy in mathematics achievement

between the Jewish and Arab populations in Israel (Aviram, Cfir, & Ben-Simon,

1999; Bashi, Kahan, & Davis, 1981; Birenbaum & Nasser, 2002; Zuzovsky, 2001),

yet the nature of this difference in terms of cognitive processes has not been

investigated thus far. In the Israeli context, the Jewish majority and the Arab minority

study under the same educational guidelines but in separate school systems with

almost no intergroup contact. The current study used a diagnostic methodology for

large-scale assessment to compare performances of representative samples of Jewish

and Arab eighth graders on a national test in mathematics. Before considering the

design of the study, a brief description of the Israeli context is provided.

The Jewish and the Arab populations in Israel represent two ethnic/cultural groups

in a conflictual relationship with little intergroup contact (Kraus, 1988). The Arab

minority constitutes approximately 20% of the population (Central Bureau of


Statistics, 1996). In contrast to Jewish Israeli society, which is by and large a typical

modern western society, Arab Israeli society is a developing society that tends to be

more traditional and conservative and maintains a clear and well-defined system of

values and customs (Batrice, 2000; Mar’i, 1978; Seginer, Karayanni, & Mar’i, 1990;

Sharabi, 1987). The Arab community in Israel is considered a non-assimilating

minority and has limited access to the opportunity structure (Al-Haj, 1995).

Consequently, the Arab minority has relatively lower standing in all aspects of

socioeconomic status (including education, occupation and income) as compared to

the Jewish majority (Al-Haj, 1995; Semyonov & Lewin-Epstein, 1994).

Although officially all government schools in Israel are open to all students, in fact

there are segregated educational systems for Arabs and Jews, both of which are run by

the State’s Ministry of Education. The languages of instruction in the Jewish and

Arab school systems are Hebrew and Arabic, respectively. Both systems share the

same official/intended curricula only in science and mathematics.

Reporting diagnostic feedback in large-scale assessments is not a common practice

but a much desired one (Atkin & Black, 1997). It can aid in interpreting test scores

and at the same time guide curricular planning and instruction so that the diagnosed

difficulties can be addressed promptly. Currently, results of national and international tests provide interpretations of test scores (i.e., scale scores), where all test

takers who get the same scale score, or are within a prespecified range of the total

score distribution, receive the same interpretation. For instance, the diagnostic

approach in the Third International Mathematics and Science Study (TIMSS) allows

for diagnostic feedback at four benchmarks, set at the 90th, 75th, 50th, and 25th

percentiles of the international score distribution. To generate this feedback, the

performance of students whose scores fell around these percentiles was examined in terms of the educational requirements for solving anchor items, that is, items that 60% of the students in a given benchmark group answered correctly and that more than 50% of the students in the lower percentile group failed to answer correctly. The mastery profile for that benchmark

was specified in terms of skills judged by experts to be necessary for successfully

solving those particular items (Kelly, 2002; Mullis et al., 2001). However, there are

some inherent shortcomings to this method that preclude an accurate diagnosis on

the individual level. As noted by Kelly (2002), the benchmark descriptions must be

interpreted under the assumption that performance on the TIMSS scale is cumulative

(i.e., students reaching a particular benchmark are assumed to have acquired the

knowledge and skills described on the lower benchmark). Yet, this is not always the

case as is also implied in the other assumption, namely that performance is

continuous. Accordingly, it is recognized that students at the upper or lower ends of a

given benchmark may indeed know or understand some of the concepts that

characterize a higher benchmark, or may not know or understand some concepts that

characterize performance at a lower benchmark, respectively. A diagnostic approach

that overcomes these shortcomings is the rule space methodology (RSM) developed by

Tatsuoka (1983, in press). Following is a brief account of this methodology.

RSM is used to classify examinees’ item responses according to their profile of

strengths and weaknesses on the underlying constructs measured by a test that are


termed attributes. An attribute is a description of a procedure, skill, or content

knowledge that a student must possess in order to successfully complete the target

task. Binary attribute patterns that express mastery and non-mastery of attributes are

termed knowledge states. Attributes and knowledge states are unobservable variables

that RSM transforms into observable attribute mastery probabilities.

RSM belongs to the branch of statistics that deals with pattern recognition and

classification problems, which has two stages: the design stage and the classification

stage. At the design stage, an object is characterized by its feature variables and

expressed by a pattern of feature variables. At the classification stage, the pattern is

classified into one of the predetermined classification groups. However, attributes are

usually impossible to measure because they are latent. RSM has therefore extended

this approach to deal with latent feature variables. This is done by introducing an

item-by-attribute incidence matrix, referred to as Q matrix in RSM (Tatsuoka, 1990).

Every column in the Q matrix represents an attribute and every row an item. For

every item, 1s are assigned to attributes whose mastery is required for answering that

item correctly and 0s otherwise. These item-by-attribute involvement relationships

specify the hypothesized underlying constructs measured by the test. The only

assumption RSM uses at the design stage is that the right answer for a given item can

be obtained if and only if all attributes involved in that item are used correctly. Then

all possible combinations of attribute patterns from a given Q matrix are mathematically generated by applying Boolean algebra, and at the same time, the knowledge states are expressed by their corresponding item score patterns, termed ideal item score patterns to differentiate them from students’ observable item response patterns. Attribute patterns are not observable, but their corresponding ideal item score patterns are, and these form the predetermined classification groups in

RSM. The classification space formulated by RSM thus contains a set of the possible

knowledge states generated from a given Q matrix (Tatsuoka, 1991). A student’s item response pattern can then be classified into one of the predetermined groups by applying Bayes’ decision rules, which provide the student’s most plausible ideal item score pattern along with its membership probability.

To recapitulate, a unique characteristic of RSM is the correspondence between

attribute patterns and ideal item score patterns. This tie enables us to make inferences

regarding an examinee’s performance on latent attributes from his/her performance

on observable item responses. Hence, RSM transforms a dataset of students by item

scores into a dataset of students by attribute mastery probabilities, thus providing a

methodology for large-scale diagnostic assessment.
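As a concrete illustration of the design stage, the following minimal sketch (in Python, using a hypothetical 4-item, 3-attribute Q matrix rather than the NAT-M coding) enumerates the binary attribute patterns and derives the corresponding ideal item score patterns under the RSM assumption that an item is answered correctly if and only if every attribute it involves has been mastered.

```python
import itertools
import numpy as np

# Hypothetical item-by-attribute incidence (Q) matrix: rows = items, columns = attributes.
# A 1 means mastery of that attribute is required to answer the item correctly.
Q = np.array([
    [1, 0, 0],   # item 1 requires attribute A1 only
    [1, 1, 0],   # item 2 requires A1 and A2
    [0, 1, 1],   # item 3 requires A2 and A3
    [1, 1, 1],   # item 4 requires all three attributes
])

n_items, n_attrs = Q.shape

for alpha in itertools.product([0, 1], repeat=n_attrs):
    alpha = np.array(alpha)                       # one binary attribute pattern (knowledge state)
    # Ideal response: an item is correct iff all of its required attributes are mastered.
    ideal = np.all(Q <= alpha, axis=1).astype(int)
    print(f"knowledge state {alpha.tolist()} -> ideal item scores {ideal.tolist()}")
```

In the full methodology, distinct attribute patterns that map to the same ideal item score pattern are collapsed, and an observed response pattern is assigned to the most plausible ideal pattern by the Bayes decision rules mentioned above.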

RSM has been shown to perform quite well in various areas such as subtraction of

fractions (Tatsuoka & Tatsuoka, 1992), signed numbers operations (Tatsuoka,

1990), algebra (Birenbaum, Kelly, & Tatsuoka, 1993), the quantitative parts of the

Scholastic Aptitude Test (SAT-M; Tatsuoka, Birenbaum, Lewis, & Sheehan, 1993),

and the Graduate Record Examination—GRE (Tatsuoka & Boodoo, 2000), as well

as in architecture (Katz, Martinez, Sheehan, & Tatsuoka, 1998), and listening

comprehension (Buck & Tatsuoka, 1998). Although the RSM has already been

successfully applied in several studies of mathematics performance, comparisons


of group performances using this methodology are sparse (Tatsuoka & Boodoo,

2000).

The current study employed the RSM to diagnose students’ attribute mastery

probabilities in order to compare patterns of strengths and weaknesses in

mathematics knowledge between Jewish and Arab eighth-grade students in Israel.

Method

Participants

The research sample consisted of 2,041 eighth graders—1,406 Jewish students and

635 Arab students. This was a subsample of a representative national sample selected

for the purpose of the 1996 national assessment test in mathematics. The national

sample was stratified and included 10% of the eighth graders in that year. Participants

were all the students in the sampled classes who attended school on the day the test

was administered. Three versions of the test were randomly distributed to the entire

sample. The research sample consisted of all participants who received Form A of the test—the form that was later disclosed to the public.

Instruments

Mathematics test. The national assessment test in mathematics (NAT-M) (Aviram

et al., 1999) is based on the formal curriculum issued by the Ministry of Education.

Senior mathematics teachers and pedagogical consultants developed the test items.

The test was translated into Arabic and reviewed by teachers and experts in

mathematics education in the Arab sector. All items were approved by the NAT

mathematics committee and were selected for inclusion in the operational version on

the basis of their psychometric properties as identified in a pilot study. Each version of

the test consisted of three parts pertaining to 12 topics. About 25% of the items

address topics studied in seventh grade. Most of the topics that can be taught either in eighth or in ninth grade, as well as topics taught at the end of eighth grade, were addressed at a basic level only. The use of calculators was permitted in parts two and three of the

test.

Form A of the test included 34 items, 9 of which were in the choice response

format (multiple-choice or true-false) and the rest were in the constructed response

format. One item was a multistep investigation task. Since some of the items included

more than one section, the total number of questions in Form A of the test was 44.

Cronbach’s alpha coefficient for the test in the entire sample was 0.91; the reliabilities for the Jewish and Arab groups were 0.90 and 0.91, respectively.
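The reliability coefficient reported above can, in principle, be reproduced from the students-by-questions score matrix. The sketch below computes Cronbach's alpha from such a matrix; the `scores` array is purely illustrative and is not the NAT-M data.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (students x questions) score matrix."""
    k = scores.shape[1]                          # number of questions
    item_vars = scores.var(axis=0, ddof=1)       # variance of each question
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative 0/1 scores for 5 students on 4 questions (not the study data).
scores = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(round(cronbach_alpha(scores), 2))
```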

Procedure

The NAT-M was administered in 257 classes at the end of the eighth grade. Students

were tested in class by an external examiner. Students were allowed 90 min for


answering the test items and the 11-item attitude questionnaire appended at the

end of the test. According to the NAT-M final report it took, on average, 59 min

for the Jewish group and 72.5 min for the Arab group to complete the NAT-M.

Only 10% of the students in each group failed to complete the test within the allotted

time. Another piece of information provided in the NAT-M final report indicates

that there was no consistent difference between the Jewish and Arab teachers’

reports regarding the coverage in class of the various topics included in the test

(Aviram et al., 1999).

Analysis

The set of attributes used in this study was adopted from Tatsuoka, Corter, and

Guerrero (2003) with minor modifications to fit the scope of the NAT-M test.

Tatsuoka and her colleagues developed these attributes for analyzing the 1999

TIMSS math items for grade eight. They classified the attributes into three categories

of content (C1 to C6), skills/item-type (S1 to S11) and processes (P1 to P10). Content

attributes refer to basic concepts and properties in whole numbers and integers;

fractions and decimals; elementary algebra; two-dimensional geometry; and data and basic statistics. Process attributes include attributes such as: judgmental applications of

knowledge in arithmetic and geometry; rule application in algebra; logical reasoning;

problem search; generating, visualizing, and reading figures and graphs; and managing data and procedures. Skill (item-type) attributes include attributes such as: applying

number properties and relationships (number sense); approximation/estimation;

recognizing patterns and sequences; solving open-ended items. The full list of the

attributes used in the current study appears in Appendix A.

Successful completion of each item on the test requires mastery of several attributes

that vary in number and type as a function of the content and complexity of the item.

The number of attributes per item in the Q matrix ranged from 2 to 9 with a mean of

4.6, and the number of items per attribute ranged from 2 to 18 with a mean of 8.4. A

sample of 10 representative items and the specific attributes involved in successful

completion of each of them is provided in Appendix B.

The test items were coded according to the set of 24 attributes. For data analysis,

the BILOG-MG program (Zimowski, Muraki, Mislevy, & Bock, 1996) was used to

estimate the IRT a and b parameters for the items and the BUGLIB program

(Tatsuoka, Varadi, & Tatsuoka, 1992) was used for the RS analysis.
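The a and b parameters referred to here are the discrimination and difficulty parameters of the two-parameter logistic (2PL) IRT model. The sketch below only evaluates that response function for illustration; the actual estimation was done with BILOG-MG, whose marginal maximum likelihood procedure is not reproduced here, and the scaling constant D = 1.7 is a conventional choice, not something stated in the article.

```python
import numpy as np

def p_correct_2pl(theta, a, b, D=1.7):
    """Probability of a correct response under the 2PL model:
    P(theta) = 1 / (1 + exp(-D * a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

# Illustrative item with discrimination a = 1.2 and difficulty b = 0.5,
# evaluated for examinees of low, average, and high ability.
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(p_correct_2pl(theta, a=1.2, b=0.5), 3))
```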

Results

A. Quality Control Measures

The adequacy of the Q matrix was assessed by regressing item difficulties on the attribute vectors as they appear in the Q matrix; this regression yielded an adjusted squared multiple correlation of 0.85. Similarly, predicting the total test score from the attribute

probabilities for the entire sample yielded an adjusted squared multiple correlation of


0.96. The respective values for the Jewish and Arab groups were 0.95 and 0.97. All

these values are considered satisfactory (Tatsuoka, in press). Another measure of the

adequacy of the Q matrix, the rate of classification by RSM, was 100%. This value

indicates the percentage of students’ response patterns that were located within the

95% probability ellipses of the latent knowledge states.
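The first quality-control index described above can be sketched as an ordinary regression of item difficulties on the columns of the Q matrix, followed by an adjusted R-squared computation. The Q matrix and difficulty values below are hypothetical, not the NAT-M values.

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted R^2 from an OLS regression of y on X (with an intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    n, p = len(y), X.shape[1]
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical Q matrix (items x attributes) and item difficulties (proportion correct).
Q = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 0, 1], [1, 0, 1]])
difficulty = np.array([0.85, 0.70, 0.55, 0.40, 0.75, 0.60])
print(round(adjusted_r2(difficulty, Q), 2))
```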

B. Group Comparisons at the Attribute Level

The results of the comparisons between the Jewish and the Arab samples at the

attribute level are presented in Table 1, which includes the mean probabilities and

standard deviations for each group on the 24 attributes along with the t values and the

effect size values (d). As can be seen in the table, 22 of the 24 attributes yielded

significant differences in favor of the Jewish group with effect sizes ranging between

0.11 and 1.00 standard deviations with a mean of 0.56. (The effect size for the

percent of correct responses on the test was 0.86.)
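A sketch of the kind of comparison reported in Table 1: an independent-samples t test on the attribute mastery probabilities of the two groups, together with Cohen's d based on a pooled standard deviation. The simulated values are illustrative only, and the article does not state whether pooled-variance or Welch t tests were used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Illustrative mastery probabilities for one attribute in two groups (not the study data).
jewish = rng.normal(loc=0.85, scale=0.20, size=1406).clip(0, 1)
arab = rng.normal(loc=0.60, scale=0.30, size=635).clip(0, 1)

t, p = stats.ttest_ind(jewish, arab, equal_var=False)   # Welch t test

# Cohen's d with a pooled standard deviation.
n1, n2 = len(jewish), len(arab)
pooled_sd = np.sqrt(((n1 - 1) * jewish.var(ddof=1) + (n2 - 1) * arab.var(ddof=1)) / (n1 + n2 - 2))
d = (jewish.mean() - arab.mean()) / pooled_sd
print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
```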

Four of the six content attributes yielded significant differences between the two

groups, the highest being Use of basic concepts and operations in whole numbers

(C1), and Use of fractions and decimals (C2). Setting the mastery probability at 0.8

implies that the average student in the Jewish group mastered these attributes whereas

the average student in the Arab group failed to reach mastery. The nonsignificant

differences between the two groups are on attributes involving Functions (C7) and

Data, probability and statistics (C5). The results indicate that the average student in

both groups failed to master these two attributes. On the other two content attributes,

Elementary algebra (C3) and Geometry (C4), the average student in both groups

reached mastery, yet the mean probabilities are significantly higher in the Jewish

group than in the Arab group.

As for the skill/item-type attributes, all nine of them yielded significant differences

in favor of the Jewish group. However, on only two attributes, Number properties and

relations (S2) and Approximation and estimation (S4), the means of the Jewish group

indicate mastery whereas those of the Arab group indicate non-mastery. The means

of both groups indicate mastery of four skills: Figures, tables, and graphs (S3);

Evaluation and verification of response options (S5); Comparison of entities (S9);

and Open-ended questions (S10). Yet the means of both groups indicate non-

mastery of three skills: Recognition of patterns and relationships (S6); Proportional

reasoning (S7) and Working with verbally loaded items (S11).

With respect to process attributes, all nine of them yielded significant differences in

favor of the Jewish group. On one attribute, Application of computational knowledge

(P2), the mean of the Jewish group indicates mastery whereas that of the Arab group

indicates non-mastery. On one attribute, Generalization and visualization (P7), the

means of both groups indicate mastery. Yet, the average student in both groups

exhibited non-mastery on seven process attributes: Translation (P1); Knowledge

application (P3); Application of rules in algebra (P4); Logical thinking (P5); Problem

search (P6), Data and process management (P9), and Quantitative and logical

reading (P10).


Table 1. Means, standard deviations (SD), t values, and effect size (d) values for Jewish (n = 1406) and Arab (n = 635) 8th graders on 24 attributes

Attribute: Jews M (SD); Arabs M (SD); t; d
C1: Whole numbers: Jews .81 (.34); Arabs .52 (.44); t = 14.63**; d = .78
C2: Fractions & decimals: Jews .92 (.21); Arabs .68 (.35); t = 15.47**; d = .96
C3: Elementary algebra: Jews .90 (.17); Arabs .80 (.22); t = 9.86**; d = .53
C4: Geometry: Jews .93 (.15); Arabs .88 (.19); t = 4.85**; d = .31
C5: Data, probability & statistics: Jews .43 (.18); Arabs .42 (.20); t = 1.45; d = –
C7: Functions: Jews .28 (.20); Arabs .29 (.24); t = -.98; d = –
S2: Number properties & relationships: Jews .89 (.29); Arabs .61 (.45); t = 14.24**; d = .82
S3: Comprehend figures, tables & graphs: Jews .99 (.07); Arabs .95 (.14); t = 6.20**; d = .44
S4: Approximation & estimation: Jews .85 (.23); Arabs .59 (.34); t = 17.54**; d = 1.00
S5: Evaluate & verify options: Jews .98 (.08); Arabs .94 (.16); t = 6.31**; d = .40
S6: Recognize patterns & relations: Jews .63 (.25); Arabs .50 (.27); t = 10.02**; d = .50
S7: Use proportional reasoning: Jews .67 (.29); Arabs .46 (.31); t = 14.40**; d = .70
S9: Compare entities: Jews .98 (.08); Arabs .94 (.19); t = 5.72**; d = .36
S10: Work with open-ended items: Jews .96 (.14); Arabs .79 (.29); t = 13.88**; d = .89
S11: Work with verbally loaded items: Jews .60 (.30); Arabs .50 (.28); t = 7.42**; d = .34
P1: Translate/formulate equations & expressions: Jews .57 (.33); Arabs .41 (.29); t = 11.50**; d = .50
P2: Apply computational knowledge: Jews .88 (.20); Arabs .71 (.26); t = 14.21**; d = .77
P3: Identify true relations: Jews .59 (.27); Arabs .52 (.26); t = 5.86**; d = .11
P4: Apply rules in algebra: Jews .68 (.24); Arabs .53 (.30); t = 10.85**; d = .58
P5: Use logical reasoning: Jews .71 (.34); Arabs .54 (.36); t = 10.00**; d = .49
P6: Apply problem search: Jews .40 (.39); Arabs .23 (.31); t = 10.84**; d = .46
P7: Generate & read figures & graphs: Jews .94 (.12); Arabs .88 (.20); t = 7.65*; d = .43
P9: Manage information & procedures: Jews .41 (.35); Arabs .23 (.27); t = 12.49**; d = .55
P10: Quantitative & logical reading: Jews .61 (.15); Arabs .46 (.19); t = 17.14**; d = .94

*p < .01; **p < .001.


C. Group Comparisons at Fixed Levels of Achievement

In order to find out whether the differences between the Jewish and Arab groups are

only in magnitude or also in construct, that is whether the attribute profiles of the two

groups vary also at a given achievement level, the following procedure was carried

out: The score distribution of a sample of 1,270 students comprising equal numbers

(635) of Jewish and Arab students (the Jewish subsample was randomly drawn from

the original sample of 1406 students, to match the size of the original Arab sample)

was divided into quintiles with cut-off scores of 25.01, 38.65, 52.28, and 65.92 for the

second to fifth quintiles, respectively. The numbers of Jewish and Arab students in

quintiles 1 to 5 were 45, 197; 85, 150; 145, 123; 171, 89; 189, 76; respectively. The

differences in the total test score between Jews and Arabs in each quintile were

nonsignificant in quintiles 1 to 4. In quintile 5, a small but significant difference of 2.21

points emerged in favor of the Jewish group.
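A sketch of the matching and quintile-splitting step described above: a random Jewish subsample of the same size as the Arab sample is drawn, and the combined score distribution is cut into quintiles. The score arrays are simulated; the study's own cut-off scores were 25.01, 38.65, 52.28, and 65.92.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative total test scores on a 0-100 scale (not the study data).
jewish_scores = rng.normal(60, 18, size=1406).clip(0, 100)
arab_scores = rng.normal(45, 18, size=635).clip(0, 100)

# Random Jewish subsample matching the Arab sample size (635).
jewish_sub = rng.choice(jewish_scores, size=len(arab_scores), replace=False)
combined = np.concatenate([jewish_sub, arab_scores])

# Quintile cut-off scores of the combined distribution, then quintile membership (1..5).
cutoffs = np.quantile(combined, [0.2, 0.4, 0.6, 0.8])
quintile = np.digitize(combined, cutoffs) + 1
print(cutoffs.round(2), np.bincount(quintile)[1:])
```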

Three MANOVAs were carried out with mastery probabilities for attributes of

content, process, and skill/item-type, respectively, as dependent sets of variables and

achievement level (quintile) and group (Jews/Arabs) as independent variables.

Significant interaction effects in these three analyses would provide an indication of

structural differences in the progress of mathematical knowledge patterns between the

Jewish and Arab samples. Table 2 presents the results of these analyses in terms of

Wilks’ lambda (L) for the main effects of achievement level and group, and their

interaction. As can be seen in the table, the three effects in all analyses are significant.

Table 3 presents the means for each attribute in the Jewish and Arab groups at the five

achievement levels (Quintiles) along with tests of the significance of the differences

between the two groups at each level. As can be seen in the table, in each quintile some significant differences favor the Jewish group and others favor the Arab group.
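A sketch of the two-way MANOVA set-up (ethnicity/culture by quintile) on a set of attribute mastery probabilities, here using statsmodels on simulated data; the software actually used for these analyses is not named in the article, and the attribute columns below are placeholders.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(2)
n = 1270
df = pd.DataFrame({
    "group": rng.choice(["Jewish", "Arab"], size=n),
    "quintile": rng.integers(1, 6, size=n),       # achievement level 1..5
})
# Simulated mastery probabilities for three content attributes (illustrative only).
for attr in ("C1", "C2", "C3"):
    df[attr] = rng.uniform(0, 1, size=n)

# Dependent set = content attributes; factors = group, quintile, and their interaction.
mv = MANOVA.from_formula("C1 + C2 + C3 ~ C(group) * C(quintile)", data=df)
print(mv.mv_test())   # reports Wilks' lambda (among other criteria) for each effect
```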


Table 2. Wilks’ lambda and F values from two-way MANOVAs for effects of ethnicity/culture, test score level (quintiles), and their interaction on mastery probabilities of content, process, and skill attributes

Effect: Content (k = 6) Wilks’ Λ, F; Process (k = 9) Wilks’ Λ, F; Skill/item type (k = 9) Wilks’ Λ, F
Ethnicity (1): .94, 12.91***; .95, 8.06***; .96, 6.44***
Quintile: .19, 113.22***; .09, 119.31***; .09, 115.33***
Ethnicity × Quintile: .94, 3.19***; .94, 2.06***; .95, 1.95**

*p < .05; **p < .01; ***p < .001. (1) Jews: n = 635; Arabs: n = 635. k = number of attributes.


Table 3. Mean attribute mastery probabilities of the Jewish and Arab groups in each quintile of the total test score distribution, and t values

[Table body: for each of the 24 attributes (C1 to C5 and C7; S2 to S7, S9 to S11; P1 to P7, P9, and P10), the Jewish and Arab group means and the corresponding t value are reported within each of Quintiles 1 to 5 (Jews: n = 45, 85, 145, 171, 189; Arabs: n = 197, 150, 123, 89, 76, respectively).]

*p < .05; **p < .01; ***p < .001.


D. Clusters of Knowledge States

In order to portray the different progress patterns, maps of transitional relations among clusters of knowledge states, derived from separate cluster analyses on students’ attribute mastery probabilities for the Jewish and Arab samples, were plotted and are presented in Figures 1 and 2. A transition from one cluster of knowledge states

to another is said to be possible whenever the set of mastered attributes associated

with the lower cluster is a proper subset of the higher connected cluster. Attributes

yielding a weight of 0.75 or larger were considered meaningful for defining a cluster

center of knowledge states in terms of mastery. Those are the attributes that appear in

the bottom part of the figures along with the number of students in each cluster and

their average test score. As can be seen in Figure 1, in the Jewish sample attributes P4

(Rule application in algebra) and P6 (Pattern recognition) divide the progressing

transitions into two paths. The transitional pattern in the Arab sample, as can be seen

in Figure 2, is more diffuse and the number of mastered attributes at the highest

cluster is 15 compared to 21 in the Jewish sample.
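The transition rule just described can be sketched directly: each cluster center of attribute mastery probabilities is thresholded at 0.75 to obtain its set of mastered attributes, and a transition from a lower to a higher cluster is admitted only when the lower set is a proper subset of the higher one. The cluster centers below are illustrative, not the clusters obtained in the study.

```python
import numpy as np

ATTRS = ["C1", "C2", "C4", "S3", "P2", "P7"]

# Illustrative cluster centers (mean attribute mastery probabilities per cluster).
centers = {
    "cluster_1": np.array([0.80, 0.60, 0.85, 0.90, 0.55, 0.78]),
    "cluster_2": np.array([0.88, 0.79, 0.90, 0.95, 0.81, 0.85]),
    "cluster_3": np.array([0.95, 0.90, 0.97, 0.99, 0.92, 0.96]),
}

def mastered(center, threshold=0.75):
    """Attributes whose weight in the cluster center reaches the mastery threshold."""
    return {a for a, w in zip(ATTRS, center) if w >= threshold}

sets = {name: mastered(c) for name, c in centers.items()}
for low in sets:
    for high in sets:
        if sets[low] < sets[high]:               # proper subset -> transition is possible
            print(f"{low} -> {high}: gained {sorted(sets[high] - sets[low])}")
```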

Discussion

The results of the current study illuminate the nature of the long-lasting gap in

mathematics achievement between Jewish and Arab students in Israel. The effect size

of the gap between the two samples in their overall test performance, as found in the

current study, is similar in magnitude to the one recently reported by Zuzovsky

(2001, p. 38), who compared the performance of Jewish and Arab eighth graders in

mathematics in the Third International Mathematics and Science Test (TIMSS-

1999). It should be noted that effect size coefficients of almost one standard deviation

were similarly reported with respect to performance in the science part of that test

(Zuzovsky, 2001, p. 61) as well as in the National Assessment Test in Science for

sixth graders (Cfir, Aviram, & Ben-Simon, 1999, p. 160).

It is difficult to disentangle the many confounding factors that explain this large

achievement gap: The complex web they create comprises differences in resources (Lavy,

1998), culture (Al-Haj, 1995), and possibly epistemological beliefs (Agmon, 2002), and

the derived conceptions of teaching and learning, as well as observed differences in

prevalent instruction practices (Birenbaum & Nasser, 2002). Rather than trying to

disentangle this complex web, a more constructive approach would be to concentrate on

what could educationally be done in order to close the gap. To this end, we first state

the problem and then address features of relevant instructional interventions.

The crux of the problem, as our results have shown, lies in the deficient prior

mathematical knowledge of the average Arab student, as compared to his/her Jewish

counterpart. This was indicated by non-mastery of content, skill, and process attributes

that refer to topics learned in earlier grades such as: Use of basic concepts and operations

in whole numbers (C1), and in fractions and decimals (C2); Use of prior knowledge of

number properties and relationships (number-sense) (S2); Use of approximation and

estimation (e.g., rounding off decimals or fractions in numerals, and approximate areas

or volumes in geometrical shapes) (S4), and Application of computational knowledge

(P2). Students who have mastered the latter are able to apply knowledge acquired in

earlier grades of basic terminology, concepts, and properties in arithmetic and geometry

and use calculators for basic operations. Because mastery of these attributes is


Fig. 1. A map of transitional relations among clusters of latent knowledge states in the Jewish group (N = 635)


Fig. 2. A map of transitional relations among clusters of latent knowledge states in the Arab group (N = 635)


fundamental to achievement in higher mathematics, they should be targeted as "prime candidates" for remedial interventions.

What are the features of effective interventions of this kind? It was shown that

interventions resulting in a long-lasting effect on the use of strategies that reflect

sound number sense engaged students in mental computations, estimations, sensing

number magnitudes, moving between representation systems of numbers (such as

simple fractions, whole numbers, integers, decimals, and percentages), and judging

the reasonableness of numerical results (Markovits & Sowder, 1994). More general

instructional strategies that were shown to support conceptual understanding and

consequently procedural and conditional knowledge (that is, knowing when and why to apply which procedure) engaged students in active learning through discussions, conversations, and reflection (Fosnot, 1996; Sfard, 2000; Wood, 1999); inquiries and

explorations; examples; multiple solutions and monitoring strategies (Stigler &

Hiebert, 1999); collaborative learning in small groups (Davidson, 1985; Slavin,

1990); and formative assessment (Black & Wiliam, 1998; Stiggins, 2002).

In which of the two educational systems are such features more apparent in

mathematics classes? Results of a recent study that compared instructional practice

between a sample of Jewish and Arab eighth-grade mathematics classes using video

records identified several instructional features that could hinder the development of

mathematical knowledge—conceptual and conditional—in Arab classes (Birenbaum

& Nasser, 2002). In the observed classes, mathematical concepts and procedures were

mostly stated by the teacher rather than developed through examples, demonstrations,

and discussions. No strategies of how to address the problem or how to evaluate the

solution were taught. It was also noticed that students were mainly practicing routine

procedures, spent much less time on applying the procedures in new situations, and

almost never invented new procedures or coped with unfamiliar problems. Teachers

stuck to the textbook, which was the major, and frequently the sole, material used for

teaching and learning. Moreover, students were only scarcely provided with written

formative feedback regarding their homework or their performance in the very few

quizzes and tests administered in these classes. Another disturbing observation was

that nonparticipating students were mostly ignored unless they were involved in

discipline violations. It is reasonable to believe that many of these students lack basic

mathematical attributes, such as the ones identified in the current study, that should

have been carried over from earlier grades. Such teaching and assessment practices

were also shown in international comparative studies, for example TIMSS, to be

typical of countries with low mathematics achievement (Stigler & Hiebert, 1997).

The results of the current study also pointed out structural differences in the

progress of mathematical knowledge between the two groups; it was found that in

each quintile some of the group differences were in favor of the Jewish students and

others in favor of the Arab students. These significant differences between the two

groups in their attribute profile at fixed levels of achievement imply that differences

may exist in the implemented math curricula in the two school systems. The maps we

presented of hierarchically ordered knowledge states for each group are another

indication of the differential progression of mathematical knowledge in the two


groups but at the same time provide valuable information for instructional design.

They could be used to spur adaptive remedial interventions according to the

developmental paths depicted in each group.

Another finding worth addressing is the non-mastery of higher order mathematical

thinking skills in both samples. The average Jewish and Arab student failed to master

attributes such as: Recognition of patterns and relationships (S6); Logical thinking

(P5); Problem search (P6); Proportional reasoning (S7); Data and process manage-

ment (P9); Quantitative and logical reading (P10); and Coping with open-ended

items (S11). Studies of instructional practice in countries that excel in these

attributes, such as Japan, can help in designing effective interventions.

What are the features of such instructional practice and how do they differ from

those prevalent in Israel? TIMSS video studies have shown that a typical Japanese

lesson advances as follows: the teacher poses a complex thought-provoking question,

the students struggle with the problem, several students present ideas or solutions to

the class, the teacher leads a class discussion of the various solutions, then the teacher

summarizes the conclusions and makes connections to mathematical concepts

(Hiebert et al., 2003; Stigler, Gonzales, Kawanaka, Knoll, & Serrano, 1999). In

contrast, typical mathematics classes in Israel focus on promoting skill acquisition and

are characterized by the following sequence: The teacher explains a theorem and then

uses a sample problem to show step by step how to apply the formula in concrete

situations; or the teacher presents a problem and demonstrates how to solve it followed

by students’ practice (Birenbaum & Nasser, 2002). Furthermore, a study that focused

on questions asked in Japanese classes during mathematics lessons revealed that

Japanese teachers tend to frequently ask higher order questions and they do so when

the class is sharing the solution methods that students generated while working at their

desks (Kawanaka & Stigler, 1999). Moreover, these researchers observed two kinds of

problem-solving activities in Japanese classrooms, which they term ‘‘divergent’’ and

‘‘convergent.’’ The former refers to open-ended problem-solving in which the students

are asked to solve a non-routine problem on their own using any method they wish or

just to think about how to solve the problem without actually solving it. The latter

refers to solving a given problem when the students know what solution method is

required. Such practice should be brought to the attention of Israeli teachers as they

reassess their practice in order to promote their students’ mathematical thinking.

Suggestions for Further Studies

Although the results of the current study indicated that the quality of the design and

classification was satisfactory, it should be noted that, because the study was a secondary data analysis, the attributes used for the RS analysis were defined post hoc rather than

at the stage of test design, which resulted in uneven distribution of items across the

various attributes. In order to increase the validity and reliability of future group

comparisons, it is recommended to first define a relevant set of attributes and then

write items that tap that set of attributes. Further studies should also validate students’

attribute profiles using think-aloud protocols taken as students solve the test items, and


compare the strategies used by students from both educational systems. It is also

recommended to conduct a similar study with respect to achievement in science – the

only other school subject where both school systems share the same official/intended

curriculum. Further research should also be directed at finding effective techniques of

reporting RS results to teachers and students and at investigating the impact of these

reports on the quality of subsequent remedial instruction.

References

Agmon, O. (2002). Beliefs of history teachers towards knowledge: Comparative study between teachers in

the Jewish and Arab sectors. Unpublished M.A. thesis, Tel Aviv University, Israel. (Hebrew).

Al-Haj, M. (1995). Education, empowerment and control: The case of the Arabs in Israel. Albany, NY:

State University of New York Press.

Atkin, J. M., & Black, P. (1997). Policy perils of international comparisons: The TIMSS case. Phi

Delta Kappan, 79(1), 22 – 28.

Aviram, T., Cfir, R., & Ben-Simon, A. (1999). The national feedback to the educational system –

mathematics for 8th grade. Jerusalem: National Institute for Testing and Evaluation (Hebrew).

Bashi, Y., Kahan, S., & Davis, D. (1981). Achievement of the Arab elementary school in Israel.

Jerusalem: The Hebrew University, School of Education. (Hebrew).

Batrice, Y. (2000). The Palestinian women in Israel: Reality and challenges: An empirical study. Acre,

Israel: Dar Alaswar. (Arabic).

Birenbaum, M., Kelly, A. E., & Tatsuoka, K. (1993). Diagnosing knowledge states in algebra using

the rule-space model. Journal for Research in Mathematics Education, 24(5), 442 – 459.

Birenbaum, M., & Nasser, F. (2002). Mathematics achievement in the Jewish and Arab sectors and their

relationships to student and teacher characteristics and educational context. Research report 99-02

(submitted to the Chief Scientist of the Israeli Ministry of Education.) Tel Aviv University,

School of Education. (Hebrew).

Black, P., & Wiliam, D. (1998). Inside the black box: Raising standards through classroom

assessment. Phi Delta Kappan, 80(2), 139 – 148.

Buck, G., & Tatsuoka, K. K. (1998). Application of the rule-space procedure to language testing:

Examining attributes of a free response listening test. Language Testing, 15(2), 119 – 157.

Central Bureau of Statistics. (1996). Statistical abstracts of Israel, 47. Jerusalem: Central Bureau of

Statistics (Hebrew).

Cfir, R., Aviram, T., & Ben-Simon, A. (1999). The national feedback to the educational system – science

for 6th grade. Jerusalem: National Institute for Testing and Evaluation (Hebrew).

Davidson, N. (1985). Small group cooperative learning in mathematics: A selective view of

the research. In R. Slavin (Ed.), Learning to cooperate: Cooperating to learn (pp. 211 – 230).

New York: Plenum.

Fosnot, C. T. (1996). Constructivism: A psychological theory of learning. In C. T. Fosnot (Ed.),

Constructivism: Theory, perspectives, and practice (pp. 8 – 33). New York: Teachers College Press.

Hiebert, J., Gallimore, R., Garnier, H., Givvin, K. B., Hollingsworth, H., & Jacobs, J. (2003).

Teaching mathematics in seven countries: Results from the TIMSS 1999 video study. (NCES 2003 –

013). Washington DC: U.S. Department of Education, National Center for Education

Statistics.

Katz, I. R., Martinez, M. E., Sheehan, K. M., & Tatsuoka, K. K. (1998). Extending the rule space

methodology to a semantically-rich domain: Diagnostic assessment in architecture. Journal of

Educational and Behavioral Statistics, 24(3), 254 – 278.

Kawanaka, T., & Stigler, J. W. (1999). Teachers’ use of questions in eighth-grade mathematics

classrooms in Germany, Japan, and the United States. Mathematical Thinking and Learning,

1(4), 255 – 278.


Kelly, D. (2002). The TIMSS 1995 International benchmarks of mathematics and science

achievement: Profiles of world class performance at fourth and eighth grades. Educational

Research and Evaluation, 8, 41 – 54.

Kraus, V. (1988). The opportunity structure of young Israeli Arabs. In J. E. Hofman, et al.,

Arab-Jewish relations in Israel: A quest in human understanding (pp. 67 – 91). Bristol, IN:

Wyndham Hall.

Lavy, V. (1998). Disparities between Arabs and Jews in school resources and student achievement

in Israel. Economic Development and Cultural Change, 47(1), 175 – 192.

Mar’i, S. K. (1978). Arab education in Israel. New York: Syracuse University Press.

Markovits, Z., & Sowder, J. (1994). Developing number sense: An intervention study in grade 7.

Journal for Research in Mathematics Education, 25, 4 – 29.

Mullis, I. V. S., Martin, M. O., Gonzales, E. J., O’Connor, K. M., Chrostowski, S. J., Gregory,

K. D., Garden, R. A., & Smith, T. A. (2001). Mathematics benchmarking report: TIMSS – eighth

grade. Achievement for U.S. States and districts in an international context. Chestnut Hill, MA:

International Study Center, Boston College.

Seginer, R., Karayanni, M., & Mar’i, M. (1990). Adolescents’ attitudes toward women’s roles.

Psychology of Women Quarterly, 14, 119 – 133.

Semyonov, M., & Lewin-Epstein, N. (1994). Ethnic labor markets, gender and socio-

economic inequality: A study of Arabs in the Israeli labor force. Sociological Quarterly,

35(1), 51 – 68.

Sfard, A. (2000). Symbolizing mathematical reality into being: How mathematical discourse and

mathematical objects create each other. In P. Cobb, K. E. Yackel, & K. McClain (Eds),

Symbolizing and communicating: Perspectives on mathematical discourse, tools, and instructional

design (pp. 37 – 98). Mahwah, NJ: Erlbaum.

Sharabi, H. (1987). Introduction to studying the Arab population. Acre, Israel: Dar Alaswar (Arabic).

Slavin, R. E. (1990). Student team learning in mathematics. In N. Davidson (Ed.), Cooperative

learning in math: A handbook for teachers (pp. 69 – 102). Boston: Allyn & Bacon.

Stiggins, R. J. (2002). Assessment crisis: The absence of assessment for learning. Phi Delta Kappan,

83(10), 758 – 765.

Stigler, J. W., Gonzales, P., Kawanaka, T., Knoll, S., & Serrano, A. (1999). The TIMSS videotape

classroom study: Methods and findings from an exploratory research project on eighth grade

mathematics instruction in Germany, Japan, and the United States. Washington, DC: National

Center for Education Statistics. (http://nces.ed.gov/timss).

Stigler, J. W., & Hiebert, J. (1997). Understanding and improving classroom mathematics

instruction: An overview of the TIMSS video study. (Third International Mathematics and

Science Study). Phi Delta Kappan, 78(1), 14 – 22.

Stigler, J. W., & Hiebert, J. (1999). The teaching gap: Best ideas from the world's teachers for improving

education in the classroom. New York: Summit Books.

Tatsuoka, C. M., Varadi, F., & Tatsuoka, K. K. (1992). BUGLIB. Unpublished computer

program, Trenton, NJ.

Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item

response theory. Journal of Educational Measurement, 20, 34 – 38.

Tatsuoka, K. K. (1990). Toward an integration of item response theory and cognitive analysis. In

N. Frederiksen, R. Glaser, A. Lesgold, & M. C. Shafto (Eds.), Diagnostic monitoring of skill and

knowledge acquisition (pp. 543 – 588). Hillsdale, NJ: Erlbaum.

Tatsuoka, K. K. (1991). Boolean algebra applied to determination of universal set of knowledge states.

Research Report ONR-1. Educational Testing Service, Princeton, NJ.

Tatsuoka, K. K. (in press). Statistical pattern recognition and classification of latent knowledge states:

Cognitively Diagnostic Assessment. Mahwah, NJ: Erlbaum.

Tatsuoka, K. K., Birenbaum, M., Lewis, C., & Sheehan, K. K. (1993). Proficiency scaling based on

conditional probability functions for attributes. (Research report 39 – 50). Princeton, NJ:

Educational Testing Service.


Tatsuoka, K. K., & Boodoo, G. M. (2000). Subgroup differences on GRE Quantitative test based

on the underlying cognitive processes and knowledge. In A. E. Kelly & R. A. Lesh (Eds.),

Handbook of research design in mathematics and science education (pp. 821 – 857). Mahwah, NJ:

Erlbaum.

Tatsuoka, K. K., Corter, J., & Guerrero, A. (2003). Manual of attribute-coding for general mathematics

in TIMSS studies. New York: Columbia University, Teachers College.

Tatsuoka, K. K., & Tatsuoka, M. M. (1992). A psychometrically sound cognitive diagnostic model: Effect

of remediation as empirical validity. Research Report, Educational Testing Service, Princeton, NJ.

Wood, T. (1999). Creating a context for argument in mathematics class. Journal for Research in

Mathematics Education, 30, 171 – 91.

Zimowski, M. F., Muraki, E., Mislevy, R., & Bock, R. D. (1996). BILOG-MG. Chicago: Scientific

Software International.

Zuzovsky, R. (2001). Learning outcomes and the educational context of mathematics and science

teaching in Israel: Findings of the third international mathematics & science study TIMSS-

1999. Tel Aviv, Israel: Ramot. (Hebrew).

Appendix A. List of Content, Process and Skill/Item-Type Attributes (1) Used in the Current Study (2)

To simplify phrasing, the opening sentence for each attribute should read: ‘‘A

student who has mastered this attribute will likely be able to successfully . . . ’’

Content related attributes

C1: Use basic concepts and operations in whole numbers.

C2: Use basic concepts and operations in fractions and decimals.

C3: Use basic concepts and operations in elementary algebra.

C4: Use basic concepts and properties in geometry.

C5: Read data and use basic concepts in probability and statistics.

C7: Use basic concepts and properties in inequalities and functions.

Skill/item-type related attributes

S2: Use prior knowledge regarding number properties (number sense) and

relationships.

S3: Comprehend various representations and use them interchangeably (e.g.,

written instructions, figures, tables, charts and graphs).

S4: Use approximation/estimation.

S5: Evaluate/verify/check options in a multiple-choice item.

S6: Recognize patterns of various representations (numeric, geometric,

algebraic).

S7: Use proportional reasoning.

S9: Compare and order two or more entities.

S10: Work with open-ended items.

S11: Work with verbally loaded items.

Process-related attributes

P1: Translate/formulate equations and expressions to solve a problem.

P2: Apply computational knowledge in arithmetic, algebra and geometry.


P3: Apply knowledge in arithmetic, algebra and geometry to identify true

relationships, properties and/or to set new goals in solving a problem.

P4: Apply rules in solving equations.

P5: Use logical reasoning (case reasoning, deductive thinking, generalizations).

P6: Apply problem search, analytic thinking, problem restructuring and

inductive thinking.

P7: Generate and visualize figures and graphs.

P9: Manage numerical information, procedures, goals, and conditions.

P10: Apply quantitative and logical reading.

Notes

(1) Adapted from Tatsuoka et al., 2003. The attributes’ original codes as they appear there were retained in the current study. Four of the original attributes (C6, S1, S8, and P8) were not tapped by the NAT-M items and therefore were eliminated, whereas a new content attribute (C7) was introduced to address a topic on the NAT-M that was not covered by the TIMSS items.
(2) For a more detailed description and examples of coded items, the reader is referred to the manual

written by Tatsuoka and her colleagues (Tatsuoka et al., 2003).

Appendix B. A Sample of 10 Representative Items and the Specific Attributes Involved in Successful Completion of Each of Them

1. Place the following three numbers on the number line:

1.8, 1.2, 2.1

In order to complete this task successfully, students should master basic concepts in fractions and decimals, such as mixed numbers, integers, fractions, and decimals (C2); know number properties such as the relationship between the two mixed numbers (S2); comprehend the mathematical representations of real numbers on the number line, including the meaning of order (S3); compare the given numbers with each other and with the numbers on the number line (S9); and place the numbers in the correct order on the real number line (P7). Failure to master any of these attributes leads to an erroneous answer.
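Written out, the required ordering is (assuming the three numbers exactly as printed above):

\[
1.2 < 1.8 < 2.1 ,
\]

so the numbers are placed from left to right in that order on the number line.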

2. Without calculation, select the approximate value of 5.35 × 2.8:

1. 0.15   2. 1.5   3. 15   4. 150

In order to complete this task successfully, students should master basic concepts in

mixed numbers such as integers, fractions and decimals (C2), know number

properties such as number of digits for the integer part of the mixed number (S2), be

able to make a correct approximation. That is: 5.35 should be rounded to 5 and 2.8

should be rounded to 3. Then students should correctly multiply 5 by 3 (P2) and

select the correct answer from the four options given in a multiple-choice item (S5).

Deficiencies in any of these attributes lead to an incorrect answer.
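The estimation described above, written as a single line:

\[
5.35 \times 2.8 \approx 5 \times 3 = 15 ,
\]

which matches option 3.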

3. Given the two triangles ABC and FED.

Also given that AB = FE; BC = ED.

Complete: If ∠B = ∠E then △ABC ≅ △FED.

Successful completion of this task requires students to use basic concepts and

properties related to congruent triangles such as equal corresponding sides and equal

corresponding angles (C4); to comprehend the given relations between the sides and

the angles of the two triangles and their figural representations as displayed (S3), to

build a solution on the basis of the given information (S10), and to apply their

knowledge in geometry to find the correct correspondence between the vertices (P3).

Deficiencies in one or more of these attributes lead to an incorrect answer.
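Stated compactly, and assuming the item targets the standard side-angle-side (SAS) congruence criterion, the required completion amounts to:

\[
AB = FE,\quad BC = ED,\quad \angle B = \angle E \;\Rightarrow\; \triangle ABC \cong \triangle FED .
\]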

4. Mark the largest of the following fractions:

1. 3/5 2. 5/10 3. 6/15 4. 11/20

In order to answer this question correctly, students should be able to use basic concepts and operations in fractions (C2), such as the relation between the

numerator and denominator in determining the value of the fraction, common

denominator and equivalent fractions (result from multiplying the numerator and

denominator of the fraction by the same number). Students should also be able to

apply computational knowledge in arithmetic (multiplication) on the numerators and

denominators of the fractions to equalize the denominators (P2) and to create an

appropriate basis on which they compare and order the fractions according to their

value (S9). Finally, students should be able to select the correct answer from the four

options in the multiple-choice item by comparing the numerators of the resulting

fractions with similar denominators (S5). Deficits in one or more of these attributes

cause an erroneous response.
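One way to carry out the comparison is to rewrite all four fractions over the least common denominator 60:

\[
\tfrac{3}{5}=\tfrac{36}{60},\qquad \tfrac{5}{10}=\tfrac{30}{60},\qquad \tfrac{6}{15}=\tfrac{24}{60},\qquad \tfrac{11}{20}=\tfrac{33}{60},
\]

so 3/5 (option 1) is the largest fraction.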

5. If t = 1, what is the value of 2(3 + t)?

In order to perform this task correctly, students should be able to use basic

concepts (such as unknown or variable) and operations (such as substitution) in

elementary algebra (C3). They also should be able to apply computational knowledge

including the distributive property and the correct order of performing arithmetic

operations (P2).
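The computation itself is short; either route through the order of operations gives the same value:

\[
2(3+t)\big|_{t=1} = 2(3+1) = 2\cdot 4 = 8, \qquad\text{or}\qquad 2\cdot 3 + 2\cdot 1 = 8 .
\]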

6. Solve the following equation:

(x − 5) / 2 = 6

To reach a correct solution, students should be able to use basic concepts related to

simple equations such as algebraic expression and an unknown or variable. They also

should be able to use operations such as performing the same manipulations on the

two sides to find the value of the unknown variable (C3). Students should also be able

to multiply both sides of the equation by 2 then to add 5 to both resulting sides to find

the value of x (P4). Failure to perform one or both steps leads to a wrong answer.
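The two steps described above, written out:

\[
\frac{x-5}{2} = 6 \;\Rightarrow\; x - 5 = 12 \;\Rightarrow\; x = 17 .
\]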

7. One kilogram of tomatoes costs a dollar more than one kilogram of cucumbers. One kilogram of onions costs half the price of one kilogram of cucumbers. Dani bought one kilogram of tomatoes, one kilogram of cucumbers, and two kilograms of onions and paid 10 dollars. What is the price of one kilogram of cucumbers? Write the process of your solution.

The solution of this item requires students to build a multi-step solution for a word problem (S10); to comprehend, extract, and organize the relevant information included in the word problem (S11); to use basic concepts and operations in elementary algebra, such as using symbols to represent unknowns (C3); to use basic concepts such as half the price and operations such as adding a fractional expression like x/2 to other expressions such as x and x + 1 (C2); to translate/formulate expressions and an equation to solve the problem (if the price of one kilogram of cucumbers is x dollars, then one kilogram of tomatoes costs x + 1 dollars and one kilogram of onions costs x/2 dollars, so the equation is x + (x + 1) + 2·x/2 = 10) (P1); to apply rules in solving the equation: multiplying both sides by 2 results in 2x + 2x + 2 + 2x = 20, summing similar expressions results in 6x + 2 = 20, subtracting 2 from both sides (preserving the equality between the two sides) results in 6x = 18, and dividing both sides by 6 results in x = 3 (P4); and to use logical reasoning to check/verify the solution (P5). In this last step students should be able to evaluate the correctness of their solution by substituting the resulting prices into the equation x + (x + 1) + 2·x/2 = 10, that is, 3 + 4 + 2·3/2 = 10, thus 10 = 10 (indicating a correct answer).
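The same algebra can be written more compactly by simplifying 2·x/2 to x instead of first doubling both sides; the result is identical:

\[
x + (x+1) + 2\cdot\tfrac{x}{2} = 10 \;\Rightarrow\; 3x + 1 = 10 \;\Rightarrow\; x = 3 ,
\]

so cucumbers cost 3 dollars, tomatoes 4 dollars, and onions 1.5 dollars per kilogram, and indeed 3 + 4 + 2(1.5) = 10.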

8. On one side of a fair coin there is a picture while on the other side there is a number. Rina

decided to toss the coin four times. In the first three tosses the result was the picture. What is

the probability that the coin shows a picture on the fourth toss?

1. 3/4 2. 1/2 3. 1/3 4. 1/4

This question taps the topic of basic probability and its correct solution requires the

student to work with verbally loaded items, specifically to comprehend the problem

(S11), to use basic concepts of probability such as an event (C5), use basic properties

of probabilistic events such as independence, and apply relevant rules to solve the problem: when events are independent, the probability of getting the picture remains the same regardless of the result(s) of the previous toss(es) (P4). Deficiencies in one or

more of these attributes lead to failure to perform the task.
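In symbols, the independence argument the item is testing reads:

\[
P(\text{picture on the 4th toss} \mid \text{three pictures so far}) = P(\text{picture}) = \tfrac{1}{2} ,
\]

so option 2 is correct.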

9. Given the following function:

f(x) = 7 − x

Complete: f(__) = 3

To complete this item correctly students should be able to use basic concepts and operations in elementary algebra such as function (a well-behaved relationship), variable, and substitution (C7). They also should be able to solve the simple algebraic equation 7 − x = 3 by subtracting 7 from both sides of the equation, which results in −x = −4, and then multiplying both sides by −1, which results in x = 4 (P4).
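The solution chain, written out:

\[
f(x) = 7 - x, \qquad 7 - x = 3 \;\Rightarrow\; -x = -4 \;\Rightarrow\; x = 4 , \qquad\text{so } f(4) = 3 .
\]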

10. What property exists in all sums of any three successive numbers?

This task follows an easier one that requires students to provide four examples of

sums of three successive numbers. In order to define the property that characterizes a

set of numbers (sums of triples of successive numbers), students should be able to use

basic concepts such as successive numbers and operations such as sum of three

successive numbers (C1), to use prior knowledge regarding number properties and

relationships such as the difference between pairs of successive numbers is 1 (S2), to

analyze the resulting sums and to conduct a search for common characteristics

(P6), to recognize patterns in a number set such as multiples of 3 (S6), and to use

inductive thinking to generalize from the characteristics of the individual sums to the

set of sums. Failure to demonstrate one or more of these attributes leads to an incorrect response.
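The property itself follows in one line if the three successive numbers are written as n, n + 1, and n + 2:

\[
n + (n+1) + (n+2) = 3n + 3 = 3(n+1) ,
\]

so every such sum is a multiple of 3 (in fact, three times the middle number).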
