Educational Research and Evaluation: An International Journal on Theory and Practice
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/nere20
Published online: 15 Feb 2007.
To cite this article: Menucha Birenbaum, Fadia Nasser & Curtis Tatsuoka (2005) Large-scale diagnostic assessment: Mathematics performance in two educational systems, Educational Research and Evaluation: An International Journal on Theory and Practice, 11:5, 487-507, DOI: 10.1080/13803610500146137
To link to this article: http://dx.doi.org/10.1080/13803610500146137
Large-Scale Diagnostic Assessment:
Mathematics performance in two
educational systems
Menucha Birenbaum^a, Fadia Nasser^a, and Curtis Tatsuoka^b
^a Tel Aviv University, Israel, and ^b George Washington University, Washington, DC, USA
(Received 27 September 2004; accepted 7 April 2005)
A diagnostic methodology for large-scale assessment was employed to compare performance on a
national test in mathematics of representative samples of Jewish and Arab 8th graders in Israel in
order to shed light on a previously identified large achievement gap between these 2 populations.
The results revealed significant differences between the 2 groups in patterns of strengths and
weaknesses with respect to content, process, and skill/item-type attributes, indicating different paths
for remedial interventions.
Introduction
Research has pointed out a substantial discrepancy in mathematics achievement
between the Jewish and Arab populations in Israel (Aviram, Cfir, & Ben-Simon,
1999; Bashi, Kahan, & Davis, 1981; Birenbaum & Nasser, 2002; Zuzovsky, 2001),
yet the nature of this difference in terms of cognitive processes has not been
investigated thus far. In the Israeli context, the Jewish majority and the Arab minority
study under the same educational guidelines but in separate school systems with
almost no intergroup contact. The current study used a diagnostic methodology for
large-scale assessment to compare performances of representative samples of Jewish
and Arab eighth graders on a national test in mathematics. Before considering the
design of the study, a brief description of the Israeli context is provided.
The Jewish and the Arab populations in Israel represent two ethnic/cultural groups
in a conflictual relationship with little intergroup contact (Kraus, 1988). The Arab
minority constitutes approximately 20% of the population (Central Bureau of
Statistics, 1996). In contrast to Jewish Israeli society, which is by and large a typical
modern western society, Arab Israeli society is a developing society that tends to be
more traditional and conservative and maintains a clear and well-defined system of
values and customs (Batrice, 2000; Mar’i, 1978; Seginer, Karayanni, & Mar’i, 1990;
Sharabi, 1987). The Arab community in Israel is considered a non-assimilating
minority and has limited access to the opportunity structure (Al-Haj, 1995).
Consequently, the Arab minority has relatively lower standing in all aspects of
socioeconomic status (including education, occupation and income) as compared to
the Jewish majority (Al-Haj, 1995; Semyonov & Lewin-Epstein, 1994).

*Corresponding author. School of Education, Tel Aviv University, Ramat Aviv 69978, Israel.
E-mail: [email protected]
Educational Research and Evaluation, Vol. 11, No. 5, October 2005, pp. 487–507
ISSN 1380-3611 (print)/ISSN 1744-4187 (online)/05/050487–21
© 2005 Taylor & Francis
DOI: 10.1080/13803610500146137

Although officially all government schools in Israel are open to all students, in fact
there are segregated educational systems for Arabs and Jews, both of which are run by
the State’s Ministry of Education. The languages of instruction in the Jewish and
Arab school systems are Hebrew and Arabic, respectively. Both systems share the
same official/intended curricula only in science and mathematics.
Reporting diagnostic feedback in large-scale assessments is not a common practice
but a much desired one (Atkin & Black, 1997). It can aid in interpreting test scores
and at the same time guide curricular planning and instruction so that the diagnosed
difficulties can be addressed promptly. Currently, results of national and international
tests provide interpretations of test scores (i.e., scale scores), where all test
takers who get the same scale score, or are within a prespecified range of the total
score distribution, receive the same interpretation. For instance, the diagnostic
approach in the Third International Mathematics and Science Study (TIMSS) allows
for diagnostic feedback at four benchmarks, set at the 90th, 75th, 50th, and 25th
percentiles of the international score distribution. To generate this feedback, the
performance of students whose scores were around these percentiles was examined in
terms of the educational requirements for solving anchor items, that is, items that 60% of the
students in a given benchmark group answered successfully and that more than 50% of the
lower percentile group failed to answer correctly. The mastery profile for that benchmark
was specified in terms of skills judged by experts to be necessary for successfully
solving those particular items (Kelly, 2002; Mullis et al., 2001). However, there are
some inherent shortcomings to this method that preclude an accurate diagnosis on
the individual level. As noted by Kelly (2002), the benchmark descriptions must be
interpreted under the assumption that performance on the TIMSS scale is cumulative
(i.e., students reaching a particular benchmark are assumed to have acquired the
knowledge and skills described on the lower benchmark). Yet, this is not always the
case as is also implied in the other assumption, namely that performance is
continuous. Accordingly, it is recognized that students at the upper or lower ends of a
given benchmark may indeed know or understand some of the concepts that
characterize a higher benchmark, or may not know or understand some concepts that
characterize performance at a lower benchmark, respectively. A diagnostic approach
that overcomes these shortcomings is the rule space methodology (RSM) developed by
Tatsuoka (1983, in press). Following is a brief account of this methodology.
RSM is used to classify examinees’ item responses according to their profile of
strengths and weaknesses on the underlying constructs measured by a test, which are
termed attributes. An attribute is a description of a procedure, skill, or content
knowledge that a student must possess in order to successfully complete the target
task. Binary attribute patterns that express mastery and non-mastery of attributes are
termed knowledge states. Attributes and knowledge states are unobservable variables
that RSM transforms into observable attribute mastery probabilities.
RSM belongs to the branch in statistics that deals with pattern recognition and
classification problems, which has two stages: the design stage and the classification
stage. At the design stage, an object is characterized by its feature variables and
expressed by a pattern of feature variables. At the classification stage, the pattern is
classified into one of the predetermined classification groups. However, attributes are
usually impossible to measure directly because they are latent. RSM has therefore extended
this approach to deal with latent feature variables. This is done by introducing an
item-by-attribute incidence matrix, referred to as the Q matrix in RSM (Tatsuoka, 1990).
Every column in the Q matrix represents an attribute and every row an item. For
every item, 1s are assigned to attributes whose mastery is required for answering that
item correctly and 0s otherwise. These item-by-attribute involvement relationships
specify the hypothesized underlying constructs measured by the test. The only
assumption RSM uses at the design stage is that the right answer for a given item can
be obtained if and only if all attributes involved in that item are used correctly. All
possible attribute patterns for a given Q matrix are then generated
mathematically by applying Boolean algebra. At the same time,
each knowledge state is expressed by its corresponding item score pattern,
termed an ideal item score pattern to differentiate it from students’ observed item
response patterns. Attribute patterns are not observable, but their corresponding ideal item
score patterns are, and these form the predetermined classification groups in
RSM. The classification space formulated by RSM thus contains the set of possible
knowledge states generated from a given Q matrix (Tatsuoka, 1991). A student’s item
response pattern can now be classified into one of the predetermined groups by
applying Bayes’ decision rules, which yield the student’s most plausible ideal
item score pattern together with its membership probability.
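The design stage described above can be illustrated with a minimal sketch. It assumes a small, purely hypothetical Q matrix (four items, three attributes) and applies the conjunctive assumption directly: an item is answered correctly only if every attribute it requires is mastered. The names `Q`, `ideal_item_scores`, and `knowledge_states` are illustrative and are not taken from the BUGLIB program.

```python
from itertools import product

# Hypothetical Q matrix: rows are items, columns are attributes (1 = attribute required).
Q = [
    [1, 0, 0],  # item 1 requires attribute A1 only
    [1, 1, 0],  # item 2 requires A1 and A2
    [0, 1, 1],  # item 3 requires A2 and A3
    [1, 1, 1],  # item 4 requires all three attributes
]

def ideal_item_scores(attribute_pattern, q_matrix):
    """Return the ideal item score pattern for one binary attribute pattern.

    An item is scored 1 only if every attribute it requires is mastered.
    """
    return tuple(
        int(all(mastered >= required
                for mastered, required in zip(attribute_pattern, row)))
        for row in q_matrix
    )

def knowledge_states(q_matrix):
    """Group all binary attribute patterns by the ideal item score pattern they produce."""
    n_attributes = len(q_matrix[0])
    states = {}
    for pattern in product((0, 1), repeat=n_attributes):
        states.setdefault(ideal_item_scores(pattern, q_matrix), []).append(pattern)
    return states

states = knowledge_states(Q)
for ideal, patterns in sorted(states.items()):
    print(ideal, "<-", patterns)
```

In this toy example several attribute patterns collapse onto the same ideal item score pattern, which shows why the set of distinguishable knowledge states depends on the Q matrix and can be smaller than the number of attribute patterns.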
To recapitulate, a unique characteristic of RSM is the correspondence between
attribute patterns and ideal item score patterns. This tie enables us to make inferences
regarding an examinee’s performance on latent attributes from his/her performance
on observable item responses. Hence, RSM transforms a dataset of students by item
scores into a dataset of students by attribute mastery probabilities, thus providing a
methodology for large-scale diagnostic assessment.
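One way to picture the transformation from item scores to attribute mastery probabilities is sketched below. It is a simplification under stated assumptions: the membership probabilities over candidate knowledge states are taken as given (in RSM they come from the Bayes decision rules), and each attribute's mastery probability is computed as the probability-weighted share of states in which that attribute is mastered. The function name and the example values are hypothetical.

```python
def attribute_mastery_probabilities(posterior):
    """Turn membership probabilities over knowledge states into attribute probabilities.

    posterior: dict mapping a binary attribute pattern (tuple of 0/1)
               to that state's membership probability for one student.
    Returns one mastery probability per attribute.
    """
    total = sum(posterior.values())  # renormalize in case the weights do not sum to 1
    n_attributes = len(next(iter(posterior)))
    return [
        sum(prob * pattern[k] for pattern, prob in posterior.items()) / total
        for k in range(n_attributes)
    ]

# Hypothetical student: three plausible knowledge states over attributes (A1, A2, A3).
posterior = {(1, 1, 0): 0.6, (1, 0, 0): 0.3, (1, 1, 1): 0.1}
probs = attribute_mastery_probabilities(posterior)
print([round(p, 2) for p in probs])
```

Applied to every student, a computation of this kind yields the students-by-attributes table of mastery probabilities that the later analyses compare across groups.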
RSM has been shown to perform quite well in various areas such as subtraction of
fractions (Tatsuoka & Tatsuoka, 1992), signed numbers operations (Tatsuoka,
1990), algebra (Birenbaum, Kelly, & Tatsuoka, 1993), the quantitative parts of the
Scholastic Aptitude Test (SAT-M; Tatsuoka, Birenbaum, Lewis, & Sheehan, 1993),
and the Graduate Record Examination—GRE (Tatsuoka & Boodoo, 2000), as well
as in architecture (Katz, Martinez, Sheehan, & Tatsuoka, 1998), and listening
comprehension (Buck & Tatsuoka, 1998). Although the RSM has already been
successfully applied in several studies of mathematics performance, comparisons
of group performances using this methodology are sparse (Tatsuoka & Boodoo,
2000).
The current study employed the RSM to diagnose students’ attribute mastery
probabilities in order to compare patterns of strengths and weaknesses in
mathematics knowledge between Jewish and Arab eighth-grade students in Israel.
Method
Participants
The research sample consisted of 2,041 eighth graders—1,406 Jewish students and
635 Arab students. This was a subsample of a representative national sample selected
for the purpose of the 1996 national assessment test in mathematics. The national
sample was stratified and included 10% of the eighth graders in that year. Participants
were all the students in the sampled classes who attended school on the day the test
was administered. Three versions of the test were randomly distributed to the entire
sample. The research sample consisted of all participants who received Form A of the
test, the form that was later disclosed to the public.
Instruments
Mathematics test. The national assessment test in mathematics (NAT-M) (Aviram
et al., 1999) is based on the formal curriculum issued by the Ministry of Education.
Senior mathematics teachers and pedagogical consultants developed the test items.
The test was translated into Arabic and reviewed by teachers and experts in
mathematics education in the Arab sector. All items were approved by the NAT
mathematics committee and were selected for inclusion in the operational version on
the basis of their psychometric properties as identified in a pilot study. Each version of
the test consisted of three parts pertaining to 12 topics. About 25% of the items
addressed topics studied in seventh grade. Most of the topics that can be taught in either
eighth or ninth grade, as well as topics taught at the end of the eighth grade, were addressed
at a basic level only. The use of calculators was permitted in parts two and three of the
test.
Form A of the test included 34 items, 9 of which were in the choice response
format (multiple-choice or true-false) and the rest were in the constructed response
format. One item was a multistep investigation task. Since some of the items included
more than one section, the total number of questions in Form A of the test was 44.
Cronbach’s alpha coefficient for the test in the entire sample was 0.91; the
reliabilities for the Jewish and Arab groups were 0.90 and 0.91, respectively.
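For readers unfamiliar with the reliability coefficient reported here, the following minimal sketch computes Cronbach's alpha from a students-by-items score table using the standard formula. The tiny data set is hypothetical and serves only to show the calculation; it is unrelated to the NAT-M data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses of four examinees to three dichotomous items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```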
Procedure
The NAT-M was administered in 257 classes at the end of the eighth grade. Students
were tested in class by an external examiner. Students were allowed 90 min for
answering the test items and the 11-item attitude questionnaire appended at the
end of the test. According to the NAT-M final report it took, on average, 59 min
for the Jewish group and 72.5 min for the Arab group to complete the NAT-M.
Only 10% of the students in each group failed to complete the test within the allotted
time. Another piece of information provided in the NAT-M final report indicates
that there was no consistent difference between the Jewish and Arab teachers’
reports regarding the coverage in class of the various topics included in the test
(Aviram et al., 1999).
Analysis
The set of attributes used in this study was adopted from Tatsuoka, Corter, and
Guerrero (2003) with minor modifications to fit the scope of the NAT-M test.
Tatsuoka and her colleagues developed these attributes for analyzing the 1999
TIMSS math items for grade eight. They classified the attributes into three categories
of content (C1 to C6), skills/item-type (S1 to S11) and processes (P1 to P10). Content
attributes refer to basic concepts and properties in whole numbers and integers;
fractions and decimals; elementary algebra; two-dimensional geometry, data and
basic statistics. Process attributes include judgmental applications of
knowledge in arithmetic and geometry; rule application in algebra; logical reasoning;
problem search; generating, visualizing, and reading figures and graphs; and managing
data and procedures. Skill (item-type) attributes include applying
number properties and relationships (number sense); approximation/estimation;
recognizing patterns and sequences; and solving open-ended items. The full list of the
attributes used in the current study appears in Appendix A.
Successful completion of each item on the test requires mastery of several attributes
that vary in number and type as a function of the content and complexity of the item.
The number of attributes per item in the Q matrix ranged from 2 to 9 with a mean of
4.6, and the number of items per attribute ranged from 2 to 18 with a mean of 8.4. A
sample of 10 representative items and the specific attributes involved in successful
completion of each of them is provided in Appendix B.
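The two reported means can be cross-checked against each other, because the total number of 1s in the Q matrix equals both the number of rows times the mean number of attributes per row and the number of attributes times the mean number of rows per attribute. The symbols below are introduced here for illustration only: $I$ is the number of rows, $K = 24$ the number of attributes, $\bar{a}$ the mean number of attributes per row, and $\bar{m}$ the mean number of rows per attribute.

```latex
\[
\sum_{i=1}^{I}\sum_{k=1}^{K} q_{ik} \;=\; I\,\bar{a} \;=\; K\,\bar{m},
\qquad
24 \times 8.4 \approx 202 \approx 44 \times 4.6 .
\]
```

The figures agree when the 44 questions of Form A, rather than the 34 items, are taken as the rows of the Q matrix. This is an inference from the reported means, not something the text states explicitly.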
The test items were coded according to the set of 24 attributes. For data analysis,
the BILOG-MG program (Zimowski, Muraki, Mislevy, & Bock, 1996) was used to
estimate the IRT a and b parameters for the items and the BUGLIB program
(Tatsuoka, Varadi, & Tatsuoka, 1992) was used for the RS analysis.
Results
A. Quality Control Measures
The adequacy of the Q matrix was assessed by regressing item difficulties on the attribute
vectors in the Q matrix, which yielded an adjusted squared multiple
correlation of 0.85. Similarly, predicting the total test score from the attribute mastery
probabilities for the entire sample yielded an adjusted squared multiple correlation of
0.96. The respective values for the Jewish and Arab groups were 0.95 and 0.97. All
these values are considered satisfactory (Tatsuoka, in press). Another measure of the
adequacy of the Q matrix, the rate of classification by RSM, was 100%. This value
indicates the percentage of students’ response patterns that were located within the
95% probability ellipses of the latent knowledge states.
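The adjusted squared multiple correlations reported above presumably follow the conventional adjustment, shown here for reference. The article does not give the formula, so the notation is an assumption: $n$ is the number of observations (items in the first regression, students in the second) and $p$ is the number of predictors (attributes).

```latex
\[
R^{2}_{\mathrm{adj}} \;=\; 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1}
\]
```

The adjustment matters most when predictors are numerous relative to observations. If, for instance, the item-level regression used 44 questions and 24 attributes (an assumption), the factor $(n-1)/(n-p-1)$ would be $43/19 \approx 2.26$, whereas for the student-level regression with 2,041 students it would be about $1.01$.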
B. Group Comparisons at the Attribute Level
The results of the comparisons between the Jewish and the Arab samples at the
attribute level are presented in Table 1, which includes the mean probabilities and
standard deviations for each group on the 24 attributes along with the t values and the
effect size values (d). As can be seen in the table, 22 of the 24 attributes yielded
significant differences in favor of the Jewish group with effect sizes ranging between
0.11 and 1.00 standard deviations with a mean of 0.56. (The effect size for the
percent of correct responses on the test was 0.86.)
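The effect sizes and t values in Table 1 can be approximated from the reported summary statistics. The sketch below assumes that d is Cohen's d with a pooled standard deviation and that t is an unequal-variance (Welch-type) statistic; the article does not state which formulas were used, so both choices are assumptions, and small discrepancies from the published values are expected because the inputs are rounded to two decimals.

```python
from math import sqrt

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)

def welch_t(m1, s1, n1, m2, s2, n2):
    """t statistic that does not assume equal group variances."""
    return (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)

# Attribute C1 (whole numbers) from Table 1: Jews .81 (SD .34), Arabs .52 (SD .44).
d_c1 = cohens_d(0.81, 0.34, 1406, 0.52, 0.44, 635)
t_c1 = welch_t(0.81, 0.34, 1406, 0.52, 0.44, 635)
print(round(d_c1, 2), round(t_c1, 1))
```

For C1 this gives a d of about 0.78, in line with the tabled value, and a t of about 14.7, close to the reported 14.63.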
Four of the six content attributes yielded significant differences between the two
groups, the largest being on Use of basic concepts and operations in whole numbers
(C1) and Use of fractions and decimals (C2). With the mastery threshold set at a probability of 0.8,
the average student in the Jewish group mastered these attributes whereas
the average student in the Arab group failed to reach mastery. The nonsignificant
differences between the two groups are on attributes involving Functions (C7) and
Data, probability and statistics (C5). The results indicate that the average student in
both groups failed to master these two attributes. On the other two content attributes,
Elementary algebra (C3) and Geometry (C4), the average student in both groups
reached mastery, yet the mean probabilities are significantly higher in the Jewish
group than in the Arab group.
As for the skill/item-type attributes, all nine of them yielded significant differences
in favor of the Jewish group. However, on only two attributes, Number properties and
relations (S2) and Approximation and estimation (S4), the means of the Jewish group
indicate mastery whereas those of the Arab group indicate non-mastery. The means
of both groups indicate mastery of four skills: Figures, tables, and graphs (S3);
Evaluation and verification of response options (S5); Comparison of entities (S9);
and Open-ended questions (S10). Yet the means of both groups indicate non-
mastery of three skills: Recognition of patterns and relationships (S6); Proportional
reasoning (S7) and Working with verbally loaded items (S11).
With respect to process attributes, all nine of them yielded significant differences in
favor of the Jewish group. On one attribute, Application of computational knowledge
(P2), the mean of the Jewish group indicates mastery whereas that of the Arab group
indicates non-mastery. On one attribute, Generalization and visualization (P7), the
means of both groups indicate mastery. Yet, the average student in both groups
exhibited non-mastery on seven process attributes: Translation (P1); Knowledge
application (P3); Application of rules in algebra (P4); Logical thinking (P5); Problem
search (P6), Data and process management (P9), and Quantitative and logical
reading (P10).
Table 1. Means, standard deviations (SD), t values, and effect size (d) values for Jewish (n = 1,406) and Arab (n = 635) 8th graders on 24 attributes

Attribute                                         Group   Mean   SD     t         d
C1: Whole numbers                                 Jews    .81    .34    14.63**   .78
                                                  Arabs   .52    .44
C2: Fractions & decimals                          Jews    .92    .21    15.47**   .96
                                                  Arabs   .68    .35
C3: Elementary algebra                            Jews    .90    .17    9.86**    .53
                                                  Arabs   .80    .22
C4: Geometry                                      Jews    .93    .15    4.85**    .31
                                                  Arabs   .88    .19
C5: Data, probability & statistics                Jews    .43    .18    1.45      –
                                                  Arabs   .42    .20
C7: Functions                                     Jews    .28    .20    −.98      –
                                                  Arabs   .29    .24
S2: Number properties & relationships             Jews    .89    .29    14.24**   .82
                                                  Arabs   .61    .45
S3: Comprehend figures, tables & graphs           Jews    .99    .07    6.20**    .44
                                                  Arabs   .95    .14
S4: Approximation & estimation                    Jews    .85    .23    17.54**   1.00
                                                  Arabs   .59    .34
S5: Evaluate & verify options                     Jews    .98    .08    6.31**    .40
                                                  Arabs   .94    .16
S6: Recognize patterns & relations                Jews    .63    .25    10.02**   .50
                                                  Arabs   .50    .27
S7: Use proportional reasoning                    Jews    .67    .29    14.40**   .70
                                                  Arabs   .46    .31
S9: Compare entities                              Jews    .98    .08    5.72**    .36
                                                  Arabs   .94    .19
S10: Work with open-ended items                   Jews    .96    .14    13.88**   .89
                                                  Arabs   .79    .29
S11: Work with verbally loaded items              Jews    .60    .30    7.42**    .34
                                                  Arabs   .50    .28
P1: Translate/formulate equations & expressions   Jews    .57    .33    11.50**   .50
                                                  Arabs   .41    .29
P2: Apply computational knowledge                 Jews    .88    .20    14.21**   .77
                                                  Arabs   .71    .26
P3: Identify true relations                       Jews    .59    .27    5.86**    .11
                                                  Arabs   .52    .26
P4: Apply rule in algebra                         Jews    .68    .24    10.85**   .58
                                                  Arabs   .53    .30
P5: Use logical reasoning                         Jews    .71    .34    10.00**   .49
                                                  Arabs   .54    .36
P6: Apply problem search                          Jews    .40    .39    10.84**   .46
                                                  Arabs   .23    .31
P7: Generate & read figures & graphs              Jews    .94    .12    7.65*     .43
                                                  Arabs   .88    .20
P9: Manage information & procedures               Jews    .41    .35    12.49**   .55
                                                  Arabs   .23    .27
P10: Quantitative & logical reading               Jews    .61    .15    17.14**   .94
                                                  Arabs   .46    .19

*p < .01; **p < .001.
C. Group Comparisons at Fixed Levels of Achievement
In order to find out whether the differences between the Jewish and Arab groups are
only in magnitude or also in construct, that is whether the attribute profiles of the two
groups vary also at a given achievement level, the following procedure was carried
out: The score distribution of a sample of 1,270 students comprising equal numbers
(635) of Jewish and Arab students (the Jewish subsample was randomly drawn from
the original sample of 1406 students, to match the size of the original Arab sample)
was divided into quintiles with cut-off scores of 25.01, 38.65, 52.28, and 65.92 for the
second to fifth quintiles, respectively. The numbers of Jewish and Arab students in
quintiles 1 to 5 were 45, 197; 85, 150; 145, 123; 171, 89; 189, 76; respectively. The
differences in the total test score between Jews and Arabs were
nonsignificant in quintiles 1–4. In quintile 5, a small but significant difference of 2.21
points emerged in favor of the Jewish group.
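The grouping step can be sketched as follows, using the cut-off scores and group counts reported above. The handling of scores that fall exactly on a cut-off (assigned here to the higher quintile) is an assumption, as is the function name; the article reports only the cut-off values.

```python
from bisect import bisect_right

# Lower bounds of quintiles 2 to 5, as reported in the text.
CUTOFFS = [25.01, 38.65, 52.28, 65.92]

def quintile(score):
    """Return the achievement level (1 to 5) for a total test score."""
    return bisect_right(CUTOFFS, score) + 1

# Reported numbers of students per quintile (quintiles 1 to 5).
jews = [45, 85, 145, 171, 189]
arabs = [197, 150, 123, 89, 76]

print(quintile(10.0), quintile(25.01), quintile(90.0))
print(sum(jews), sum(arabs))  # each group should total 635
```

The counts sum to 635 in each group, and they show the two groups concentrated at opposite ends of the score distribution, which is why comparisons within quintiles are needed to separate profile differences from overall level.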
Three MANOVAs were carried out with mastery probabilities for attributes of
content, process, and skill/item-type, respectively, as dependent sets of variables and
achievement level (quintile) and group (Jews/Arabs) as independent variables.
Significant interaction effects in these three analyses would provide an indication of
structural differences in the progress of mathematical knowledge patterns between the
Jewish and Arab samples. Table 2 presents the results of these analyses in terms of
Wilks’ lambda (Λ) for the main effects of achievement level and group, and their
interaction. As can be seen in the table, the three effects in all analyses are significant.
Table 3 presents the means for each attribute in the Jewish and Arab groups at the five
achievement levels (Quintiles) along with tests of the significance of the differences
between the two groups in each level. As can be seen in the table, in each quintile
there are significant differences in favor of each group.
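For reference, Wilks' lambda for a given effect is conventionally defined from the hypothesis and error sums-of-squares-and-cross-products matrices. The notation below is standard MANOVA notation and is not taken from the article: $\mathbf{H}$ is the matrix for the effect being tested, $\mathbf{E}$ is the error matrix, and $\lambda_i$ are the eigenvalues of $\mathbf{E}^{-1}\mathbf{H}$.

```latex
\[
\Lambda \;=\; \frac{\det(\mathbf{E})}{\det(\mathbf{H} + \mathbf{E})}
\;=\; \prod_{i} \frac{1}{1 + \lambda_i}
\]
```

Values near 1 indicate that the effect accounts for little of the multivariate variation, and values near 0 indicate that it accounts for most of it. On this reading, the quintile effect in Table 2 (Λ between .09 and .19) is very strong, while the group and interaction effects (Λ around .94 to .96) are statistically significant but much smaller.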
Table 2. Wilks’ lambda (Λ) and F values from 2-way MANOVAs for effects of ethnicity/culture, test score level (quintiles), and their interaction on mastery probabilities of content, process, and skill attributes

                        Content (k = 6)        Process (k = 9)        Skill/item type (k = 9)
Effect                  Wilks’ Λ   F           Wilks’ Λ   F           Wilks’ Λ   F
Ethnicity (1)           .94        12.91***    .95        8.06***     .96        6.44***
Quintile                .19        113.22***   .09        119.31***   .09        115.33***
Ethnicity × Quintile    .94        3.19***     .94        2.06***     .95        1.95**

*p < .05; **p < .01; ***p < .001. (1) Jews: n = 635; Arabs: n = 635. k = number of attributes.

Table 3. Mean attribute mastery probabilities of the Jewish and Arab groups in each quintile of the total test score distribution, and t values

                    Quintile 1              Quintile 2              Quintile 3              Quintile 4              Quintile 5
Attribute           Jews  Arabs  t value    Jews  Arabs  t value    Jews  Arabs  t value    Jews  Arabs  t value    Jews  Arabs  t value
                    n=45  n=197             n=85  n=150             n=145 n=123             n=171 n=89              n=189 n=76
C1 whole num.       .08   .08    −.29       .48   .49    −.24       .81   .73    1.96       .95   .86    2.77**     1.00  .98    1.83
C2 fractions        .38   .30    2.18*      .81   .70    3.07**     .95   .90    2.36*      .98   .95    1.49       1.00  .99    1.80
C3 elem. alg.       .66   .63    .79        .69   .76    −2.69**    .88   .89    −.49       .97   .97    .39        1.00  .99    .93
C4 geometry         .87   .85    .46        .95   .90    2.47*      .84   .86    −.62       .92   .91    .63        .99   .99    .04
C5 prob. stat.      .29   .30    −.34       .32   .41    −3.17**    .45   .45    −.26       .45   .53    −3.42**    .49   .57    −4.01***
C7 functions        .13   .15    −.41       .20   .26    −2.06*     .34   .37    −.97       .30   .42    −3.94***   .27   .46    −5.47***
S2 no. proper.      .09   .10    −.38       .70   .63    1.27       .96   .88    2.80**     .99   .97    1.03       1.00  1.00
S3 fig. tab. grph   .91   .89    .66        .98   .95    2.37*      .98   .98    .35        .99   .99    −.35       .99   1.00   −.63
S4 approxi.         .28   .22    2.14*      .66   .55    3.50***    .90   .80    4.03***    .96   .89    3.15**     .93   .95    −1.45
S5 evaluate         .88   .78    .43        .96   .94    1.82       .98   .98    .10        1.00  .99    1.03       .99   .98    .85
S6 patterns         .39   .38    .21        .47   .47    −.12       .49   .51    −.70       .66   .61    2.07*      .80   .72    3.20**
S7 proportion. r.   .20   .18    .82        .29   .34    −2.16*     .56   .56    .26        .80   .79    .52        .92   .89    1.39
S9 compare          .88   .84    .73        1.00  .97    2.34*      .99   .98    1.04       1.00  .99    1.31       .97   .98    −1.54
S10 open-ended      .59   .48    2.64**     .93   .85    3.74***    .99   .96    3.07**     1.00  .99    .26        1.00  1.00
S11 verbal          .33   .34    −.36       .28   .40    −3.89***   .50   .52    −.82       .63   .69    −1.85      .87   .87    −.17
P1 translate        .18   .21    −.89       .22   .28    −1.88      .39   .45    −2.11*     .64   .66    −.57       .90   .80    5.24***
P2 comput. k.       .43   .43    .09        .65   .67    −1.08      .86   .83    1.71       .98   .98    .98        .99   1.00   −.50
P3 relationships    .46   .40    1.71       .52   .50    .46        .46   .51    −1.84      .52   .54    −.41       .82   .83    −.20
P4 rule app. alg.   .26   .23    .96        .43   .50    −2.18*     .67   .69    −.78       .77   .75    .70        .78   .83    −2.69**
P5 logical r.       .22   .25    −.86       .30   .40    −2.56*     .56   .66    −2.74**    .85   .90    −1.90      1.00  .99    .89
P6 problem sear.    .03   .06    −2.65**    .08   .11    −1.05      .16   .22    −1.75      .40   .40    −.08       .82   .69    3.46**
P7 gener. fig. gr.  .83   .79    .72        .95   .90    2.54*      .92   .90    1.65       .95   .93    .98        .94   .93    .81
P9 manage. proc.    .05   .07    −2.20*     .07   .10    −1.41      .16   .21    −2.36*     .41   .42    −.25       .83   .72    5.42***
P10 quant. read     .34   .30    1.78       .53   .46    4.50***    .58   .54    2.17*      .63   .57    3.59***    .70   .65    3.18**

*p < .05; **p < .01; ***p < .001.

D. Clusters of Knowledge States
In order to portray the different progress patterns, maps of transitional relations
among clusters of knowledge states derived from separate cluster analyses on
students’ attribute mastery probabilities for the Jewish and Arab samples were plotted
and are presented in Figures 1 and 2. A transition from one cluster of knowledge state
to another is said to be possible whenever the set of mastered attributes associated
with the lower cluster is a proper subset of the higher connected cluster. Attributes
yielding a weight of 0.75 or larger were considered meaningful for defining a cluster
center of knowledge states in terms of mastery. Those are the attributes that appear in
the bottom part of the figures along with the number of students in each cluster and
their average test score. As can be seen in Figure 1, in the Jewish sample attributes P4
(Rule application in algebra) and P6 (Pattern recognition) divide the progressing
transitions into two paths. The transitional pattern in the Arab sample, as can be seen
in Figure 2, is more diffuse and the number of mastered attributes at the highest
cluster is 15 compared to 21 in the Jewish sample.
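The transition rule described above can be expressed compactly. The sketch below uses three hypothetical cluster centers with made-up attribute weights; only the 0.75 threshold and the proper-subset criterion come from the text, and the names `mastered` and `can_transition` are illustrative.

```python
THRESHOLD = 0.75  # weight at or above which an attribute counts as mastered in a cluster

def mastered(center):
    """Return the set of attributes whose weight in the cluster center meets the threshold."""
    return {attribute for attribute, weight in center.items() if weight >= THRESHOLD}

def can_transition(lower, higher):
    """A transition is possible when the lower set is a proper subset of the higher set."""
    return mastered(lower) < mastered(higher)

# Hypothetical cluster centers (attribute -> weight); not taken from Figures 1 and 2.
centers = {
    "low": {"C4": 0.90, "S3": 0.80, "S5": 0.40, "P7": 0.30},
    "mid": {"C4": 0.95, "S3": 0.90, "S5": 0.85, "P7": 0.50},
    "alt": {"C4": 0.90, "S3": 0.60, "S5": 0.80, "P7": 0.90},
}

edges = [
    (a, b)
    for a in centers
    for b in centers
    if a != b and can_transition(centers[a], centers[b])
]
print(edges)
```

In this toy example only the move from "low" to "mid" qualifies. The "alt" cluster masters a different combination of attributes, so it is not reachable from "low" under the subset rule, which is the kind of branching that produces separate paths in a transition map.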
Discussion
The results of the current study illuminate the nature of the long-lasting gap in
mathematics achievement between Jewish and Arab students in Israel. The effect size
of the gap between the two samples in their overall test performance, as found in the
current study, is similar in magnitude to the one recently reported by Zuzovsky
(2001, p. 38), who compared the performance of Jewish and Arab eighth graders in
mathematics in the Third International Mathematics and Science Study (TIMSS-1999).
It should be noted that effect sizes of almost one standard deviation
were similarly reported with respect to performance in the science part of that test
(Zuzovsky, 2001, p. 61) as well as in the National Assessment Test in Science for
sixth graders (Cfir, Aviram, & Ben-Simon, 1999, p. 160).
It is difficult to disentangle the many confounding factors that explain this large
achievement gap: The complex web they create comprises differences in resources (Lavy,
1998), culture (Al-Haj, 1995), and possibly epistemological beliefs (Agmon, 2002), and
the derived conceptions of teaching and learning, as well as observed differences in
prevalent instruction practices (Birenbaum & Nasser, 2002). Rather than trying to
disentangle this complex web, a more constructive approach would be to concentrate on
what could be done educationally to close the gap. To this end, we first state
the problem and then address features of relevant instructional interventions.
The crux of the problem, as our results have shown, lies in the deficient prior
mathematical knowledge of the average Arab student, as compared to his/her Jewish
counterpart. This was indicated by non-mastery of content, skill, and process attributes
that refer to topics learned in earlier grades such as: Use of basic concepts and operations
in whole numbers (C1), and in fractions and decimals (C2); Use of prior knowledge of
number properties and relationships (number-sense) (S2); Use of approximation and
estimation (e.g., rounding off decimals or fractions in numerals, and approximate areas
or volumes in geometrical shapes) (S4), and Application of computational knowledge
(P2). Students who have mastered the latter are able to apply knowledge acquired in
earlier grades of basic terminology, concepts, and properties in arithmetic and geometry
and use calculators for basic operations. Because mastery of these attributes is
fundamental to achievement in higher mathematics, they should be targeted as “prime
candidates” for remedial interventions.

Fig. 1. A map of transitional relations among clusters of latent knowledge states in the Jewish group (N = 635)

Fig. 2. A map of transitional relations among clusters of latent knowledge states in the Arab group (N = 635)

What are the features of effective interventions of this kind? It was shown that
interventions with a long-lasting effect on the use of strategies that reflect
sound number sense engaged students in mental computations, estimations, sensing
number magnitudes, moving between representation systems of numbers (such as
simple fractions, whole numbers, integers, decimals, and percentages), and judging
the reasonableness of numerical results (Markovits & Sowder, 1994). More general
instructional strategies that were shown to support conceptual understanding and,
consequently, procedural and conditional knowledge (that is, knowing when and why
to apply which procedure) engaged students in active learning through discussions,
conversations, and reflection (Fosnot, 1996; Sfard, 2000; Wood, 1999); inquiries and
explorations; examples; multiple solutions and monitoring strategies (Stigler &
Hiebert, 1999); collaborative learning in small groups (Davidson, 1985; Slavin,
1990); and formative assessment (Black & Wiliam, 1998; Stiggins, 2002).
In which of the two educational systems are such features more apparent in
mathematics classes? Results of a recent study that compared instructional practice
between a sample of Jewish and Arab eighth-grade mathematics classes using video
records identified several instructional features that could hinder the development of
mathematical knowledge—conceptual and conditional—in Arab classes (Birenbaum
& Nasser, 2002). In the observed classes, mathematical concepts and procedures were
mostly stated by the teacher rather than developed through examples, demonstrations,
and discussions. No strategies of how to address the problem or how to evaluate the
solution were taught. It was also noticed that students were mainly practicing routine
procedures, spent much less time on applying the procedures in new situations, and
almost never invented new procedures or coped with unfamiliar problems. Teachers
stuck to the textbook, which was the major, and frequently the sole, material used for
teaching and learning. Moreover, students were only scarcely provided with written
formative feedback regarding their homework or their performance in the very few
quizzes and tests administered in these classes. Another disturbing observation was
that nonparticipating students were mostly ignored unless they were involved in
discipline violations. It is reasonable to believe that many of these students lack basic
mathematical attributes, such as the ones identified in the current study, that should
have been carried over from earlier grades. Such teaching and assessment practices
were also shown in international comparative studies, for example TIMSS, to be
typical of countries with low mathematics achievement (Stigler & Hiebert, 1997).
The results of the current study also pointed out structural differences in the
progress of mathematical knowledge between the two groups; it was found that in
each quintile some of the group differences were in favor of the Jewish students and
others in favor of the Arab students. These significant differences between the two
groups in their attribute profile at fixed levels of achievement imply that differences
may exist in the implemented math curricula in the two school systems. The maps we
presented of hierarchically ordered knowledge states for each group are another
indication of the differential progression of mathematical knowledge in the two
groups but at the same time provide valuable information for instructional design.
They could be used to spur adaptive remedial interventions according to the
developmental paths depicted in each group.
Another finding worth addressing is the non-mastery of higher order mathematical
thinking skills in both samples. The average Jewish and Arab student failed to master
attributes such as: Recognition of patterns and relationships (S6); Logical thinking
(P5); Problem search (P6); Proportional reasoning (S7); Data and process management
(P9); Quantitative and logical reading (P10); and Coping with open-ended
items (S10). Studies of instructional practice in countries that excel in these
attributes, such as Japan, can help in designing effective interventions.
What are the features of such instructional practice and how do they differ from
those prevalent in Israel? TIMSS video studies have shown that a typical Japanese
lesson advances as follows: the teacher poses a complex thought-provoking question,
the students struggle with the problem, several students present ideas or solutions to
the class, the teacher leads a class discussion of the various solutions, then the teacher
summarizes the conclusions and makes connections to mathematical concepts
(Hiebert et al., 2003; Stigler, Gonzales, Kawanaka, Knoll, & Serrano, 1999). In
contrast, typical mathematics classes in Israel focus on promoting skill acquisition and
are characterized by the following sequence: The teacher explains a theorem and then
uses a sample problem to show step by step how to apply the formula in concrete
situations; or the teacher presents a problem and demonstrates how to solve it followed
by students’ practice (Birenbaum & Nasser, 2002). Furthermore, a study that focused
on questions asked in Japanese classes during mathematics lessons revealed that
Japanese teachers tend to frequently ask higher order questions and they do so when
the class is sharing the solution methods that students generated while working at their
desks (Kawanaka & Stigler, 1999). Moreover, these researchers observed two kinds of
problem-solving activities in Japanese classrooms, which they term “divergent” and
“convergent.” The former refers to open-ended problem-solving in which the students
are asked to solve a non-routine problem on their own using any method they wish or
just to think about how to solve the problem without actually solving it. The latter
refers to solving a given problem when the students know what solution method is
required. Such practice should be brought to the attention of Israeli teachers as they
reassess their practice in order to promote their students’ mathematical thinking.
Suggestions for Further Studies
Although the results of the current study indicated that the quality of the design and
classification was satisfactory, it should be noted that because the study was a secondary
data analysis, the attributes used for the RS analysis were defined post hoc rather than
at the stage of test design, which resulted in uneven distribution of items across the
various attributes. In order to increase the validity and reliability of future group
comparisons, it is recommended to first define a relevant set of attributes and then
write items that tap that set of attributes. Further studies should also validate students’
attribute profiles using think-aloud protocols taken as students solve the test items, and
compare the strategies used by students from both educational systems. It is also
recommended to conduct a similar study with respect to achievement in science – the
only other school subject where both school systems share the same official/intended
curriculum. Further research should also be directed at finding effective techniques of
reporting RS results to teachers and students and at investigating the impact of these
reports on the quality of subsequent remedial instruction.
References
Agmon, O. (2002). Beliefs of history teachers towards knowledge: Comparative study between teachers in
the Jewish and Arab sectors. Unpublished M.A. thesis, Tel Aviv University, Israel. (Hebrew).
Al-Haj, M. (1995). Education, empowerment and control: The case of the Arabs in Israel. Albany, NY:
State University of New York Press.
Atkin, J. M., & Black, P. (1997). Policy perils of international comparisons: The TIMSS case. Phi
Delta Kappan, 79(1), 22 – 28.
Aviram, T., Cfir, R., & Ben-Simon, A. (1999). The national feedback to the educational system –
mathematics for 8th grade. Jerusalem: National Institute for Testing and Evaluation (Hebrew).
Bashi, Y., Kahan, S., & Davis, D. (1981). Achievement of the Arab elementary school in Israel.
Jerusalem: The Hebrew University, School of Education. (Hebrew).
Batrice, Y. (2000). The Palestinian women in Israel: Reality and challenges: An empirical study. Acre,
Israel: Dar Alaswar. (Arabic).
Birenbaum, M., Kelly, A. E., & Tatsuoka, K. (1993). Diagnosing knowledge states in algebra using
the rule-space model. Journal for Research in Mathematics Education, 24(5), 442 – 459.
Birenbaum, M., & Nasser, F. (2002). Mathematics achievement in the Jewish and Arab sectors and their
relationships to student and teacher characteristics and educational context. Research report 99-02
(submitted to the Chief Scientist of the Israeli Ministry of Education.) Tel Aviv University,
School of Education. (Hebrew).
Black, P., & Wiliam, D. (1998). Inside the black box: Raising standards through classroom
assessment. Phi Delta Kappan, 80(2), 139 – 148.
Buck, G., & Tatsuoka, K. K. (1998). Application of the rule-space procedure to language testing:
Examining attributes of a free response listening test. Language Testing, 15(2), 119 – 157.
Central Bureau of Statistics. (1996). Statistical abstracts of Israel, 47. Jerusalem: Central Bureau of
Statistics (Hebrew).
Cfir, R., Aviram, T., & Ben-Simon, A. (1999). The national feedback to the educational system – science
for 6th grade. Jerusalem: National Institute for Testing and Evaluation (Hebrew).
Davidson, N. (1985). Small group cooperative learning in mathematics: A selective view of
the research. In R. Slavin (Ed.), Learning to cooperate: Cooperating to learn (pp. 211 – 230).
New York: Plenum.
Fosnot, C. T. (1996). Constructivism: A psychological theory of learning. In C. T. Fosnot (Ed.),
Constructivism: Theory, perspectives, and practice (pp. 8 – 33). New York: Teachers College Press.
Hiebert, J., Gallimore, R., Garnier, H., Givvin, K. B., Hollingsworth, H., & Jacobs, J. (2003).
Teaching mathematics in seven countries: Results from the TIMSS 1999 video study (NCES 2003-013).
Washington, DC: U.S. Department of Education, National Center for Education
Statistics.
Katz, I. R., Martinez, M. E., Sheehan, K. M., & Tatsuoka, K. K. (1998). Extending the rule space
methodology to a semantically-rich domain: Diagnostic assessment in architecture. Journal of
Educational and Behavioral Statistics, 24(3), 254 – 278.
Kawanaka, T., & Stigler, J. W. (1999). Teachers’ use of questions in eighth-grade mathematics
classrooms in Germany, Japan, and the United States. Mathematical Thinking and Learning,
1(4), 255 – 278.
Kelly, D. (2002). The TIMSS 1995 International benchmarks of mathematics and science
achievement: Profiles of world class performance at fourth and eighth grades. Educational
Research and Evaluation, 8, 41 – 54.
Kraus, V. (1988). The opportunity structure of young Israeli Arabs. In J. E. Hofman, et al.,
Arab-Jewish relations in Israel: A quest in human understanding (pp. 67 – 91). Bristol, IN:
Wyndham Hall.
Lavy, V. (1998). Disparities between Arabs and Jews in school resources and student achievement
in Israel. Economic Development and Cultural Change, 47(1), 175 – 192.
Mar’i, S. K. (1978). Arab education in Israel. New York: Syracuse University Press.
Markovits, Z., & Sowder, J. (1994). Developing number sense: An intervention study in grade 7.
Journal for Research in Mathematics Education, 25, 4 – 29.
Mullis, I. V. S., Martin, M. O., Gonzales, E. J., O’Connor, K. M., Chrostowski, S. J., Gregory,
K. D., Garden, R. A., & Smith, T. A. (2001). Mathematics benchmarking report: TIMSS – eighth
grade. Achievement for U.S. States and districts in an international context. Chestnut Hill, MA:
International Study Center, Boston College.
Seginer, R., Karayanni, M., & Mar’i, M. (1990). Adolescents’ attitudes toward women’s roles.
Psychology of Women Quarterly, 14, 119 – 133.
Semyonov, M., & Lewin-Epstein, N. (1994). Ethnic labor markets, gender and socio-
economic inequality: A study of Arabs in the Israeli labor force. Sociological Quarterly,
35(1), 51 – 68.
Sfard, A. (2000). Symbolizing mathematical reality into being: How mathematical discourse and
mathematical objects create each other. In P. Cobb, K. E. Yackel, & K. McClain (Eds.),
Symbolizing and communicating: Perspectives on mathematical discourse, tools, and instructional
design (pp. 37 – 98). Mahwah, NJ: Erlbaum.
Sharabi, H. (1987). Introduction to studying the Arab population. Acre, Israel: Dar Alaswar (Arabic).
Slavin, R. E. (1990). Student team learning in mathematics. In N. Davidson (Ed.), Cooperative
learning in math: A handbook for teachers (pp. 69 – 102). Boston: Allyn & Bacon.
Stiggins, R. J. (2002). Assessment crisis: The absence of assessment for learning. Phi Delta Kappan,
83(10), 758 – 765.
Stigler, J. W., Gonzales, P., Kawanaka, T., Knoll, S., & Serrano, A. (1999). The TIMSS videotape
classroom study: Methods and findings from an exploratory research project on eighth grade
mathematics instruction in Germany, Japan, and the United States. Washington, DC: National
Center for Education Statistics. (http://nces.ed.gov/timss).
Stigler, J. W., & Hiebert, J. (1997). Understanding and improving classroom mathematics
instruction: An overview of the TIMSS video study. (Third International Mathematics and
Science Study). Phi Delta Kappan, 78(1), 14 – 22.
Stigler, J. W., & Hiebert, J. (1999). The teaching gap: Best ideas from the world’s teachers for improving
education in the classroom. New York: Summit Books.
Tatsuoka, C. M., Varadi, F., & Tatsuoka, K. K. (1992). BUGLIB. Unpublished computer
program, Trenton, NJ.
Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item
response theory. Journal of Educational Measurement, 20, 34 – 38.
Tatsuoka, K. K. (1990). Toward an integration of item response theory and cognitive analysis. In
N. Frederiksen, R. Glaser, A. Lesgold, & M. C. Shafto (Eds.), Diagnostic monitoring of skill and
knowledge acquisition (pp. 543 – 588). Hillsdale, NJ: Erlbaum.
Tatsuoka, K. K. (1991). Boolean algebra applied to determination of universal set of knowledge states.
Research Report ONR-1. Educational Testing Service, Princeton, NJ.
Tatsuoka, K. K. (in press). Statistical pattern recognition and classification of latent knowledge states:
Cognitively Diagnostic Assessment. Mahwah, NJ: Erlbaum.
Tatsuoka, K. K., Birenbaum, M., Lewis, C., & Sheehan, K. K. (1993). Proficiency scaling based on
conditional probability functions for attributes. (Research report 39 – 50). Princeton, NJ:
Educational Testing Service.
Tatsuoka, K. K., & Boodoo, G. M. (2000). Subgroup differences on GRE Quantitative test based
on the underlying cognitive processes and knowledge. In A. E. Kelly & R. A. Lesh (Eds.),
Handbook of research design in mathematics and science education (pp. 821 – 857). Mahwah, NJ:
Erlbaum.
Tatsuoka, K. K., Corter, J., & Guerrero, A. (2003). Manual of attribute-coding for general mathematics
in TIMSS studies. New York: Columbia University, Teachers College.
Tatsuoka, K. K., & Tatsuoka, M. M. (1992). A psychometrically sound cognitive diagnostic model: Effect
of remediation as empirical validity. Research Report, Educational Testing Service, Princeton, NJ.
Wood, T. (1999). Creating a context for argument in mathematics class. Journal for Research in
Mathematics Education, 30, 171 – 191.
Zimowski, M. F., Muraki, E., Mislevy, R., & Bock, R. D. (1996). BILOG-MG. Chicago: Scientific
Software International.
Zuzovsky, R. (2001). Learning outcomes and the educational context of mathematics and science
teaching in Israel: Findings of the third international mathematics & science study TIMSS-
1999. Tel Aviv, Israel: Ramot. (Hebrew).
Appendix A. List of Content, Process and Skill/Item-Type Attributes¹ Used in the
Current Study²
To simplify phrasing, the opening sentence for each attribute should read: “A
student who has mastered this attribute will likely be able to successfully ...”
Content related attributes
C1: Use basic concepts and operations in whole numbers.
C2: Use basic concepts and operations in fractions and decimals.
C3: Use basic concepts and operations in elementary algebra.
C4: Use basic concepts and properties in geometry.
C5: Read data and use basic concepts in probability and statistics.
C7: Use basic concepts and properties in inequalities and functions.
Skill/item-type related attributes
S2: Use prior knowledge regarding number properties (number sense) and
relationships.
S3: Comprehend various representations and use them interchangeably (e.g.,
written instructions, figures, tables, charts and graphs).
S4: Use approximation/estimation.
S5: Evaluate/verify/check options in a multiple-choice item.
S6: Recognize patterns of various representations (numeric, geometric,
algebraic).
S7: Use proportional reasoning.
S9: Compare and order two or more entities.
S10: Work with open-ended items.
S11: Work with verbally loaded items.
Process-related attributes
P1: Translate/formulate equations and expressions to solve a problem.
P2: Apply computational knowledge in arithmetic, algebra and geometry.
P3: Apply knowledge in arithmetic, algebra and geometry to identify true
relationships, properties and/or to set new goals in solving a problem.
P4: Apply rules in solving equations.
P5: Use logical reasoning (case reasoning, deductive thinking, generalizations).
P6: Apply problem search, analytic thinking, problem restructuring and
inductive thinking.
P7: Generate and visualize figures and graphs.
P9: Manage numerical information, procedures, goals, and conditions.
P10: Apply quantitative and logical reading.
Notes
¹ Adapted from Tatsuoka et al., 2003. The attributes’ original codes as they appear there were
retained in the current study. Four of the original attributes (C6, S1, S8, and P8) were not tapped
by the NAT-M items and therefore were eliminated, whereas a new content attribute (C7) was
introduced to address a topic on the NAT-M that was not covered by the TIMSS items.
² For a more detailed description and examples of coded items the reader is referred to the manual
written by Tatsuoka and her colleagues (Tatsuoka et al., 2003).
Appendix B. A sample of 10 Representative Items and the Specific Attributes
Involved in Successful Completion of Each of Them
1. Place the following three numbers on the number line:
1.8, 1.2, 2.1
In order to complete this task successfully, students should master basic concepts in
mixed numbers such as integers, fractions and decimals (C2), know number
properties such as the relationship between the two
mixed numbers (S2), comprehend the mathematical representations of real numbers
on the number line (including the meaning of order) (S3), compare the given
numbers with each other and with the numbers on the number line (S9), and place
the numbers in correct order on the real number line (P7). Failure to master any of
these attributes leads to an erroneous answer.
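The closing remark for item 1, that failure on any one required attribute leads to an erroneous answer, amounts to a conjunctive scoring rule. The following is a minimal sketch of that rule only, assuming an idealized student who neither slips nor guesses. The function name `ideal_response` and the two example students are hypothetical, and the full rule-space classification procedure is not reproduced here.

```python
# Attributes the text lists as required for item 1 (codes from Appendix A).
ITEM_1_ATTRIBUTES = {"C2", "S2", "S3", "S9", "P7"}


def ideal_response(mastered, required):
    """Return 1 if every required attribute is mastered, else 0.

    This is the conjunctive reading stated in the text; real responses
    also reflect slips and guesses, which this sketch ignores.
    """
    return int(set(required) <= set(mastered))


# Hypothetical students, for illustration only.
full_mastery = {"C2", "S2", "S3", "S9", "P7", "P2"}
missing_s3 = {"C2", "S2", "S9", "P7"}

print(ideal_response(full_mastery, ITEM_1_ATTRIBUTES))  # 1
print(ideal_response(missing_s3, ITEM_1_ATTRIBUTES))    # 0
```

The same check can be repeated for any other item by swapping in the attribute codes listed in its description.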
2. Without calculation, select the approximate value of 5.35 × 2.8:
1. 0.15 2. 1.5 3. 15 4. 150
In order to complete this task successfully, students should master basic concepts in
mixed numbers such as integers, fractions and decimals (C2), know number
properties such as number of digits for the integer part of the mixed number (S2), and be
able to make a correct approximation (S4), that is, 5.35 should be rounded to 5 and 2.8
should be rounded to 3. Then students should correctly multiply 5 by 3 (P2) and
select the correct answer from the four options given in a multiple-choice item (S5).
Deficiencies in any of these attributes lead to an incorrect answer.
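The estimation steps described for item 2 can be sketched as follows. This is an illustration only, not part of the original test materials. The helper names `round_half_up` and `estimate_product` are hypothetical, and rounding is done with `math.floor(x + 0.5)` because Python's built-in `round` sends exact halves to the nearest even integer.

```python
import math


def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves up."""
    return math.floor(x + 0.5)


def estimate_product(a, b):
    """Estimate a * b by rounding each factor to a whole number first."""
    return round_half_up(a) * round_half_up(b)


options = [0.15, 1.5, 15, 150]          # the four choices in item 2
estimate = estimate_product(5.35, 2.8)  # 5 * 3
# Pick the option closest to the estimate (the multiple-choice step).
choice = min(options, key=lambda v: abs(v - estimate))
print(estimate, choice)  # 15 15
```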
3. Given the two triangles ABC and FED
Also given that AB = FE; BC = ED
Complete: If ∠B = ∠___ then ΔABC ≅ ΔFED
Successful completion of this task requires students to use basic concepts and
properties related to congruent triangles such as equal corresponding sides and equal
corresponding angles (C4); to comprehend the given relations between the sides and
the angles of the two triangles and their figural representations as displayed (S3), to
build a solution on the basis of the given information (S10), and to apply their
knowledge in geometry to find the correct correspondence between the vertices (P3).
Deficiencies in one or more of these attributes lead to an incorrect answer.
4. Mark the largest of the following fractions:
1. 3/5 2. 5/10 3. 6/15 4. 11/20
In order to answer this question correctly, students should be able to use basic
concepts and operations in fractions (C2) such as the relation between the
numerator and denominator in determining the value of the fraction, common
denominator and equivalent fractions (result from multiplying the numerator and
denominator of the fraction by the same number). Students should also be able to
apply computational knowledge in arithmetic (multiplication) on the numerators and
denominators of the fractions to equalize the denominators (P2) and to create an
appropriate basis on which they compare and order the fractions according to their
value (S9). Finally, students should be able to select the correct answer from the four
options in the multiple-choice item by comparing the numerators of the resulting
fractions with the same denominator (S5). Deficits in one or more of these attributes
cause an erroneous response.
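The common-denominator comparison described for item 4 can be sketched as follows. The variable names are hypothetical, and the final cross-check against `fractions.Fraction` is an addition for illustration rather than a step a student would perform.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# The four options in item 4, as (numerator, denominator) pairs.
options = [(3, 5), (5, 10), (6, 15), (11, 20)]

# Least common denominator of 5, 10, 15 and 20.
common = reduce(lcm, (d for _, d in options))

# Rewrite each fraction over the common denominator.
numerators = [n * (common // d) for n, d in options]

# With equal denominators, the largest numerator marks the largest fraction.
largest = options[numerators.index(max(numerators))]
print(common, numerators, largest)  # 60 [36, 30, 24, 33] (3, 5)

# Cross-check with exact rational arithmetic.
assert Fraction(*largest) == max(Fraction(n, d) for n, d in options)
```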
5. If t = 1 what is the value of 2(3 + t)?
In order to perform this task correctly, students should be able to use basic
concepts (such as unknown or variable) and operations (such as substitution) in
elementary algebra (C3). They also should be able to apply computational knowledge
including the distributive property and the correct order of performing arithmetic
operations (P2).
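As a worked illustration (not part of the original item), the two routes named above, the order of operations and the distributive property, give the same value once t = 1 is substituted:

```latex
\begin{align*}
2(3 + t)\big|_{t=1} &= 2(3 + 1) = 2 \cdot 4 = 8 && \text{(parentheses first)} \\
2(3 + t)\big|_{t=1} &= 2 \cdot 3 + 2 \cdot 1 = 6 + 2 = 8 && \text{(distributive property)}
\end{align*}
```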
6. Solve the following equation: (x − 5)/2 = 6
To reach a correct solution, students should be able to use basic concepts related to
simple equations such as algebraic expression and an unknown or variable. They also
should be able to use operations such as performing the same manipulations on the
two sides to find the value of the unknown variable (C3). Students should also be able
to multiply both sides of the equation by 2 then to add 5 to both resulting sides to find
the value of x (P4). Failure to perform one or both steps leads to a wrong answer.
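Reading the equation in item 6 as (x − 5)/2 = 6, which is the form the two described steps imply, the solution works out as follows. The derivation is an added illustration:

```latex
\begin{align*}
\frac{x - 5}{2} &= 6 \\
x - 5 &= 12 && \text{(multiply both sides by 2)} \\
x &= 17 && \text{(add 5 to both sides)}
\end{align*}
```

Substituting back gives (17 − 5)/2 = 6, which confirms the result.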
7. One kilogram of tomatoes costs a dollar more than one kilogram of cucumbers. One
kilogram of onions costs half the price of one kilogram of cucumbers. Dani bought one kilogram of
tomatoes, one kilogram of cucumbers, and two kilograms of onions and paid 10 dollars.
What is the price of one kilogram of cucumbers? Write the process of your solution.
The solution of this item requires students to build a multi-step solution for a word
problem (S10); to comprehend, extract and organize the relevant information
included in the word problem (S11); to use basic concepts and operations in elementary
algebra such as using symbols to represent unknowns (C3); to use basic concepts
such as half the price and operations such as addition of a fractional expression
like x/2 to other expressions such as x and x + 1 (C2); to translate/formulate expressions
and an equation to solve the problem (P1); to apply rules in solving the equation (P4);
and to use logical reasoning to check/verify the solution (P5).
For P1: if the price of one kilogram of cucumbers is x dollars, then one kilogram of
tomatoes costs x + 1 dollars and one kilogram of onions costs x/2 dollars. The equation
is x + (x + 1) + 2·x/2 = 10.
For P4: multiplying both sides of the equation by 2 results in 2x + 2x + 2 + 2x = 20;
summing similar expressions results in 6x + 2 = 20; subtracting 2 from both sides
(which preserves the equality between the two sides) results in 6x = 18; and dividing
both sides by 6 results in x = 3.
For P5: students should be able to evaluate the correctness of their solution by
substituting the resulting prices in the equation x + (x + 1) + 2·x/2 = 10, that is,
3 + 4 + 2·3/2 = 10, thus 10 = 10 (indicating a correct answer).
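The equation set up for item 7 can also be checked mechanically. The sketch below is an illustration under stated assumptions: the function name `total_cost` is hypothetical, and the solving step relies on the bill being linear in the cucumber price x, which is a different route from the student's steps of doubling both sides and collecting terms.

```python
from fractions import Fraction


def total_cost(cucumber_price):
    """Dani's bill as a function of the cucumber price x (per kilogram)."""
    x = Fraction(cucumber_price)
    tomato = x + 1   # one dollar more than cucumbers
    onion = x / 2    # half the price of cucumbers
    return tomato + x + 2 * onion  # 1 kg tomatoes, 1 kg cucumbers, 2 kg onions


# The bill is linear in x: total_cost(x) = slope * x + intercept.
intercept = total_cost(0)
slope = total_cost(1) - intercept
x = (10 - intercept) / slope  # solve slope * x + intercept = 10

print(x)              # 3
print(total_cost(x))  # 10
```

Here the linear form is 3x + 1 = 10, which is the equation 6x + 2 = 20 from the explanation divided by 2.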
8. On one side of a fair coin there is a picture while on the other side there is a number. Rina
decided to toss the coin four times. In the first three tosses the result was the picture. What is
the probability that the coin shows a picture on the fourth toss?
1. 3/4 2. 1/2 3. 1/3 4. 1/4
This question taps the topic of basic probability and its correct solution requires the
student to work with verbally loaded items, specifically to comprehend the problem
(S11), to use basic concepts of probability such as an event (C5), use basic properties
of probabilistic events such as independence and apply relevant rules to solve the
problem – when events are independent the probability to get the picture remains the
same regardless of the result(s) of the previous toss(es) (P4). Deficiencies in one or
more of these attributes lead to failure to perform the task.
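The independence argument for item 8 can be illustrated by enumerating every equally likely sequence of four tosses of a fair coin and keeping only those consistent with what Rina observed. This is a minimal sketch; the labels "P" (picture) and "N" (number) and the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes of four tosses: "P" = picture, "N" = number.
outcomes = list(product("PN", repeat=4))

# Keep only the sequences that match the first three observed tosses.
first_three_pictures = [o for o in outcomes if o[:3] == ("P", "P", "P")]

# Among those, count the sequences whose fourth toss is also a picture.
fourth_is_picture = [o for o in first_three_pictures if o[3] == "P"]

conditional = Fraction(len(fourth_is_picture), len(first_three_pictures))
print(conditional)  # 1/2
```

The conditional probability equals the unconditional probability of a picture on a single toss, which is what independence means here.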
9. Given the following function:
f(x) = 7 − x
Complete: f(___) = 3
To complete this item correctly students should be able to use basic concepts and
operations in elementary algebra such as function (a well-behaved relationship),
variable, and substitution (C7). They also should be able to solve the simple algebraic
equation 7 − x = 3 by subtracting 7 from both sides of the equation, which results
in −x = −4, and then multiplying both sides by −1, which results in x = 4 (P4).
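Reading the function in item 9 as f(x) = 7 − x, which is what the described solution steps imply, the two steps can be traced as follows. This is a small illustrative sketch; the variable name `rhs` is hypothetical.

```python
def f(x):
    """The function from item 9, read as f(x) = 7 - x."""
    return 7 - x


# Solve 7 - x = 3 with the two steps described in the text.
rhs = 3 - 7    # subtract 7 from both sides:  -x = -4
x = rhs * -1   # multiply both sides by -1:    x = 4

print(x, f(x))  # 4 3
```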
10. What property exists in all sums of any three successive numbers?
This task follows an easier one that requires students to provide four examples of
sums of three successive numbers. In order to define the property that characterizes a
set of numbers (sums of triples of successive numbers), students should be able to use
basic concepts such as successive numbers and operations such as sum of three
successive numbers (C1), to use prior knowledge regarding number properties and
relationships such as the difference between pairs of successive numbers is 1 (S2), to
analyze the resulting sums and to conduct a search for common characteristics
(P6), to recognize patterns in a number set such as multiples of 3 (S6), and to use
inductive thinking to generalize from the characteristics of the individual sums to the
set of sums. Failure to demonstrate one or more of these attributes leads to incorrect
response.
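The pattern targeted by item 10 can be explored numerically before it is generalized. The sketch below is an illustration only: the four starting values are hypothetical examples, and the range check is an empirical test rather than a proof. The algebraic reason is that n + (n + 1) + (n + 2) = 3(n + 1).

```python
def sum_of_three_successive(n):
    """Sum of the three successive integers n, n + 1, n + 2."""
    return n + (n + 1) + (n + 2)


# Four example sums, as in the easier task that precedes item 10.
examples = [sum_of_three_successive(n) for n in (1, 2, 10, 25)]
print(examples)  # [6, 9, 33, 78]

# Check the pattern over a wider range (an empirical check, not a proof).
assert all(sum_of_three_successive(n) % 3 == 0 for n in range(-1000, 1000))
# The sum always equals three times the middle number.
assert all(sum_of_three_successive(n) == 3 * (n + 1) for n in range(-1000, 1000))
```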