

Page 1: Revealing German primary school students’ achievement in ... · Humboldt University, Unter den Linden 6, 10099 Berlin, Germany e-mail: jasmin.hannighofer@IQB.hu-berlin.de M. Van

ORIGINAL ARTICLE

Revealing German primary school students’ achievement in measurement

Jasmin Hannighofer • Marja Van den Heuvel-Panhuizen • Sebastian Weirich • Alexander Robitzsch

Accepted: 24 July 2011 / Published online: 28 August 2011

© FIZ Karlsruhe 2011

Abstract The focus of this study was to investigate primary school students’ achievement in the domain of measurement. We analyzed a large-scale data set (N = 6,638) from German third and fourth graders (8- to 10-year-olds). These data were collected in 2007 within the framework of the ESMaG (Evaluation of the Standards in Mathematics in Primary School) project carried out by the Institute for Educational Quality Improvement (IQB) at Humboldt University, Berlin, Germany. The data were interpreted using a classification scheme based on a conceptual–procedural distinction in measurement competence. The analyses with this classification revealed that grade, gender, and in particular figural reasoning ability are significantly related to overall measurement competence as well as to the sub-competencies of Instrumental knowledge and Measurement sense. The paper concludes with a discussion of the implications of the findings of this study for teaching and assessing measurement.

Keywords Mathematical competence · Measurement · Gender · Grade · Figural reasoning ability

1 Introduction

In many countries, assessments are carried out to measure the effects of education on students’ achievement. Examples of such national assessments are the NAEP (National Assessment of Educational Progress) in the USA, the PPON (National Assessment of Educational Achievement) in the Netherlands, the NAPLAN (National Assessment Program – Literacy and Numeracy) in Australia, and the PSLE (Primary School Leaving Examination) in Singapore. These assessments of educational output are mostly based on national achievement standards, which have been formulated since the late 1980s, beginning with the standards published by the American National Council of Teachers of Mathematics (NCTM 1989).

In Germany, standards were developed for primary school mathematics in 2004 by the KMK (Standing Conference of the Ministers of Education and Cultural Affairs of the States in the Federal Republic of Germany) (KMK 2005). The standards describe what students are expected to have achieved by the end of grade 4, which in Germany is the end of primary school. At that time, the students are about 10 years old.

The KMK standards for primary school mathematics distinguish five general competencies (Problem Solving, Communicating, Reasoning, Modeling, and Representing) and five content-related mathematical competencies (Numbers and Operations, Space and Shape, Patterns and Structure, Measurement, and Probability). The latter set of competencies relates to the structure of mathematical content as described, for example, in the NCTM (2000) standards and also reflected in the PISA framework for assessing mathematics (OECD 2003).

J. Hannighofer (✉) · S. Weirich
Institute for Educational Quality Improvement (IQB), Humboldt University, Unter den Linden 6, 10099 Berlin, Germany
e-mail: jasmin.hannighofer@IQB.hu-berlin.de

M. Van den Heuvel-Panhuizen
Freudenthal Institute for Science and Mathematics Education, Utrecht University, Utrecht, The Netherlands

A. Robitzsch
Federal Institute for Education Research, Innovation and Development of the Austrian School System, Salzburg, Austria

ZDM Mathematics Education (2011) 43:651–665

DOI 10.1007/s11858-011-0357-y

Starting in 2004, the KMK standards were used to evaluate primary school students’ achievement. The Institute for Educational Quality Improvement (IQB) at the Humboldt University is responsible for this evaluation and carries out the assessment. For mathematics in primary school, this was done in the ESMaG (Evaluation of the Standards in Mathematics in Primary School) project. In our study, we explored primary school students’ achievement in the mathematical domain of measurement.

Measurement competence is generally described as the ability to assign a numerical value to an attribute of an object or event (NCTM 2000). The mathematical domain of measurement is considered the most widely used application of mathematics in everyday life and is regarded as a foundation for many sciences (Lehrer 2003; Vasilyeva, Casey, Dearing and Ganley 2009). Moreover, measurement is regarded as one of the most challenging areas of mathematics in elementary school (Vasilyeva et al. 2009). Being competent in measurement means that children have the ability to grasp the physical world around them by expressing its properties in numbers and reasoning about them mathematically (van den Heuvel-Panhuizen and Buys 2008).

The KMK measurement standard is subdivided into two competencies and further divided into a number of sub-competencies (Table 1).

2 What is already known about German primary school students’ achievement in measurement?

Knowledge about German primary school students’ achievement in the mathematical domain of measurement is scarce. Only two recent studies give some information about this. The latest findings come from TIMSS 2007 (Mullis, Martin and Foy 2008; Bos et al. 2008). Of the 37 countries that participated in TIMSS 2007, the German fourth graders were ranked 12th, with Australia and Denmark below them and the USA and Lithuania above them. However, compared to these countries, the variance of performance scores was rather low in Germany and the range from the best-performing to the lowest-performing students was rather small. Moreover, these TIMSS results offer hardly any specific information about measurement achievement, because measurement is combined with geometry in a single content domain named Geometric shapes and measures.

Additional information about German students’ achievement in measurement is provided by Lobemeier (2005), who carried out a secondary analysis of the data collected in 2001 in the IGLU (Bos et al. 2003) and the IGLU-E study (Lankes et al. 2003), in which 16 measurement items were used from TIMSS 1995 (Mullis et al. 1997). Lobemeier (ibid.) classified these items into four measurement-related categories named ordering, estimating, partitioning, and operating. Ordering includes tasks that require, among other things, comparing measures such as temperatures and time spans and arranging them in a systematic order (e.g., ordering minute, hour, day, week, and month from the shortest to the longest). Estimating includes, for example, roughly determining the weight or length of an object without using a measuring tool (e.g., estimating whether a pencil is 5, 10, 20, or 30 cm long). Partitioning areas, volumes, and weights is Lobemeier’s third category. Here students, for example, had to identify how a particular weight was composed and then had to use this knowledge to determine the number of weights that would balance a scale. The focus of tasks in the operating category is on carrying out multi-step calculations with measures.

Lobemeier’s (ibid.) results showed that German fourth graders performed best on the ordering items. The success rate for these items was between 66 and 95%. Lower scores were found for the items on estimating. Here the percentage of correct answers ranged from 55 to 79%. For the three items on partitioning, a similar range was found. A much larger range in success rate was found for the items within the category operating. The easiest item was solved by 78% of the students. In this item, the students had to identify what date it was 3 weeks after a particular date. The most difficult item was one in which the perimeter of a rectangle and its width were given and students had to calculate its length. Only 19% of the students could solve this item. These results correspond with the TIMSS 2007 findings that this topic is not included in the curriculum until grade 5 and that only 55% of German students have been taught this topic in grade 4.

Table 1 Measurement competencies as formulated in the KMK (Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland) (2005) standards

I Having conceptions of measures

a. Knowing standard units that belong to monetary values, lengths, durations, weights, and volumes

b. Comparing, measuring, and estimating measures

c. Knowing objects or events that are important in everyday life and that represent a particular standard unit

d. Converting measures

e. Knowing and understanding simple fractions in the context of measures from everyday life

II Dealing with measures in context situations

a. Measuring with appropriate measuring units and instruments

b. Using representatives (of standard units) from everyday life to solve context problems

c. Calculating in context problems with appropriate estimates of measures

d. Solving context problems that require dealing with measures

Another area that has been studied is how students’ measurement competence develops during the primary school years. In agreement with the findings of Winkelmann and van den Heuvel-Panhuizen (2009), we assume that there will be progress in achievement from grade 3 to 4. Other information about students’ mathematical development between grades 3 and 4 can be found in TIMSS 1995 (Mullis et al. 1997). The international results from this study, in which Germany did not participate, showed that for mathematics in general the international average of the fourth-grade students (529) was approximately 60 points higher than the average of the third-grade students (470). For items with low difficulty, for example, estimating a pencil’s length, the average percentage correct for all countries was 77% for the fourth graders and 69% for the third graders. However, such an increase could not be found for difficult measurement items. For example, students in both grades performed very similarly, with 21 and 23% correct answers, respectively, on an item involving a multi-step problem requiring students to apply their knowledge of the perimeters of rectangles. These results for measurement deviated from those for the other mathematical content domains, where the differences between grades 3 and 4 in overall achievement were often larger and ranged from at least 6 up to 21 percentage points.

A further issue is whether boys and girls differ in their measurement competence. The grade 4 results from TIMSS 2007 (Mullis et al. 2008; Bos et al. 2008) did not show significant gender differences in the domain Geometric shapes and measures, whereas in the domains Number and Data display German boys did significantly better than the girls. Lobemeier’s (2005) findings were in agreement with these TIMSS results. For all four categories of measurement items that she distinguished, she found that boys and girls scored equally well. Ratzka (2003), who also used TIMSS items to investigate the mathematical achievement of German fourth graders, likewise found no significant gender differences on the scale for measurement. In contrast to these three studies, earlier analyses based on the students’ responses to the items used in the ESMaG project (Winkelmann, van den Heuvel-Panhuizen and Robitzsch 2008; Winkelmann and van den Heuvel-Panhuizen 2009) showed that, out of the five mathematical content domains as defined in the KMK Standards, the strongest differences between boys and girls were found in the measurement items (d = -.36). Moreover, Kaiser and Steisel (2000) found higher measurement scores for boys than for girls in grade 8 in TIMSS 1997.

A further point of interest is the relationship between students’ measurement ability and their figural reasoning ability. Many studies have shown a strong connection between spatial ability and students’ mathematics achievement in general (see, e.g., Sherman 1980; Fennema 1979). This correlation increases with the complexity of mathematical tasks (Kaufmann 1990). However, it is unclear whether this connection also applies to the sub-domain of measurement. Also, to our knowledge no research exists that explores the relationship between students’ measurement achievement and their figural reasoning ability. In the case of measurement, this latter relationship is even more relevant than the relationship with spatial ability in general, because in solving measurement tasks students often have to reason with figures which are not necessarily presented in a spatial context.

Knowing more about how the students’ measurement ability is related to their figural reasoning ability might also give further insight into possible differences in measurement achievement between girls and boys.

3 Research questions

The goal of this study was to learn more about German primary school students’ achievement in the mathematical domain of measurement. The following research questions were investigated:

Q1. What can be said about the dimensionality structure in the measurement competencies based on the ESMaG data?

Q2. To what degree do fourth-grade students outperform third-grade students in measurement achievement?

Q3. Does measurement achievement differ by gender?

Q4. Does figural reasoning ability correlate with students’ achievement in measurement?

Our research questions are exploratory, because we found hardly any previous research that gave indications for formulating hypotheses. The only predictions that could be made based on prior studies focused on gender differences. However, the difficulty here is that the findings from these studies contradict each other. For example, Lobemeier (2005) and Mullis et al. (2008) did not detect significant gender differences in the domain of measurement, whereas Kaiser and Steisel (2000) found that eighth-grade boys outperformed girls in their achievement in measurement.


4 Method

4.1 Sample and data collection

The data for this study were collected in 2007 from a nationally representative sample of 6,638 German primary school students (3,280 third-grade students and 3,358 fourth-grade students). In a multistage sampling procedure, schools were randomly selected within each of the 16 German states, and within each school one grade 4 class and one grade 3 class were randomly selected. Additionally, data on the students’ figural reasoning ability were collected by having them take the subscales for figural analogy and figural classification of the Kognitiver Fähigkeits-Test (KFT) (Heller and Perleth 2000).

4.2 Item development and classification for evaluating the KMK standards

One of the requirements of the KMK for the evaluation of the standards was that teachers should be heavily involved in the evaluation process (Granzer 2009). This stipulation meant that the development of items attended to what teachers thought was important to assess when students’ achievement was evaluated. Consequently, in the ESMaG project psychometricians, didacticians, and teachers worked together in developing and classifying the items. In a first step, teachers who specialized in mathematics education and were trained in item development were asked to compile and design a collection of items for assessing students’ competence in measurement, and to classify the items in terms of the two measurement competencies as formulated in the KMK standards. In a second step, the items and their classifications were examined by a group of mathematics didacticians.

As shown in Table 2, more than half of the developed items were attributed to the competence Having conceptions of measures (Competence I), while almost one-third were attributed to the competence Dealing with measures in context situations (Competence II). Moreover, about one-tenth of the items were ascribed to Competence I as well as to Competence II.
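The proportions quoted here can be checked against the item counts reported in Table 2 (58, 27, and 12 out of 97 items). The short sketch below is only an illustrative recomputation; the function name `shares` is ours and not part of the study. It yields roughly 59.8%, 27.8%, and 12.4%.

```python
# Item counts per KMK competence category, as reported in Table 2.
counts = {"I": 58, "II": 27, "I+II": 12}


def shares(item_counts):
    """Return each category's share of all items as a percentage."""
    total = sum(item_counts.values())
    return {name: 100 * n / total for name, n in item_counts.items()}


if __name__ == "__main__":
    for name, pct in shares(counts).items():
        print(f"Competence {name}: {pct:.1f}% of {sum(counts.values())} items")
```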

Figures 1, 2, 3, and 4 show examples of the developed items and the KMK competence categories to which they were allocated. For test security reasons, we can only show examples which have been released for publication. Nevertheless, this restricted selection of items made clear that although the developed items and their classifications had support from the community of mathematics teachers and didacticians, they did not produce a clear and focused domain structure of measurement and the corresponding competencies.

The item in Fig. 1 refers without doubt to conversion of measures and is as such assigned to Competence I Having conceptions of measures. Yet, one may wonder whether such a technical sub-competence belongs to this competence.

The item in Fig. 2, about working with the map of a zoo, is evidently about dealing with measurement context situations and consequently corresponds to Competence II. However, for the item in Fig. 3, the assignment to Competence II is questionable. This item does not assess students’ measurement competence; it only requires reading the text to identify the time sequence that best fits the described story. Similarly, questions can be raised with respect to the item in Fig. 4. Although it is apparent that this item compares measures and as such is a sub-competence of Competence I, it is not so obvious that this item can be considered an operationalization of Competence II, i.e., measuring with appropriate measurement units and instruments.

Table 2 Measurement competencies to which the measurement items were attributed

Type of measurement competence based on the KMK standards                               Number of items
I     Having conceptions of measures                                                    58
II    Dealing with measures in context situations                                       27
I+II  Having conceptions of measures + dealing with measures in context situations      12
Total                                                                                   97

Convert the time measurements as is shown in the example and fill in the gaps.

Example: 87 min = 1 h 27 min

a) 144 min = _____ h _____ min

b) __________ min = 3 h 54 min

c) __________ min = 6 h 40 min

KMK category: I. Having conceptions of measures, d. Converting measures

© Cornelsen Verlag 2008 (Bildungsstandards: Kompetenzen überprüfen, Mathematik Grundschule, Klasse 3/4)

Fig. 1 Measurement item concerning conversion of time measurements

4.3 Revisiting the KMK standards for measurement

A detailed consideration of the items and their classifications used in the ESMaG project revealed that the measurement sub-competencies as formulated in the KMK standards are partly ambiguous. For example, the sub-competence Knowing standard units that belong to monetary values, lengths, durations, weights, and volumes (Competence Ia) overlaps with the sub-competence Knowing objects or events that are important in everyday life and that represent a particular standard unit (Competence Ic). Although Competence Ia means knowing and understanding terms like “grams” and “kilograms” and Competence Ic means, for example, knowing things that weigh about 10 kg, the two competencies are not easy to distinguish because tasks often require both. Students, for example, often have to compare different things and therefore have to know how big, tall, heavy, etc., things are. Moreover, this latter sub-competence and the sub-competence Knowing and understanding simple fractions in the context of measures from everyday life (Competence Ie) do not differ from the sub-competencies in Competence II that encompass all kinds of context situations in which students have to deal with measures.

A framework of standards with partially overlapping and ambiguous definitions does not clearly distinguish the competencies. This, in turn, resulted in items that may not be precise enough to assess distinctive aspects of measurement competence.

4.4 Alternative classification based on conceptual–procedural distinction

As Resnick and Ford (1981) pointed out, the distinction between computational skills and conceptual understanding is one of the oldest concerns in mathematics education. Many researchers have addressed this split into two kinds of understanding, although they do not always use the same terms for it and also differ in their interpretation. Already in the 1970s, Skemp (1976) made mathematics teachers aware that students should develop relational understanding, meaning that students should understand both what to do and why, and that this understanding deviates from what Skemp referred to as instrumental understanding, which means learning rules without knowing why. Although using different terms, Hiebert (1986) makes a similar distinction between conceptual and procedural knowledge. Comparable to Skemp’s relational understanding, Hiebert’s conceptual knowledge includes relationships between mathematical objects, and Hiebert’s procedural knowledge, which refers to knowledge of standard learned procedures, corresponds with Skemp’s instrumental understanding. However, Rittle-Johnson and Wagner Alibali (1999), who also made the distinction between conceptual and procedural knowledge, emphasize that both types of knowledge lie on a continuum and that it is not always possible to separate them.

In fact, the distinction between conceptual and procedural knowledge is applicable in all domains of mathematics, but it fits the domain of measurement particularly well, because instrumental as well as conceptual knowledge plays a unique role there. The KMK framework reflects to some degree a division into an understanding of basic facts and procedures related to measurement, on the one hand, and the understanding of measurement concepts and how they are related and the ability to apply this knowledge in everyday contexts, on the other hand. However, these two perspectives (the procedural or instrumental understanding versus the conceptual or relational understanding) are not clearly distinguished in all the sub-competencies. Therefore, we developed a framework that could better distinguish the measurement items in such a way that there is a division between items that refer to using Instrumental knowledge (IK) and items that imply Measurement sense (MS). The IK competence includes having available straightforward and isolated measurement knowledge and procedures, while the MS competence involves having available knowledge about measures and units of measurement in everyday life and being able to apply all kinds of measurement knowledge in context situations. In Table 3, we have described sub-categories of these competencies by referring to the types of items that belong to IK and MS.

Fig. 2 Measurement item concerning measurements in a map of a zoo

We classified all the items according to the new framework. Contrary to the KMK classification, the alternative classification did not result in items that were attributed to both IK and MS.

A closer examination of all 97 items revealed that quite a number of them, although part of the collection of measurement items, should in our opinion not have been attributed to measurement. In total, we found 28 such items, which we labeled as “No measurement” items (see Table 3). They range from comparison problems in which the measurement context is not relevant to an item that merely requires reading (Fig. 3).

A less detailed classification was obtained when we clustered the items belonging to the different sub-categories within the IK and MS competencies (see Table 4). The collection of items that belong to the IK competence can be grouped into items that are about time and items that concern other measurement attributes (e.g., length, weight). Similarly, the items that refer to the MS competence can be divided into items that assess whether students have a particular knowledge and items that are about problem solving.

4.5 Statistical analyses

4.5.1 Estimation of students’ achievement

To estimate students’ achievement, the items were scaled within the framework of Item Response Theory (IRT). As the data were collected in a multi-matrix sampling design, every student completed only a subsample of all available items, resulting in many items with randomly missing values (Rubin 1987). Neither the KMK competence categories nor the alternative competence categories were evenly distributed over the booklets. This results from the design of the ESMaG study, in which only main categories (e.g., measurement) were balanced over the booklets.
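To make the idea of IRT scaling with missing-by-design responses more concrete, the following minimal sketch may help. The text does not state which IRT model was used, so the one-parameter (Rasch) model here is purely an assumption for illustration, and the function names `p_correct` and `log_likelihood` are hypothetical. Items that were not in a student's booklet are coded as `None` and simply contribute nothing to that student's likelihood.

```python
import math


def p_correct(theta, b):
    """Rasch model (an assumed model, for illustration only):
    probability of a correct response given ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_likelihood(theta, responses, difficulties):
    """Log-likelihood of one student's response pattern.

    responses: 1 (correct), 0 (incorrect), or None (item not in the
    student's booklet, i.e., missing by design).
    """
    ll = 0.0
    for x, b in zip(responses, difficulties):
        if x is None:
            # Item was not administered: it carries no information.
            continue
        p = p_correct(theta, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


if __name__ == "__main__":
    # Hypothetical student who saw only the first and third of three items.
    print(log_likelihood(0.5, [1, None, 0], [-1.0, 0.0, 1.0]))
```

Under these assumptions, students who worked on different booklets can still be placed on a common scale, because each likelihood only uses the items the student actually saw.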

Fig. 3 Measurement item concerning identifying correct time sequence

For the estimation of population parameters (means, standard deviations) and of students’ achievement, we used the plausible value technique (von Davier 2009; Mislevy et al. 1992). Plausible values (PVs) are drawn from the distribution of students’ latent abilities. Latent regression models are specified for drawing PVs so that they reflect all statistical relationships between the ability variables and the covariates of interest used in the subsequent statistical analyses. The variation between different PVs reflects the uncertainty due to missing data (missing by design) and the measurement error modeled within the IRT framework. Using this technique, data with a considerable amount of missing data and measurement error can be analyzed as if there were no missing values and no measurement error. The analyses are conducted repeatedly, once for each set of PVs, and the results are pooled and tested for significance (Little and Rubin 2002).

4.5.2 Analyses of dimensionality

To explore the structure of the measurement competence, we conducted confirmatory factor analyses (CFA) for the KMK classification as well as for the alternative classification. These CFAs allowed us to verify the latent factor structure of a set of observed test responses and to test the homogeneity of the domain to determine whether the differentiation into the two competencies was reasonable from an empirical point of view.

For both classifications, we specified a two-factor model. The KMK classification includes items with within-item dimensionality. Such items belong to both factors. The alternative classification allows only between-item dimensionality, i.e., each item belongs to only one factor.

If the proposed two-dimensionality fits better to the

empirical data than a one-dimensional structure, the latent

correlation between the two dimensions is expected to

differ significantly from one. Therefore, a model with a

fixed correlation of one should fit the data worse than a

model where the correlation is estimated freely. To test

this, the chi-square statistic provided by the Wald test was

used (Bollen 1989; Muthén and Muthén 1998–2007). In an additional model constraint, the correlation was forced to equal one, which gains one additional degree of freedom. The Wald test provides a chi-square statistic quantifying the loss of model fit caused by this constraint. If the chi-square statistic is significant at an alpha level of .05, the loss of

model fit is substantial.
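The decision rule for a chi-square statistic with one degree of freedom can be sketched as follows. The closed form via the complementary error function holds only for df = 1; the function names and the two example statistics are hypothetical and chosen for illustration.

```python
from math import erfc, sqrt


def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1 the statistic is a squared standard normal variable,
    so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return erfc(sqrt(chi2 / 2))


def loss_of_fit_is_substantial(chi2, alpha=0.05):
    """True if fixing the correlation to one significantly worsens model fit."""
    return chi2_p_value_df1(chi2) < alpha


# Hypothetical Wald statistics (illustration only)
substantial = loss_of_fit_is_substantial(5.0)
negligible = loss_of_fit_is_substantial(0.2)
```

A statistic above the critical value of about 3.84 leads to rejecting the one-dimensional model in favor of the two-dimensional one.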

We tested the equality of latent factor correlations in

subgroups, i.e., for boys and girls and for third graders and

fourth graders. The analysis was done for the four disjoint groups of third-grade girls, third-grade boys, fourth-grade girls, and fourth-grade boys. The model was a two-dimensional multi-group CFA. Within each group, the correlation between the two dimensions was estimated, and the correlations were tested for equality across all subgroups.

4.5.3 Effect of grade, gender, and figural reasoning ability

on measurement achievement

We investigated grade and gender differences with a two-

way analysis of variance (ANOVA). The dependent variable is the PV score on the measurement items. This means that all analyses involving PVs were conducted five times (once for each set of PVs) and the results were pooled according to Rubin’s rules (Little and Rubin 2002).

As stated earlier, all the factors used in each ANOVA

model also occurred in the latent regression model of the

PV imputation.

The same analysis was carried out separately for the

students with high and low figural reasoning ability scores.

The high ability group consisted of all students with a KFT

value of at least one standard deviation above the mean and

the low ability group consisted of all students with a KFT

value at least one standard deviation below the mean.
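The group definition above amounts to a cut at one standard deviation on either side of the mean. A minimal sketch, assuming the population standard deviation of the observed scores is used (the text does not say which estimator was applied) and using hypothetical KFT values:

```python
from statistics import fmean, pstdev


def split_by_ability(scores):
    """Return (high, low) groups: at least 1 SD above or below the mean."""
    mean, sd = fmean(scores), pstdev(scores)
    high = [s for s in scores if s >= mean + sd]
    low = [s for s in scores if s <= mean - sd]
    return high, low


# Hypothetical KFT values (illustration only)
kft_scores = [35, 42, 48, 50, 50, 52, 58, 65]
high_group, low_group = split_by_ability(kft_scores)
```

Students between the two cut points belong to neither group, which is why the two subgroup analyses cover only part of the total sample.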

Fig. 4 Measurement item concerning comparing measures. Item text: “Tim and Jana are weighing four bags. Which bag is the lightest one?” (answer options: Bag A, Bag B, Bag C, Bag D). KMK categories: I. Having conceptions of measures, b. Comparing, measuring and estimating measures; II. Dealing with measures in context situations, a. Measuring with appropriate measuring units and instruments. © Cornelsen Verlag 2008 (Bildungsstandards: Kompetenzen überprüfen, Mathematik Grundschule, Klasse 3/4)


To investigate whether gender differences in measure-

ment achievement were at least partially mediated by

gender differences in figural reasoning ability, we carried

out a mediation analysis. To test the indirect effect of the

figural reasoning ability, we used asymmetric confidence

intervals by applying a bias-corrected bootstrap method

(MacKinnon 2008) instead of the Sobel test, because

conventional tests of significance might lead to biased

results due to non-normality (MacKinnon et al. 2004;

MacKinnon 2008). Indirect effect parameters are computed as the product of two regression coefficients, and

therefore they are often not normally distributed.
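A bias-corrected bootstrap interval for an indirect effect can be sketched as follows. This is an illustration on simulated data, not a reconstruction of the analysis reported here: the variable names, the effect sizes, the sample size, and the number of bootstrap samples are all assumptions, and the regressions are plain ordinary least squares without plausible values.

```python
import random
from statistics import NormalDist, fmean


def indirect_effect(x, m, y):
    """Product a*b, with a from m ~ x and b from y ~ m + x (least squares)."""
    mx, mm, my = fmean(x), fmean(m), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bc_bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=1):
    """Bias-corrected bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    est = indirect_effect(x, m, y)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases
        boots.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    boots.sort()
    nd = NormalDist()
    # Bias correction: share of bootstrap estimates below the sample estimate
    prop = sum(b < est for b in boots) / n_boot
    prop = min(max(prop, 1 / n_boot), 1 - 1 / n_boot)
    z0 = nd.inv_cdf(prop)
    lo_p = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    hi_p = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    lo = boots[min(n_boot - 1, int(lo_p * n_boot))]
    hi = boots[min(n_boot - 1, int(hi_p * n_boot))]
    return est, lo, hi


# Simulated data (illustration only): binary predictor, mediator, outcome
data_rng = random.Random(7)
n_cases = 400
gender = [data_rng.randint(0, 1) for _ in range(n_cases)]
reasoning = [0.8 * g + data_rng.gauss(0, 1) for g in gender]
achievement = [-0.5 * g + 0.5 * r + data_rng.gauss(0, 1)
               for g, r in zip(gender, reasoning)]
estimate, ci_low, ci_high = bc_bootstrap_ci(gender, reasoning, achievement)
```

Because the interval is read from shifted percentiles of the bootstrap distribution, it need not be symmetric around the estimate, which is the property that motivates its use for products of coefficients.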

5 Results

5.1 Measurement achievement

The easiest items were, for example, those in which stu-

dents had to say which was more: 6,000 m or 5 km? Other

easy items were those in which students had to decide

which unit (km, s, min, kg, t) was the right one, when they

had to complete the sentence: a duck weighs about 4___.

An example of an item of medium difficulty is shown in

Fig. 2. To answer this item, students had to use a map that

contained measures for determining a particular distance.

The most difficult items were those in which students had

to solve a problem like 2 � h = ___hours and ___minutes.

5.2 Inspection of the two classifications

Before answering the research questions about students’

achievement in measurement, we first investigated the

dimensionality of the two classifications. This was done to

assess which classification provided a better differentiation

within the structure of the measurement competence and

thus gave a better insight into achievement in measurement.

Our analyses showed that the correlation of the two KMK competencies was r = .78. Comparing the underlying two-dimensionality to a one-dimensional structure revealed that Competence I Having conceptions of measures and Competence II Dealing with measures in context situations were distinguishable constructs (χ² = 4.74; df = 1; p < .05). The standard deviation of students’ achievement was 1.29 for Competence I and .96 for Competence II.

Table 3 Alternative classification of measurement items

| Type of measurement competence | Sub-category | Number of items (subtotal) | Number of items (total) | Examples |
|---|---|---|---|---|
| Instrumental knowledge (IK) | Conversion of measures | 9 | 43 | |
| | Comparison of measures | 9 | | Fig. 4 |
| | Conversion of times | 17 | | Fig. 1 |
| | Comparison of times | 1 | | |
| | Reading clocks or timetables | 7 | | |
| Measurement sense (MS) | Knowledge of daily life sizes | 10 | 26 | |
| | Knowing which unit of measurement belongs to an attribute | 1 | | |
| | Context problems about calculations with multiple attributes | 5 | | |
| | Context problems about additive calculation with one attribute and multiple units of measurement | 4 | | Fig. 2 |
| | Context problems about multiplicative calculation with one attribute and multiple units of measurement | 2 | | |
| | Context problems with one attribute and one unit of measurement | 4 | | |
| No measurement | Comparison problems in which the measurement context is not relevant | 3 | 28 | |
| | Context problems with years | 1 | | |
| | Context problems about calculation with money | 20 | | |
| | Context problems about fractions | 1 | | |
| | Bare number problems | 2 | | |
| | Just requires reading | 1 | | Fig. 3 |
| Total | | | 97 | |

Table 4 Alternative classification of measurement items with clustered sub-categories

| Type of measurement competence | Clustered sub-category | Number of items (subtotal) | Number of items (total) |
|---|---|---|---|
| Instrumental knowledge (IK) | Time items | 25 | 43 |
| | Items about other attributes | 18 | |
| Measurement sense (MS) | Knowledge | 11 | 26 |
| | Problem solving | 15 | |
| Total | | | 69 |

The correlation of the competencies Instrumental

knowledge (IK) and Measurement sense (MS), which are

based on the alternative classification, was r = .70. Com-

paring the underlying two-dimensionality to a one-dimen-

sional structure indicated that IK and MS were

distinguishable constructs (χ² = 4.92; df = 1; p < .05). The standard deviation of students’ achievement was 1.67 for IK and 1.18 for MS.

As both latent correlation coefficients were estimated in a

structural equation model, standard errors of the coefficients

were available. Yet, the two correlation coefficients cannot

be tested for equality, because they stem from different

subsets of items. However, the 95% confidence intervals (CI) show only a minimal overlap (KMK classification: CI = [.73, .82]; alternative classification: CI = [.66, .74]). Therefore, we can assume that the correlation coefficients for both classifications differ. However, when comparing the correlation coefficients obtained from the subset of 69 items (see Table 4), classified once according to the two KMK competencies and once according to the alternative competencies, we did not find a significant difference between these correlation coefficients (KMK classification r = .71; alternative classification r = .70, t = .41, p = .69). We found that in

terms of log-likelihood value (logL) and information criteria

(AIC, BIC), the alternative classification (logL = -24,068,

AIC = 48,413, BIC = 49,315) fitted the data better than the

KMK classification (logL = -24,129, AIC = 48,554,

BIC = 49,514), as indicated by the larger log-likelihood

value and the smaller values of information criteria AIC and

BIC.
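The two comparisons in this paragraph, the overlap of the confidence intervals and the ranking by information criteria, can be retraced with a short sketch. The numbers are the ones reported above; the helper names are hypothetical.

```python
def interval_overlap(a, b):
    """Length of the overlap between two closed intervals (0.0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def preferred_model(criterion_values):
    """Name of the model with the smallest information criterion value."""
    return min(criterion_values, key=criterion_values.get)


# 95% confidence intervals of the latent correlations as reported in the text
ci_kmk, ci_alternative = (0.73, 0.82), (0.66, 0.74)
overlap = interval_overlap(ci_kmk, ci_alternative)

# Information criteria as reported in the text (smaller is better)
aic = {"KMK": 48554, "alternative": 48413}
bic = {"KMK": 49514, "alternative": 49315}
best_by_aic = preferred_model(aic)
best_by_bic = preferred_model(bic)
```

The intervals share only the range from .73 to .74, and both criteria point to the same classification.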

To ensure that the two-dimensional structure was not due

to characteristics of the items, we constructed a two-

dimensional structure that was based on a random allocation

of items and tested that structure for dimensionality. Two

dimensions consisting of randomly allocated items are

expected to correlate to one. Hence, the two-dimensional

model, as well as a one-dimensional model, is expected to

fit. The correlation of both arbitrary main domains was

r = .99. The model does not fit better to the data than the

one-dimensional model (χ² = .01, p = .97). In summary,

we have three possible models, each consisting of two

dimensions: the model based on the KMK classification, the

model based on the alternative classification, and the model

based on an arbitrary classification. On comparing the fit of

each model in relation to the one-dimensional model, we

found that the arbitrary model fitted worst to the data (i.e., it

did not fit better than the one-dimensional model). The

model based on the KMK classification fits significantly

better than the one-dimensional model (i.e., it separates the

items in two distinguishable constructs), and the model

based on alternative classification also fits better than the

one-dimensional model. More precisely, the model based on the alternative classification separates the items into two more clearly distinguishable constructs, as the correlation between both dimensions in this model (r = .70) is lower than in the model based on the KMK classification (r = .78).

The test of the invariance of dimensionality shows that

the dimensionality in the alternative classification also

holds in subgroups. Correlation coefficients for the four

groups were .69 for third-grade boys, .64 for fourth-grade

boys, .68 for third-grade girls, and .65 for fourth-grade

girls. The correlation coefficients were tested for equality.

No significant differences in the coefficients were found

(χ² = .15, df = 1, p = .70).

Because the alternative classification results in a clearer

description of the measurement competence from a

domain-specific didactical perspective and is a more dis-

tinctive classification from a psychometric perspective, the

reported findings in the remainder of the article refer only

to the alternative classification results.

5.3 Students’ competence structure in measurement

In Fig. 5, the mean scores are shown for all items belonging to the IK competence (mean p value .45) and to the MS competence (mean p value .45). On average, the students scored equally well on both competencies. However, on comparing the mean scores of the items belonging to the sub-categories described in Table 4, we found large differences in difficulty level within IK as well as within MS. With respect to IK, the mean p value for the time items was .36 and for the items about other attributes .57. A similar difference was found for the MS items. The mean p value for the knowledge items was .60 and for the problem solving items .34.

Fig. 5 Students’ competence structure in measurement based on an alternative classification (bar chart of mean p values on a scale from 0 to 1; Instrumental knowledge: total IK items, items about other attributes, time items; Measurement sense: total MS items, knowledge items, problem solving items)

5.4 Measurement competence and other student

characteristics

In Table 5, mean differences of the estimated students’

achievement (PVs) are shown for the total sample of stu-

dents as well as for subgroups specified by grade and

gender for each item subset, i.e., overall measurement

competence, MS, and IK.

As the subgroup means are arbitrary in the IRT frame-

work, we report mean differences relative to the standard

deviation of the respective item subsets. The reference

group was the fourth grade for each item subset.
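The standardization described above divides a raw difference in group means by the standard deviation of the reference group. A minimal sketch with hypothetical values on the logit scale (the group means and the standard deviation below are invented for illustration):

```python
def standardized_difference(mean_reference, mean_other, sd_reference):
    """Difference between reference group and other group in SD units.

    Positive values mean that the reference group (here: grade 4) scores higher.
    """
    return (mean_reference - mean_other) / sd_reference


# Hypothetical means for grade 4 and grade 3 and the grade 4 standard deviation
difference = standardized_difference(0.45, -0.30, 1.5)
```

Because only the difference between the group means enters the formula, the arbitrary origin of the IRT scale cancels out.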

The standardized mean difference in the overall mea-

surement competence between grades 3 and 4 was .64. This

indicates that third-grade students’ mean achievement was

.64 standard deviations lower than fourth-grade students’

mean achievement. This difference was significant with

F(1, 2,098) = 459.5, p < .001, and ηp² = .09. The standardized mean difference for MS between grades 3 and 4 was .71. The difference was significant with F(1, 26.5) = 348.2 and p < .001, resulting in a partial eta squared of ηp² = .11. The standardized mean difference for IK between grades 3 and 4 was .61. The difference was significant with F(1, 42.6) = 279.4, p < .001, and ηp² = .08.

With respect to gender, we found that in grade 3 as well

as in grade 4, boys significantly outperformed girls in

overall measurement competence. For grade 3, the mean

difference was .43 and for grade 4 it was .39. The gender difference was significant with F(1, 32.1) = 127.5, p < .001, and ηp² = .04.

For IK and MS, results similar to those for the overall measurement competence were found. For MS, we found a mean difference of .37 in grade 3 and of .36 in grade 4; the gender difference was significant with F(1, 74.3) = 119.7, p < .001, and ηp² = .03. The interaction between grade and gender was not significant with F(1, 205.8) = .17, p = .68, and ηp² = .00. With respect to IK, we found that in grade 3 the mean difference between boys and girls was .44, and in grade 4 the mean difference was .40. The gender difference was significant with F(1, 93.5) = 154.4, p < .001, and ηp² = .04. The interaction between grade and gender was not significant with F(1, 171.1) = .5, p = .47, and ηp² = .00.

Table 6 lists the regression coefficients of grade, gender,

and figural reasoning ability, and their two-way interac-

tions for the total sample. We found that the interaction

between grade and gender was not significant for the

overall measurement competence, as well as for the com-

petencies IK and MS. Of the three predictors, grade, gen-

der, and figural reasoning ability, the latter turned out to

have the largest effect on the overall measurement

achievement as well as on the achievement in IK and MS.

Table 5 Mean differences of estimated students’ achievement in measurement for grade and gender (positive values attributed to grade 4 and to boys, respectively)

| Competence | Grade | Gender | Total sample |
|---|---|---|---|
| Overall measurement competence | 3 | .43 | .64 |
| | 4 | .39 | |
| Measurement sense (MS) | 3 | .37 | .71 |
| | 4 | .36 | |
| Instrumental knowledge (IK) | 3 | .44 | .61 |
| | 4 | .40 | |

Mean differences were standardized. Reference for standardization is the standard deviation of boys and girls of fourth grade corresponding to overall measurement competence, MS, or IK, respectively

Table 6 Regression coefficients for overall measurement competence, IK, and MS

| Predictor | Overall measurement competence (69 items): B | SE | Instrumental knowledge (43 items): B | SE | Measurement sense (26 items): B | SE |
|---|---|---|---|---|---|---|
| Grade | .67*** | .05 | .78*** | .08 | .67*** | .06 |
| Gender | -.62*** | .05 | -.76*** | .07 | -.47*** | .04 |
| Figural reasoning ability | .36*** | .02 | .42*** | .02 | .29*** | .02 |
| Grade × gender | .01 | .08 | .02 | .11 | -.04 | .07 |
| Grade × figural reasoning ability | -.04 | .03 | -.07** | .03 | .00 | .03 |
| Gender × figural reasoning ability | .02 | .02 | .04 | .03 | .04 | .03 |

Overall measurement competence: N = 4,850; IK: N = 3,031; MS: N = 3,031

Overall measurement competence: R² = .35; IK: R² = .31; MS: R² = .37

B regression coefficient, SE standard error

* p < .05; ** p < .01; *** p < .001


The significant negative interaction of figural reasoning ability and grade for IK indicates that the regression effect of IK on figural reasoning ability is higher in grade 3 than in grade 4.

Table 7 gives an overview of the regression coefficients

of the three predictors, grade, gender, and figural reasoning

ability, and their two-way interactions for students with

high figural reasoning ability.

Similar to the results for the whole sample, a significant grade effect was found for IK and MS in the group of students with a high figural reasoning ability. However, this was not the case for the overall measurement competence. Furthermore, we found no gender effects for students with a high figural reasoning ability, neither in the overall measurement competence nor in MS or IK. A significant effect of the figural reasoning ability was found only for IK.

In Table 8, the regression coefficients of the three pre-

dictors and their two-way interactions for students with a

low figural reasoning ability are shown. In contrast to the results for the total sample, no significant grade effects were found for students with a low figural reasoning ability, neither for the overall measurement competence nor for IK or MS. Furthermore, for these students we found a gender effect only for the overall measurement competence and for IK.

Table 9 includes the results of the three mediation

analyses. In the first analysis, the dependent variable was

the students’ achievement in the overall measurement

domain. In the second analysis, it was the students’

achievement in IK. Finally, in the third analysis, the

dependent variable was students’ achievement in MS. The

independent variable for each analysis was gender, and the

mediation variable for each analysis was students’

achievement in figural reasoning ability. The values in

Table 9 indicate that the indirect effect of gender on

measurement is significant, but small. The question is now

Table 7 Regression coefficients for overall measurement competence, MS, and IK for students with high figural reasoning ability

| Predictor | Overall measurement competence (69 items): B | SE | Instrumental knowledge (43 items): B | SE | Measurement sense (26 items): B | SE |
|---|---|---|---|---|---|---|
| Grade | .83 | .43 | 1.34* | .59 | .82* | .34 |
| Gender | -.65 | .42 | -.77 | .57 | -.38 | .43 |
| Figural reasoning ability | .34 | .18 | .47* | .18 | .24 | .15 |
| Grade × gender | -.10 | .19 | .04 | .28 | -.16 | .17 |
| Grade × figural reasoning ability | -.08 | .15 | -.25 | .20 | -.01 | .12 |
| Gender × figural reasoning ability | .05 | .15 | .04 | .20 | .02 | .16 |

The group with a high figural reasoning ability consists of N = 736 students (grade 3: N = 327; 49% girls; grade 4: N = 409; 56% girls)

Overall measurement competence: R² = .13; IK: R² = .18; MS: R² = .18

B regression coefficient, SE standard error

* p < .05; ** p < .01; *** p < .001

Table 8 Regression coefficients for overall measurement competence, MS, and IK for students with low figural reasoning ability

| Predictor | Overall measurement competence (69 items): B | SE | Instrumental knowledge (43 items): B | SE | Measurement sense (26 items): B | SE |
|---|---|---|---|---|---|---|
| Grade | .20 | .39 | .40 | .54 | .46 | .36 |
| Gender | -.87* | .43 | -1.22* | .48 | -.59 | .40 |
| Figural reasoning ability | .35** | .10 | .40** | .13 | .23* | .10 |
| Grade × gender | -.06 | .21 | -.02 | .26 | -.09 | .22 |
| Grade × figural reasoning ability | -.21 | .15 | -.20 | .21 | -.10 | .12 |
| Gender × figural reasoning ability | -.07 | .14 | -.13 | .18 | -.01 | .16 |

The group with a low figural reasoning ability consists of N = 641 students (grade 3: N = 396; 57% girls; grade 4: N = 245; 57% girls)

Overall measurement competence: R² = .20; IK: R² = .19; MS: R² = .22

B regression coefficient, SE standard error

* p < .05; ** p < .01; *** p < .001


what the role of the figural reasoning ability is. If figural reasoning ability were a mediator, we would expect the absolute value of the direct effect to be smaller than that of the total effect. However, our results indicate the opposite. Therefore, we may conclude that our mediation analysis reveals a suppressor effect of the variable figural reasoning ability, although it is only a very weak effect. Summarizing these results, we found the following relations: (1) being a boy increases achievement in measurement; (2) being a girl increases achievement in figural reasoning ability; (3) having a high figural reasoning ability increases measurement achievement, independently of gender. On the whole, this means that the effect of gender on measurement achievement is underestimated if the fact that figural reasoning ability suppresses this effect

is not taken into account.
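The reasoning about suppression can be retraced from the decomposition total effect = direct effect + indirect effect. The sketch below uses the standardized coefficients reported in Table 9; the function name and the labels "mediation" and "suppression" are hypothetical conveniences for illustration.

```python
def decompose(direct, indirect):
    """Return the total effect and whether the third variable mediates or suppresses.

    With opposite signs of direct and indirect effect, the total effect is
    smaller in absolute value than the direct effect (suppression).
    """
    total = direct + indirect
    kind = "suppression" if direct * indirect < 0 else "mediation"
    return total, kind


# Direct and indirect effects of gender as reported in Table 9
total_overall, kind_overall = decompose(-0.16, 0.03)
total_ik, kind_ik = decompose(-0.18, 0.03)
total_ms, kind_ms = decompose(-0.14, 0.03)
```

In all three analyses the direct effect is larger in absolute value than the total effect, which is the pattern the text describes as a suppressor effect.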

6 Discussion

6.1 Summary of our findings

The analysis of a large set of data collected in 2007 within

the framework of the ESMaG project produced interesting

insight into the measurement competencies of German

students in grades 3 and 4 and how these competencies

were assessed.

An inspection of the KMK standards for measurement

showed that the distinction into the competencies Having

conceptions of measures and Dealing with measures in

context situations that is reflected in these standards was

supported from an empirical point of view. However, the

latent correlation between the two dimensions was relatively high compared to the correlation in an alternative classification including the measurement competencies

Instrumental knowledge (IK) and Measurement sense

(MS). Because of these findings, we used this alternative

classification for answering our research questions.

Although we have to note here again that our item pool

was limited, our results showed that, on average, the stu-

dents solved an equal number of items on both compe-

tencies correctly. However, we found in both competencies

remarkable differences between the average mean scores of

particular item categories. Within the IK items, those

related to time were more difficult than the IK items about

other attributes. Within the MS items, the problem-solving

items were more difficult than the knowledge items.

The analyses carried out to investigate the role of grade,

gender, and figural reasoning ability showed that all these

predictors had significant effects on the overall measure-

ment competence, as well as on the competencies IK and

MS. Figural reasoning ability was found to have the largest

effect. Our study also confirmed gender differences. Male

students outperformed female students both in the overall

measurement competence and in the IK and MS compe-

tencies. These results agree with those reported by Win-

kelmann et al. (2008) and Winkelmann and van den

Heuvel-Panhuizen (2009). However, related to figural

reasoning ability, our results showed that girls outper-

formed boys.

When testing the relationship between gender and

measurement mediated by figural reasoning ability, we

found a small but significant indirect effect of gender on

measurement achievement. Moreover, the results indicated

a suppressor effect of figural reasoning ability, i.e., gender

differences in measurement are underestimated when

ignoring this mediation.

However, all main effects became less substantial or

even not significant within the subgroup of students with a

high figural reasoning ability and the subgroup with a low

figural reasoning ability. Because in this latter group, no

gain in achievement was found between grades 3 and 4, it

seems that for these students 1 year of extra instruction did

not have an effect on their measurement achievement.

Another finding was that within the group of students

with a high figural reasoning ability, we did not observe an

Table 9 Results of mediation analysis

| Effect | Dependent variable: overall measurement competence: b | SE | Dependent variable: Instrumental knowledge: b | SE | Dependent variable: Measurement sense: b | SE |
|---|---|---|---|---|---|---|
| Total effect (b3 + b1·b2) | -.13*** | .01 | -.15*** | .02 | -.11*** | .02 |
| Indirect effect (b1·b2) | .03*** | .01 | .03*** | .01 | .03*** | .01 |
| Direct effect (b3) | -.16*** | .01 | -.18*** | .02 | -.14*** | .02 |

N (overall) = 4,850; N (IK) = 3,031; N (MS) = 3,031

All standard errors are estimated using bias-corrected bootstrap methods with 10,000 samples

b1 standardized regression coefficient of figural reasoning ability on gender, b2 standardized regression coefficient of the measurement ability on figural reasoning ability, b3 standardized regression coefficient of the measurement ability on gender

* p < .05; ** p < .01; *** p < .001


effect of gender on the measurement achievement, whereas

within the group with a low figural reasoning ability we

obtained a significant gender effect on the overall mea-

surement competence and on IK, with boys outperforming

girls.

6.2 Implications for instruction

The main message from our study is that the German pri-

mary school students’ achievement in the domain of

measurement shows on average a rather balanced competence structure consisting of IK as well as MS. Having

available both types of knowledge of measurement is a

good thing, but it is not always achieved in education.

Pesek and Kirshner (2000) found that in classroom practice

there is often no time to deal with relational learning

involving explaining, reasoning, reflecting, connecting, and

communicating. To prepare students for the standardized

tests, teachers often have a preference for teaching instru-

mental knowledge: that is, students must learn skills first

and foremost. Obviously, teachers in Germany do not only

pay attention to plain measurement skills, but also include

activities in their teaching that support the development of

understanding measurement.

When looking closer at the two competencies, it was

revealed that, referring to the IK competence, students

have more difficulties with tasks that deal with time mea-

sures, for example, converting time measures such as

“1 min 20 s = ___sec”, than tasks that deal with other

attributes. Hence, one implication from this analysis is that

mathematics teachers should focus especially on explana-

tion of how to solve time tasks. From the MS results, it

could be concluded that teachers should offer students

more opportunities to learn measurement-related problem

solving.

Other implications for instruction are connected to the

relation of measurement competencies with gender and

figural reasoning ability. Teachers should be aware of the

fact that girls, on the one hand, on average have higher

scores in figural reasoning ability, but, on the other hand,

have more difficulties with measurement tasks. A possible

reason could be that girls have fewer difficulties in working

with figures when they have to reason about figural patterns, whereas boys are better at solving practical measurement tasks related to figures. Therefore, we suggest that teachers seek ways to increase classroom activities that provide practice for both students’ measurement abilities and their figural reasoning abilities.

6.3 Implications for assessing educational output

The quality of the above conclusions mainly depends on

the quality of the assessment. Assessment, in turn, is

determined by the quality of the standards that indicate

what competencies are the goals of education and which

are used to develop the items for assessing the students’

measurement achievement. Our study has shown how

essential it is to have a clear and focused structure of the

mathematical domain when evaluating students’ achieve-

ment. Using the initial data of the ESMaG project about

measurement achievement, we found a rather high corre-

lation between the two measurement competencies. Due to

this correlation, we assumed an overlap between them.

Support for this assumption can be seen in the description

of the sub-competencies within the two KMK competen-

cies for measurement. These descriptions are rather

ambiguous. Some of them are described by referring to the

same content of measurement and therefore will not result

in two distinct sub-competencies on which education can

focus. This ambiguity makes it rather difficult not only to

get a clear picture of the educational output of measure-

ment instruction, but also to give adequate feedback to

teachers. We expect that a classification that focuses on

conceptual and procedural measurement knowledge as the

two main distinctive but, of course, also related compe-

tencies would be easier to handle for test designers and will

inform teachers better.

6.4 Limitations of the study and suggestions

for further research

While summarizing our results and stating implications for

teaching and assessing measurement as a mathematical

domain, we are quite aware of the limitations of our study.

An important point that should be kept in mind is that we

did a cross-sectional study and not a longitudinal one. This

means that in our study, progress over grades and, conse-

quently, influence of teaching were not established by

following the students. Therefore, prudence is required

when using the results of this study. To gain a better understanding of how measurement competence develops over time, research with a cohort study design is necessary.

A further weak point resulted from the rather ambiguous

description of the measurement competencies in the KMK

standards, which were taken as the basis for the item

development. Although we think that the conceptual–pro-

cedural distinction that we used to develop an alternative

classification gives better access to students’ achievement

in measurement and is more informative for educational

decision making, the analyses we did with this new clas-

sification were still based on items that were developed for

the KMK classification. In other words, the items used

were not fine-tuned to the alternative classification.

Therefore, we see our analyses as a first exploration of

using this new distinction. Further research should start

with a re-definition of the standards for the domain


measurement. Then, a new item development process

should be initiated based on these newly formulated stan-

dards, which make more explicit the different knowledge

types students should attain to acquire measurement com-

petencies. Any assessment should start with a clear view of

what should be assessed. We hope our study contributes to this clarity.

References

Bollen, K. A. (1989). Structural equations with latent variables. New

York: Wiley.

Bos, W., Bonsen, M., Baumert, J., Prenzel, M., Selter, C., & Walther,

G. (Eds.) (2008). TIMSS 2007. Mathematische und naturwis-senschaftliche Kompetenzen von Grundschulkindern in Deutsch-land im internationalen Vergleich. [TIMSS 2007. Mathematical

and scientific competencies of primary school students in

Germany in an international context]. Munster: Waxmann.

Bos, W., Lankes, E.-M., Prenzel, M., Schwippert, K., Valtin, R., &

Walther, G. (2003). Erste Ergebnisse aus IGLU. Schulerleistun-gen am Ende der vierten Jahrgangsstufe im internationalenVergleich. [Fourth-Grade Students in an International Context:

First results from an International Reading Literacy Study

(PIRLS)]. Munster: Waxmann.

Fennema, E. (1979). Women and girls in mathematics—equity in

mathematics education. Educational Studies in Mathematics, 10,

389–401.

Granzer, D. (2009). Von Bildungsstandards zu ihrer Überprüfung:

Grundlagen der Item- und Testentwicklung. [From standards to

evaluation: Examination of educational standards: Background

of item and test development.] In D. Granzer, O. Köller, A.

Bremerich-Vos, M. van den Heuvel-Panhuizen, K. Reiss, & G.

Walther (Eds.), Bildungsstandards Deutsch und Mathematik (pp.

21–30). Weinheim/Basel: Beltz Verlag.

Heller, K. A., & Perleth, Ch. (2000). Kognitiver Fähigkeitstest für 4.-12. Klassen, Revision (KFT 4-12+ R). [Cognitive ability test for

grade 4-12, revision (KFT 4-12+ R).] Göttingen: Hogrefe.

Hiebert, J. (1986). Conceptual and procedural knowledge: The case of mathematics. Hillsdale: Lawrence Erlbaum Associates.

Kaiser, G., & Steisel, T. (2000). Results of an analysis of the TIMS

study from a gender perspective. Zentralblatt für Didaktik der Mathematik, 32(1), 18–24.

Kaufmann, G. (1990). Imagery effects on problem solving. In P.

J. Hampson, D. E. Marks, & J. T. E. Richardson (Eds.), Imagery: Current developments (pp. 169–197). New York: Routledge.

KMK (Konferenz der Kultusminister der Länder in der Bundesrepublik

Deutschland). (2005). Bildungsstandards im Fach Mathematik für den Primarbereich (Jahrgangsstufe 4) [Educational

standards in mathematics for primary school (fourth grade).]

München: Luchterhand/Wolters Kluwer Deutschland.

Lankes, E.-M., Bos, W., Mohr, I., Plaßmeier, N., Schwippert, K.,

Sibberns, H., & Voss, A. (2003). Anlage und Durchführung der

Internationalen Grundschul-Lese-Untersuchung (IGLU) und

ihrer Erweiterung um Mathematik und Naturwissenschaften

(IGLU-E). [Design and administering of the international

primary school reading research (PIRLS) and its expansion for

mathematics and science.] In W. Bos, E.-M. Lankes, M. Prenzel,

K. Schwippert, R. Valtin, & G. Walther (Eds.), Erste Ergebnisse aus IGLU. Schülerleistungen am Ende der vierten Jahrgangsstufe im internationalen Vergleich (pp. 7–28). Münster:

Waxmann.

Lehrer, R. (2003). Developing understanding of measurement. In J.

Kilpatrick, W. G. Martin, & D. Schifter (Eds.), A research companion to principles and standards for school mathematics.

Reston: National Council of Teachers of Mathematics.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.

Lobemeier, K. (2005). Welche Leistungen erbringen Viertklässler bei Aufgaben zum Thema Größen? Untersuchungen zur mathematisch-naturwissenschaftlichen Kompetenz im Grundschulalter im Rahmen von IGLU. [How do fourth graders perform in tasks

dealing with attributes? Research of mathematical scientific

competence in primary school in the framework of IGLU.] Kiel,

Germany: Christian-Albrechts-Universität.

MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. Mahwah: Erlbaum.

MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004).

Confidence limits for the indirect effect: Distribution of the

product and resampling methods. Multivariate Behavioral Research, 39, 99–128.

Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992).

Estimating population characteristics from sparse matrix samples of

item responses. Journal of Educational Measurement, 29, 133–161.

Mullis, I. V. S., Martin, M. O., Beaton, A. E., Gonzalez, E. J., Kelly,

D. L., & Smith, T. A. (1997). Mathematics achievement in the primary school years. IEA's third international mathematics and science study. Chestnut Hill: Boston College.

Mullis, I. V. S., Martin, M. O., & Foy, P. (2008). TIMSS 2007 International Mathematics Report. Chestnut Hill: TIMSS &

PIRLS International Study Center, Lynch School of Education,

Boston College.

Muthén, L. K., & Muthén, B. (1998–2007). Mplus user’s guide. Version 5. Los Angeles: Muthén & Muthén.

NCTM (National Council of Teachers of Mathematics) (1989).

Curriculum and evaluation standards for school mathematics.

Reston, VA: NCTM.

NCTM (National Council of Teachers of Mathematics) (2000).

Principles and standards for school mathematics. Reston, VA:

NCTM.

OECD (2003). The PISA 2003 assessment framework—Mathematics,

reading, science and problem solving, knowledge and skills.

Paris: OECD.

Pesek, D. D., & Kirshner, D. (2000). Interference of instrumental

instruction in subsequent relational learning. Journal for Research in Mathematics Education, 31(5), 524–540.

Ratzka, N. (2003). Mathematische Fähigkeiten und Fertigkeiten am Ende der Grundschulzeit. Empirische Analysen im Anschluss an TIMSS [Mathematical achievement at the end of primary school.

Empirical analyses based on TIMSS]. Hildesheim: Franzbecker.

Resnick, L. B., & Ford, W. W. (1981). The psychology of mathematics for instruction. Hillsdale: Erlbaum.

Rittle-Johnson, B., & Alibali, M. W. (1999). Conceptual and

procedural knowledge of mathematics: Does one lead to the

other? Journal of Educational Psychology, 91, 175–189.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys.

New York: Wiley.

Sherman, J. A. (1980). Predicting Mathematics grades of high school

girls and boys: A further study. Contemporary Educational Psychology, 5, 249–255.

Skemp, R. R. (1976). Relational understanding and instrumental

understanding. Mathematics Teaching, 77, 20–26.

Van den Heuvel-Panhuizen, M., & Buys, K. (2008). Young children learn measurement and geometry. A learning–teaching trajectory with intermediate attainment targets for the lower grades in primary school. Rotterdam/Taipei: Sense Publishers.

Vasilyeva, M., Casey, B. M., Dearing, E., & Ganley, C. M. (2009).

Measurement skills in low-income elementary school students:


Exploring the nature of gender differences. Cognition and Instruction, 27(4), 401–428.

von Davier, M. (2009). Some notes on the reinvention of latent

structure models as diagnostic classification models. Measurement: Interdisciplinary Research and Perspectives, 7(1), 67–74.

Winkelmann, H., & van den Heuvel-Panhuizen, M. (2009).

Geschlechtsspezifische mathematische Kompetenzen. In D. Granzer,

O. Köller, A. Bremerich-Vos, M. van den Heuvel-Panhuizen, K.

Reiss, & G. Walther (Eds.), Bildungsstandards Deutsch und Mathematik (pp. 142–156). Weinheim: Beltz Verlag.

Winkelmann, H., van den Heuvel-Panhuizen, M., & Robitzsch, A.

(2008). Gender differences in the mathematics achievements of

German primary school students: Results from a German large-scale

study. ZDM: The International Journal on Mathematics Education, 40, 601–616.
