Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=nses20

School Effectiveness and School Improvement: An International Journal of Research, Policy and Practice
ISSN: 0924-3453 (Print) 1744-5124 (Online). Journal homepage: http://www.tandfonline.com/loi/nses20

Differentiated instruction in a data-based decision-making context
Janke M. Faber, Cees A. W. Glas & Adrie J. Visscher

To cite this article: Janke M. Faber, Cees A. W. Glas & Adrie J. Visscher (2018) Differentiated instruction in a data-based decision-making context, School Effectiveness and School Improvement, 29:1, 43-63, DOI: 10.1080/09243453.2017.1366342
To link to this article: https://doi.org/10.1080/09243453.2017.1366342

© 2017 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. Published online: 17 Aug 2017.





ARTICLE

Differentiated instruction in a data-based decision-making context

Janke M. Faber (a), Cees A. W. Glas (b) and Adrie J. Visscher (a)

(a) ELAN, Institute for Teacher Professionalization and School Development, Faculty of Behavioural, Management and Social Sciences, University of Twente, Enschede, The Netherlands; (b) Department of Research Methodology, Measurement and Data Analysis, Faculty of Behavioural Science, Management and Social Sciences, University of Twente, Enschede, The Netherlands

ABSTRACT
In this study, the relationship between differentiated instruction, as an element of data-based decision making, and student achievement was examined. Classroom observations (n = 144) were used to measure teachers' differentiated instruction practices and to predict the mathematical achievement of 2nd- and 5th-grade students (n = 953). The analysis of classroom observation data was based on a combination of generalizability theory and item response theory, and student achievement effects were determined by means of multilevel analysis. No significant positive effects were found for differentiated instruction practices. Furthermore, findings showed that students in low-ability groups profited less from differentiated instruction than students in average- or high-ability groups. Nevertheless, the findings, data collection, and data-analysis procedures of this study contribute to the study of classroom observation and the measurement of differentiated instruction.

ARTICLE HISTORY
Received 12 March 2016
Accepted 8 August 2017

KEYWORDS
Differentiated instruction; classroom observations; item response theory; multilevel regression analysis

Introduction

Governments expect that data-based or data-driven practices in education will improve student achievement (Mandinach, 2012). Systematic evaluation procedures are encouraged by governments, and the importance of using objective and empirical data for school improvement is emphasized (Kaufman, Graham, Picciano, Popham, & Wiley, 2014; Reis, McCoach, Little, Muller, & Kaniskan, 2011). The idea is that schools systematically collect and organize data, for example, student achievement data, classroom observation data, or parent survey data, in order to represent aspects of school functioning (Schildkamp & Lai, 2013). In this article, we use the definition of Ikemoto and Marsh (2007) for data-based decision making (DBDM): "Teachers, principals, and administrators systematically collecting and analyzing data to guide a range of decisions to help improve the success of students and schools" (p. 108).

Educational researchers study how schools implement DBDM procedures. So far, mostly qualitative research findings have indicated which conditions might foster the

CONTACT Janke M. Faber [email protected]

SCHOOL EFFECTIVENESS AND SCHOOL IMPROVEMENT, 2018, VOL. 29, NO. 1, 43–63
https://doi.org/10.1080/09243453.2017.1366342

© 2017 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.


implementation of DBDM (Blanc et al., 2010; Levin & Datnow, 2012; Schildkamp & Lai, 2013; Schildkamp, Poortman, & Handelzalts, 2016; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010; Wayman, Shaw, & Cho, 2011; Wayman, Stringfield, & Yakimowski, 2004), and which conditions hinder DBDM procedures in schools (Ehren & Swanborn, 2012; Schildkamp & Lai, 2013). Research results pertaining to the actual effect of DBDM on student achievement vary (Marsh, 2012). Some researchers found no effect at all (Cordray, Pion, Brandt, Molefe, & Toby, 2012), others only a significant effect for specific groups of students (May & Robinson, 2007), and again others an overall significant improvement of student achievement (Carlson, Borman, & Robinson, 2011; Konstantopoulos, Miller, & van der Ploeg, 2013; Van Geel, Keuning, Visscher, & Fox, 2016; Van Kuijk, Deunk, Bosker, & Ritzema, 2016). Although each of these studies focused on DBDM practices, the interventions varied (Marsh, 2012). In the study by Van Kuijk et al. (2016), teachers, in addition to learning about DBDM, also learned new instructional skills and knowledge for teaching reading comprehension, making it difficult to interpret the contribution of DBDM to the effect found in the study. The studies of Cordray et al. (2012) and May and Robinson (2007) both focused on the use of assessment data for DBDM. In the first study, teachers used adaptive standardized tests administered three to four times a year, while the data fed back to teachers in the second study were based on a single state test. Furthermore, in some studies the whole school team, including school principals and academic coaches, was involved (Van Geel et al., 2016), while in others only one teacher per school participated (Van der Scheer, 2016). In sum, we still know little about how DBDM can best be used to improve student achievement. Interventions and their effects vary, and, as DBDM quite often is a component of an intervention package, it is unclear precisely which intervention component caused the observed student-achievement effects.

DBDM seems a promising approach for school development and instruction improvement (Carlson et al., 2011; Konstantopoulos et al., 2013; Mandinach, 2012; Van Kuijk et al., 2016); however, knowledge on how DBDM affects achievement is lacking. The aim of this study was to contribute to filling that knowledge gap. Differentiated, targeted, or adapted instruction is a frequently mentioned aspect of classroom DBDM (Black & Wiliam, 1998; Hamilton et al., 2009), as instructional strategies are supposed to be grounded in student data, and adapted in line with differences in learning needs between students. We therefore examined the relationship between differentiated instruction and student achievement in a DBDM context. The Dutch Ministry of Education encourages teachers, principals, and administrators to develop their abilities to analyze, interpret, and use student data to guide improvement (Ministerie van Onderwijs, Cultuur en Wetenschap, 2007). Furthermore, the Ministry facilitated several DBDM projects, among others, the Focus intervention (Staman, Visscher, & Luyten, 2014; Van Geel et al., 2016). In the Focus intervention, whole school teams were trained in DBDM. All teachers selected for this study participated in the Focus intervention. Before we explain the Focus intervention, we first characterize DBDM and differentiated instruction.

Theoretical framework

DBDM has become an important area of interest within the field of educational research (Hamilton et al., 2009; Mandinach, 2012; Schildkamp, Ehren, & Lai, 2012). Although the



concept of using data to guide teacher and school improvement is not new (Mandinach, 2012), under DBDM data are supposed to be collected and used by schools in a more systematic and cyclical way (Van Geel et al., 2016). Data have to be identified, collected, analyzed, and interpreted before actions can be taken (Ikemoto & Marsh, 2007; Schildkamp & Kuiper, 2010; Van der Kleij, Vermeulen, Schildkamp, & Eggen, 2015). Figure 1 presents the DBDM cycle used in the Focus intervention. This DBDM cycle starts with the analysis and interpretation of student-achievement data, through which teachers collect information on students' progress. This information gives teachers an indication of the extent to which their instruction matches students' needs and how effective their teaching was (Hattie & Timperley, 2007). Based on this, teachers can set realistic performance goals for students. When goals are clearly formulated and challenging, teachers can collect feedback that is more targeted at goal accomplishment and can better examine the results of a new instruction strategy (Locke & Latham, 2002). In the third step, determining the instructional strategy, teachers are supposed to choose teaching strategies matching students' needs, based on the first and second steps. Teachers implement their planned teaching strategies in the classroom in the fourth step, after which the cycle starts again with the analysis of the effects of the strategies implemented.
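The four-step cycle described above can be sketched in code. This is a toy illustration only: the goal rule (class median + 5 points), the instruction plan, and the point gains are all invented for demonstration and are not part of the Focus intervention materials.

```python
from statistics import median

def dbdm_cycle(scores: dict, rounds: int = 2) -> dict:
    """Toy walk-through of the four DBDM steps; every numeric rule here
    is invented purely for illustration."""
    for _ in range(rounds):
        # Step 1: analyse and interpret the achievement data.
        class_median = median(scores.values())
        # Step 2: set a realistic, challenging performance goal.
        goal = class_median + 5
        # Step 3: determine an instructional strategy matching needs:
        # extended instruction for students still below the goal.
        plan = {s: "extended" if v < goal else "regular"
                for s, v in scores.items()}
        # Step 4: execute the strategy, then re-assess; the cycle restarts
        # with the analysis of the new scores (gains are made up).
        scores = {s: v + (4 if plan[s] == "extended" else 2)
                  for s, v in scores.items()}
    return scores

after = dbdm_cycle({"Anna": 40, "Bram": 50, "Carla": 60})
```

The point of the sketch is the structure, not the numbers: each pass through the loop re-analyses the most recent assessment data before goals and strategies are revised.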

In DBDM, the four components of the cycle in Figure 1 are related and connected. The focus in this study, however, is on the last component: executing an instructional strategy. The impact of a planned instructional strategy will probably be more positive if the first three components have been carried out successfully. As the strategies are based on data concerning the performance of individual students, instructional strategies are ideally more individualized, since learning needs of students differ. When DBDM leads to an instructional strategy which adapts to students' learning needs, and when this strategy is executed by teachers, student achievement might improve. However, teachers require differentiation skills to execute such an instructional strategy (Mandinach & Gummer, 2013).

Differentiated instruction

A well-known definition of differentiated instruction (DI) was provided by Tomlinson et al. (2003): "Teachers proactively modify curricula, teaching methods, resources,

Figure 1. The components and levels of data-based decision making (source: Keuning & Van Geel, 2012). Note: In this study, we focused on the classroom level.



learning activities, and student products to address the diverse needs of individual students and small groups of students to maximize the learning opportunity for each student in a classroom" (p. 121). Tomlinson et al. also state that teachers should proactively modify teaching to address a broad range of learners' readiness levels, interests, and modes of learning. This definition is rather broad, and Roy, Guay, and Valois (2013) argue that DI is a varied and adapted teaching approach to match students' abilities, or students' readiness levels. Tomlinson et al. and Roy et al. conceptualized DI differently. According to Tomlinson et al., DI is, first, proactive: teachers plan a lesson beforehand to address learner variance. Second, teachers work with flexible and small teaching-learning groups. Since students differ in their readiness, interests, and modes of learning, it is important to group them in a variety of ways. Third, teachers provide variation in learning materials, instruction time, and pace in their classrooms. A final important DI characteristic, according to Tomlinson et al., is the learner-centered classroom. In a learner-centered classroom, teachers focus on the needs of all students and, as a result, use a wide variety of instruction strategies and practices. Roy et al. used two distinct DI components in their conceptualization of DI: instructional adaptation and academic progress monitoring. The first component, instructional adaptation, overlaps to a large extent with Tomlinson et al.'s third DI characteristic (teachers provide for variation in learning materials, instruction time, and pace in their classrooms). In addition to this, Roy et al. argue that modifying the goals and expectations for students with difficulties is another important aspect of instructional adaptation. An important difference between the two conceptualizations is that Roy et al. place greater emphasis on the need for academic progress monitoring for instructional adaptations, while Tomlinson et al. only mention that DI should be proactive. According to Roy et al., academic progress monitoring can be measured by the degree to which teachers evaluate the effects of their teaching adaptations, the degree to which teachers analyze data about student progress, their use of student data for instructional decisions, and, finally, whether teachers frequently assess low-performing students' rates of improvement.

Two important DI characteristics emerge from these conceptualizations. First, DI is planned, and instructional decisions should be based on the analysis of student data. Second, what makes DI observable in the classroom is the variation in learning goals, instruction content, instruction time, assignments, and learning materials aimed at addressing varying learning needs. In the present study, we tested whether these DI characteristics explain student achievement.

Within-class grouping

In many cases, providing DI for each individual student is unrealistic, as in Dutch classrooms, for instance, which accommodate 23 students on average (Inspectie van het Onderwijs [Inspectorate of Education], 2015). Small instruction groups are therefore often used to organize DI and to vary learning goals, instruction content, instruction time, assignments, and learning materials within relatively large classrooms. Meta-analysis findings reveal that small-group instruction can have a positive effect on student achievement (Lou et al., 1996). The effects are, however, influenced by how the groups are composed. In homogeneous groups, students of the same ability level are grouped together (ability



grouping), whereas heterogeneous groups include students from different ability levels. Low-ability students seem to learn more in heterogeneous groups, average-ability students learn more in homogeneous ability groups, and for high-ability students the grouping composition does not make much of a difference (Lou et al., 1996; Saleh, Lazonder, & De Jong, 2005). School characteristics also influence the effects. Ability grouping has a positive effect in schools with a homogeneous student population, or a high-socioeconomic status (SES) student population. In schools with a low-SES student population or a heterogeneous population, ability-grouping effects prove to be negative for low-ability students (Nomi, 2009). These findings seem remarkable for two reasons. First, assuming that variation in instruction is necessary since learning needs differ (Smit & Humpert, 2012; Tomlinson et al., 2003), one would expect that ability grouping is required more in classrooms in which learning needs differ than in classrooms in which learning needs are more similar (as in the case of a homogeneous student population). Second, assuming that regular classroom instruction is mostly tailored to the average ability level, one would expect that especially low-ability and high-ability students profit from ability grouping (instead of average-ability students).

There are several explanations for these findings. The first may be that students learn by giving and receiving explanations. Following this assumption, low-ability students learn in heterogeneous groups by receiving explanations from peers. Average-ability students act to a greater extent as explanation receivers and providers in homogeneous groups, and high-ability students learn in heterogeneous groups by being a tutor (Lou et al., 1996; Saleh et al., 2005). These explanations relate to assumptions about the effectiveness of student collaboration. Other explanations refer more specifically to DI, and to the negative effects of DI on low-ability students. Teachers might lower their expectations for these students (Campbell, 2014; Wiliam & Bartholomew, 2004), and more time may be spent on behavior management than on instruction (Wiliam & Bartholomew, 2004). The time spent by a teacher on a specific group requires self-regulation skills from those students who are not placed in that group, and especially low-ability students might find this difficult (Hong, Corter, Hong, & Pelletier, 2012). Teachers might be better equipped with curriculum materials and pedagogical skills tailored to students in the middle of the ability range, which might explain the positive effects on average-ability students (Hong et al., 2012).

The effectiveness of ability grouping will strongly depend on how teachers implement DI and how they organize ability grouping. It is important that teachers base the composition of their instruction groups on various data resources, and not just on achievement data (Houtveen, Booij, De Jong, & Van de Grift, 1999; Lou et al., 1996). Other characteristics of effective grouping practices are that the groups are based on the skill being taught, that the grouping composition is flexible (Deunk, Doolaard, Smale-Jacobse, & Bosker, 2015; Slavin, 1987), and that learning materials (Lou et al., 1996) and learning time (Slavin, 1987) vary between ability groups. Although ability grouping has been studied frequently (Deunk et al., 2015; Lou et al., 1996; Slavin, 1987), more research is needed on the effect of within-class homogeneous ability grouping on student achievement.

Hypotheses

Based on the literature, we tested the following hypotheses:



Hypothesis 1: Student outcomes are higher in the classrooms of teachers who differentiate their instruction more (observable differentiation).

Hypothesis 2: Student outcomes are higher in the classrooms of teachers who pre-plan DI more (planned differentiation).

Hypothesis 3: Students from different ability groups do not benefit to the same degreefrom a teacher who differentiates his/her instruction.

Method

In this section, we describe how a pretest-posttest observational study was used to testour hypotheses. Furthermore, we describe the nature of our sample, instruments, data-collection procedures, and the data-analysis techniques used. First, the features of theFocus intervention and the context in which the research was carried out are described.Descriptive statistics are presented in Table 1.

The Focus intervention

In the intervention, entire primary school teams were trained in analyzing student data, formulating student performance goals, formulating instructional strategies that match students' needs, and providing targeted instruction (Staman et al., 2014; Van Geel et al., 2016). More specifically, teachers learned how to analyze Cito assessment results (Cito is the Dutch institute for test development; see the section Standardized assessments for a description of these assessments) by using a student monitoring system. Teachers learned how to design an instructional plan (which included teachers' differentiated instruction decisions) twice a year based on those assessment results. Furthermore, teachers learned how to assign students more systematically, and in a

Table 1. Descriptive statistics for students and teachers: means, standard deviations, and total number of respondents.

                                   Grade 2                        Grade 5
                                   N (missing)    Mean (SD)       N (missing)    Mean (SD)
Teachers                           25                             26
Multigrade group                   32%                            35%
Students                           466                            487
Student weight low                 18.5% (1.9%)                   18.5% (3.3%)
Ability group
  - Low                            107                            100
  - Average                        185                            179
  - High                           168                            146
  - Missing                        6                              62
Cito May/June 2013                 443 (4.9%)     46.24 (15.60)   441 (9.4%)     91.97 (11.78)
Cito May/June 2014                 454 (2.6%)     64.80 (14.38)   479 (1.6%)     106.68 (13.40)
Differentiated instruction (a)     216 (3)        2.54 (0.70)     212 (1)        2.67 (0.63)
Planned DI (b)                     46 (4)         17.70 (5.92)    43 (9)         16.18 (6.66)

(a) The numbers reflect the number of completed ICALT observation forms. (b) The maximum score for the instructional plan checklist was 43.



data-based way, to ability groups by using Cito assessments. Students are categorized into one of five performance categories (A: highest; E: lowest) based on the results of the Cito standardized test. However, the assignment of students to class ability groups depends on how the students of that specific class performed on the test. Relative to the other students in the same class, the best performing students in a class are assigned to the high-ability group by their teacher, the lowest performing students of that class are assigned to the low-ability group, and all other students are assigned to the average-ability group. Ability groups are thus composed based on students' test results and the choices teachers make based on these results. The use of Cito standardized assessment data was the main focus of the intervention; however, teachers were also stimulated to use other student data (e.g., their classroom observations of students) for making instruction decisions, and not to concentrate only on the results of standardized assessments.
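The relative, within-class assignment described above can be illustrated as follows. The 25% shares for the low and high groups are an assumption made for this sketch; in the intervention the split also reflects the teacher's judgement, not only the Cito results.

```python
def assign_ability_groups(scores, low_share=0.25, high_share=0.25):
    """Assign each student to 'low', 'average', or 'high' relative to the
    other students in the same class. The share parameters are illustrative
    assumptions, not part of the intervention."""
    ranked = sorted(scores, key=scores.get)      # weakest student first
    n = len(ranked)
    n_low = max(1, round(n * low_share))
    n_high = max(1, round(n * high_share))
    groups = {}
    for i, student in enumerate(ranked):
        if i < n_low:
            groups[student] = "low"
        elif i >= n - n_high:
            groups[student] = "high"
        else:
            groups[student] = "average"
    return groups

# Eight students with invented test scores 0..7.
example = assign_ability_groups({f"s{i}": float(i) for i in range(8)})
```

Because the cut-offs are relative to classmates, the same raw score can place a student in different ability groups in different classes, which is exactly the class-dependent assignment the text describes.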

The instructional plan included teachers' differentiated instructional strategies. Instructional plans had to include three within-class homogeneous ability groups, an instructional strategy and learning goals for each of these three groups, as well as an additional instructional approach for one or a few individual students with specific learning needs. In the intervention, teachers divided students among three homogeneous ability groups: in the "average" group, students receive regular instruction; in the "low" group, students require some additional instruction; and in the "highest" group, students need only a brief part of the regular instruction (Van der Scheer, 2016). An instructional plan format was used to ensure all elements of the instructional plan were covered by teachers.

Trainers of the Focus intervention delivered between five and seven school team meetings, and attended two additional meetings with the school principal, in one school year (the duration of the whole intervention was 2 years). During the first part of the intervention, the skills and knowledge required for executing the DBDM cycle (Figure 1) were trained. Teachers also received feedback on their teaching based on classroom observations. The meetings with school principals were meant to support principals in motivating teachers for DBDM. For a more detailed description of the Focus intervention, see Keuning, Van Geel, Visscher, Fox, and Moolenaar (2016), Staman et al. (2014), and Van Geel et al. (2016).

Sample

A subset of the schools involved in the intervention was selected for this study. We contacted schools by email and also visited several schools. School principals and teachers were informed about the purpose and design of the study, and the planned classroom observations. For the school selection procedure of this study, trainers of the intervention had rated each of their schools as a "weak", "average", or "strong" DBDM school. Our goal was to include 10 schools from each of these three categories to ensure variation in DBDM practices between schools; however, not all invited schools agreed to participate. Twenty-six schools agreed to participate: 7 "weak", 9 "average", and 10 "strong" DBDM schools. Most of these schools had already finished the whole intervention, and 6 schools were in the last half year of the intervention. Only second-grade teachers (7/8-year-old students) and fifth-grade teachers (10/11-year-old students)



participated. Second and fifth grades were selected to ensure that teachers and students from both the lower and the upper grades participated. During the course of the study, one teacher refused to be videotaped. As a result, the final sample included 26 primary schools, 51 teachers, and 953 students (Table 1).

Nineteen percent of the students had a student weight (in The Netherlands, primary schools receive extra funding for these students; a child belongs to this category if neither parent attained a higher qualification than lower vocational education), which, compared with a national average of 11%, is rather high (Centraal Bureau voor de Statistiek [CBS], 2014). Thirty-two percent of the second-grade classes and 35% of the fifth-grade classes were multigrade classes, in which two different age groups are combined. Eighty-four percent of the teachers in second grade and 65% in fifth grade did not teach their class alone, but had a teacher-colleague teaching the same class during part of the week. Of each classroom, one teacher participated in the study (typically the teacher who taught mathematics most in that classroom). Since more than two thirds of the primary education teachers in The Netherlands have a part-time teaching job, these numbers are not extreme (Inspectie van het Onderwijs, 2012).

Instruments and procedures

ICALT
We used the ICALT (International Comparative Analysis of Learning and Teaching) classroom observation instrument to measure teachers' observable DI (Hypothesis 1). The ICALT is based on research literature on teaching effectiveness and was validated in international comparative studies. Findings indicated reliable and valid measurements of the six aspects of teaching included in the ICALT (Van de Grift, 2007, 2014). Each aspect of teaching included items rated on a 4-point Likert scale ranging from "mainly weak" to "mainly strong". The following items were used:

● The teacher evaluates whether the lesson objectives have been achieved at the end of the lesson.
● The teacher offers extra learning and instruction time to struggling learners.
● The teacher adapts his/her instructional activities to relevant differences between students.
● The teacher adapts the assignments to relevant differences between students.

Since the first item lowered the scale's internal consistency, this item was excluded from the further analyses. Cronbach's alpha for the remaining items was α = 0.73.
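The internal-consistency check can be reproduced from a lessons × items score matrix as follows; the 4-point ratings below are invented for illustration, not the study's data.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (observations x items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings: rows = observed lessons, columns = the three
# retained DI items, each on the 4-point ICALT scale.
ratings = np.array([
    [3, 3, 4],
    [2, 2, 2],
    [4, 3, 3],
    [1, 2, 1],
    [3, 4, 4],
])
alpha = cronbach_alpha(ratings)
```

With perfectly correlated items the formula returns 1; with uncorrelated items it approaches 0, which is why dropping the weakly related first item raised the coefficient.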

Generalizability studies have shown that reliable teacher ratings, on the basis of lesson observations, require two or more observed lessons per teacher in combination with two or more raters (Hill, Charalambous, & Kraft, 2012). Therefore, three mathematics lessons of each teacher were taped and afterwards rated by three trained raters. Recordings were spread over a period ranging from November 2013 to May 2014. Teachers were asked to give an entire regular mathematics lesson lasting between 45 and 60 min. We used the IRIS Connect toolkit for recording the lessons: a system with two mobile devices that simultaneously recorded the teacher and the students. Recordings were uploaded to a secured, online environment. Principals of participating schools decided if and how parents were



asked for permission, and children of parents who did not give permission were not recorded. Raters followed a 3-day training course, during which they rated six observations independently of each other. Afterwards, rater variation was discussed and rating guidelines were developed to maximize consensus between raters. Raters agreed to watch the taped lessons in random order, to prevent order-based bias. Due to changes in teacher teams (i.e., maternity leave, illness) or planning problems, not all participating teachers were recorded three times. A total of 144 lessons were recorded (nine teachers with one lesson missing) and scored by each rater. For these nine teachers, DI scores were computed with data from two lesson observations instead of three.

Instructional plan checklist
Teachers are supposed to plan DI in advance in order to address learner variance (Tomlinson et al., 2003). We collected teachers' instructional plans to measure their planned DI (Hypothesis 2). Teachers in the intervention learned to develop two instructional plans based on the standardized assessments in January/February and May/June (see section Standardized assessments). For each classroom, all mathematics instructional plans covering the same period as the observations were collected. We collected 46 second-grade plans (4 missing) and 43 fifth-grade plans (9 missing). To measure planned DI, a checklist was developed to evaluate the instructional plans. Prototypes of the checklist were tested by Focus trainers to make sure all elements of the instructional plan as learned during the intervention were included. Since teachers had learned to use a format, their instructional plans were very similar. The checklist consisted of 43 items to measure the degree to which teachers vary the following three topics between ability groups:

(1) instruction (e.g., learning materials, learning pace);
(2) learning goals (e.g., the specification of a percentage of minimally required correct answers on the mathematics test);
(3) evaluation (e.g., the specification of follow-up actions in case learning goals were not yet accomplished).

Furthermore, the degree to which teachers specified the instruction for students with specific learning needs was measured with this checklist (see Appendix 1). Two raters evaluated all instructional plans and scored each item in the checklist as "present" or "not present" (intraclass correlation coefficient [ICC] = 0.63, two-way mixed model, absolute agreement, IBM SPSS Statistics Version 22). Aggregated total checklist scores were included in the analyses (i.e., the average of four checklist scores: two instructional plans, each scored by two raters).

Standardized assessments

Students' results on the Cito standardized mathematics tests were used for the dependent variable, student achievement. Most Dutch schools use these tests throughout the whole primary school period (all grades). The results from different grades can be placed on one and the same ability scale. The Cito mathematics tests measure three different domains: (a) arithmetic; (b) proportions, fractions, and percentages; and (c) geometry, time, and money calculations. Students take the tests twice a year: in January or

SCHOOL EFFECTIVENESS AND SCHOOL IMPROVEMENT 51


February, and in May or June. For the dependent variable, the standardized test results of May or June 2014 were used, whereas the standardized results of the May or June 2013 test were used as a covariate (pretest). The percentage of missing data was 7.3% for the pretest and 2.1% for the posttest. Within-group regression, using the group mean and the other test score as predictors, was used as an imputation method: a regression model was estimated on all complete cases, using the group means and all available scores, and the missing values were then estimated from this model and imputed.
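As an illustration, the within-group regression imputation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code; the column names (`pre`, `post`, `group`) and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def impute_within_group(df, target, other, group):
    """Impute missing `target` scores from a regression on the group mean
    and the other test score, fitted on complete cases only (a sketch of
    the procedure described above; column names are illustrative)."""
    df = df.copy()
    # Group mean of the target, computed over the non-missing cases.
    df["_gmean"] = df.groupby(group)[target].transform("mean")
    complete = df.dropna(subset=[target, other])
    # Fit the regression on complete cases: intercept, group mean, other score.
    X = np.column_stack([np.ones(len(complete)), complete["_gmean"], complete[other]])
    beta, *_ = np.linalg.lstsq(X, complete[target].to_numpy(), rcond=None)
    # Predict and impute where the target is missing but the other score is known.
    miss = df[target].isna() & df[other].notna()
    Xm = np.column_stack([np.ones(miss.sum()), df.loc[miss, "_gmean"], df.loc[miss, other]])
    df.loc[miss, target] = Xm @ beta
    return df.drop(columns="_gmean")

# Toy example: one missing posttest score in group "b".
scores = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "pre":  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "post": [1.5, 2.5, 3.5, 4.5, np.nan, 6.5],
})
imputed = impute_within_group(scores, "post", "pre", "group")
```

In the toy data the complete cases satisfy post = pre + 0.5 exactly, so the missing value is imputed as 5.5.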

Analysis

Since students are nested within teachers' classes, a multilevel model was used to test the hypothesis regarding the relation between students' achievement scores and teachers' differentiated instruction skills. DI and planned DI are classroom (group-level) variables; all other variables included in the analysis were measured at the student level. The Level 1 model is given by:

$$Y_{ij} = \beta_{0j} + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \beta_3 X_{3ij} + \beta_4 X_{4ij} + \beta_5 X_{5ij} + R_{ij}, \qquad (1)$$

where $Y_{ij}$ represents the standardized posttest score for student i of teacher j. The five covariates are the 2013 score, Gender, the Student Weight defined in the Sample section, and two dummy codes for Ability Group (high and low), respectively. Note that $R_{ij}$ is a residual and that the intercept $\beta_{0j}$ is random over teachers. The model for this random coefficient is given by:

$$\beta_{0j} = \gamma_0 + \gamma_1\theta_j + \gamma_2 Z_{2j} + \gamma_3 Z_{3j} + U_{0j}, \qquad (2)$$

where $Z_{2j}$ and $Z_{3j}$ are the teacher-level variables Grade and Aggregated Planned DI score, respectively, and $U_{0j}$ is the Level 2 residual. Finally, $\theta_j$ stands for the latent variable Differentiated Instruction (DI). Thus, $U_{0j}$ and $R_{ij}$ are, respectively, the deviations of teachers from the average score and the deviations of students from their teacher's average, given the covariates at the two levels (Luyten & Sammons, 2010; Snijders & Bosker, 1999).
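To make the two-level structure concrete, the sketch below composes Equations (1) and (2) for a single student. All parameter values in the example are made up for illustration; they are not estimates from this study.

```python
def random_intercept(gamma, theta_j, grade_j, planned_di_j, u_0j):
    """Equation (2): teacher j's intercept from the latent DI score,
    grade, aggregated planned DI, and a Level 2 residual."""
    g0, g1, g2, g3 = gamma
    return g0 + g1 * theta_j + g2 * grade_j + g3 * planned_di_j + u_0j

def posttest(beta_0j, beta, x, r_ij):
    """Equation (1): a student's posttest score from the random intercept,
    the five covariates (pretest, gender, student weight, low/high
    ability-group dummies), and a Level 1 residual."""
    return beta_0j + sum(b * xk for b, xk in zip(beta, x)) + r_ij

# Illustrative values only (hypothetical, not the study's estimates).
b0 = random_intercept((0.5, 0.1, 0.2, 0.3), theta_j=1.0, grade_j=1.0,
                      planned_di_j=1.0, u_0j=0.0)
y = posttest(b0, beta=(0.8, -0.1, 0.0, -0.2, 0.4),
             x=(1.0, 1.0, 0.0, 1.0, 0.0), r_ij=0.0)
```

The point of the composition is that the Level 1 intercept is itself an outcome at Level 2, which is how the latent DI variable enters the achievement model.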

This latent variable was analyzed with a combination of an item response theory (IRT) model (Lord, 1980) and a generalizability theory (GT) model (Brennan, 1992). All teachers were rated by raters (indexed r) using the same items (indexed k) at all time points (indexed t). The items were scored on a 4-point Likert scale with categories indexed h (h = 0, ..., 3). The IRT model was the generalized partial credit model. The function of the IRT model was to map the discrete item responses to a continuous overall measure, that is, to a latent variable. For teacher j at time point t, the probability of a score in category h of item k, denoted by $U_{jtrk} = h$, is given by:

$$P(U_{jtrk} = h) = \frac{\exp(h\alpha_k\theta_{jtr} - \delta_{kh})}{1 + \sum_{g=1}^{3}\exp(g\alpha_k\theta_{jtr} - \delta_{kg})}, \qquad h = 0, 1, 2, 3, \qquad (3)$$

where $\theta_{jtr}$ is the position on the latent variable of teacher j at time point t as judged by rater r. Note that the responses were recoded from a scale that ran from 1 to 4 to a scale from 0 to 3. Further, in Equation (3), $\alpha_k$ and $\delta_{kh}$ are the parameters of item k; $\alpha_k$ is an item discrimination parameter that gauges the relation between the observed score and

52 J. M. FABER ET AL.


the latent scale, and $\delta_{kh}$ (h = 1, ..., 3) represent locations on the latent scale, that is, they gauge the salience of the item. A GT model was imposed on the latent variables $\theta_{jtr}$, that is:

$$\theta_{jtr} = \theta_j + \tau_{1t} + \tau_{2r} + \tau_{3jt} + \tau_{4jr} + \tau_{5tr} + \varepsilon_{jtr} \qquad (4)$$

Note that $\theta_j$ is the main effect for the teacher, $\tau_{1t}$ and $\tau_{2r}$ are the main effects for the time points and raters, and the other terms are the two-way interaction effects and a residual.
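The measurement part of the model, Equation (3), can be implemented directly. The sketch below is an illustration, not the authors' OpenBUGS code; it returns the four category probabilities for one item rating.

```python
import math

def gpcm_probs(theta, alpha, delta):
    """Category probabilities P(U = h), h = 0..3, under the generalized
    partial credit model of Equation (3). `delta` holds the item locations
    (delta_k1, delta_k2, delta_k3); the h = 0 numerator is fixed at 1."""
    num = [1.0] + [math.exp(h * alpha * theta - delta[h - 1]) for h in (1, 2, 3)]
    total = sum(num)
    return [n / total for n in num]

# Hypothetical parameter values, for illustration only.
probs = gpcm_probs(theta=0.5, alpha=1.2, delta=(0.2, 0.4, 0.9))
```

A higher discrimination $\alpha_k$ makes the category probabilities shift more sharply as the latent DI score $\theta_{jtr}$ increases; the probabilities always sum to 1.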

The complete model, given by Equations (1) to (4), was estimated in a Bayesian framework using OpenBUGS (Version 3.2.3 rev. 2012). We built the model up from an empty model with only the standardized posttest scores to the final model, which also included the interaction effect between the ability-group division measured at the student level and the latent DI scores measured at the group level. Besides testing the hypotheses using the multilevel model, the rater reliability was also of interest. This reliability was estimated using the variance decomposition of the GT model; this procedure is generally known as a generalizability study (Brennan, 1992). The reliability coefficient is given by:

$$\rho = \frac{\sigma_j^2}{\sigma_j^2 + \sigma_{jt}^2/3 + \sigma_{jr}^2/3 + \sigma_e^2/9}, \qquad (5)$$

where $\sigma_j^2$ is the variance between teachers, $\sigma_{jt}^2$ and $\sigma_{jr}^2$ are the variances associated with the interaction of teachers with time points and raters, and $\sigma_e^2$ is the error variance.
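With three time points and three raters (nine observations per teacher), Equation (5) can be checked numerically against the variance components reported in Table 2. A minimal sketch:

```python
def gt_reliability(var_teacher, var_jt, var_jr, var_error, n_time=3, n_raters=3):
    """Equation (5): rater reliability from the GT variance decomposition.
    Interaction and error components are divided by their replication counts."""
    return var_teacher / (var_teacher
                          + var_jt / n_time
                          + var_jr / n_raters
                          + var_error / (n_time * n_raters))

# Variance components from Table 2 reproduce the reported reliability of 0.83.
rho = gt_reliability(1.00, 0.35, 0.19, 0.25)
```

Because the teacher-by-time and teacher-by-rater components are averaged over 3 occasions each, and the error over all 9 observations, the aggregated scores are considerably more reliable than any single rating.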

A Bayesian estimation as implemented in OpenBUGS is an iterative process based on a Markov chain Monte Carlo (MCMC) method. All parameters were estimated with 4,000 burn-in iterations and 16,000 effective iterations. The estimation procedures were repeated several times, and the Monte Carlo standard error for all parameters was always well below 5%. Standard prior distributions were used for all parameters; that is, means and regression parameters had normal priors with mean zero and low precision (0.25), the inverses of the variance parameters had uninformative Gamma distributions, the item discrimination parameters had normal priors with a mean of 1.0 and a variance of 1.0, truncated to the positive domain, and the item location parameters had standard normal priors. Furthermore, Pearson's (frequentist) correlation coefficients were computed using aggregated DI scores, aggregated planned DI scores, and aggregated and disaggregated student pretest and posttest scores.

Results

We first present the results of the estimation of the GT model, used to assess rater reliability and to obtain the latent teacher variables needed for the multilevel model. Table 2 presents the variance components in the classroom observation data for the DI scale needed to compute rater reliability. Teachers' variance was set to 1.00 to identify the scale. It is important to note that analyses with latent variables require an identification of the location and the scale. The scale is identified using the teachers' variance. This is done for convenience (it is the largest variance), but any other choice would have produced exactly the same results, since all variances would have been multiplied by the


same constant and their ratio would not change. The percentages in the table represent the percentage of variance explained by each component relative to the total variance. The percentages show that almost 37% of the variance is explained by differences between teachers, indicating that most of the variance in the scale was due to differences between teachers. Furthermore, the differences between the three observation moments explained 19.33% of the variance, which is high compared to the other variance components. The proportion of variance explained by differences between raters was much smaller (7.67%), indicating that raters predominantly agreed upon the rankings of the teachers. This consistency in ratings is also reflected by the reliability coefficient, which was 0.83 (SD = 0.02) for DI. The influence of different observation moments is also reflected by the interaction between teachers and time moments, which shows the extent to which the ordering of teachers over different time moments explains variance. This interaction explains 12.72% of the variance.

Table 3 shows the Pearson's correlation coefficients between the explanatory variables and the dependent variable. Aggregated teachers' DI and planned DI scores were used for computing the correlations. DI has significant positive correlations with the pretest scores (r = 0.19, p < 0.01) and the posttest scores (r = 0.17, p < 0.01). Interestingly, the correlations between planned DI and achievement were negative and significant (r pretest = −0.13, p < 0.01 and r posttest = −0.15, p < 0.01). Also, the small and nonsignificant correlation between DI and planned DI (r = 0.02, ns) is remarkable, since it seems reasonable that planning DI is related to executing DI practices in the classroom. The same patterns are found in the correlations with

Table 2. Variance components in the observation data (DI).

Component   Name               Variance   SD     Percentage
σ_j²        Teachers           1.00       0      36.84
σ_t²        Time               0.52       0.46   19.33
σ_r²        Rater              0.21       0.04   7.67
σ_jt²       Teachers × Time    0.35       0.07   12.72
σ_jr²       Teachers × Rater   0.19       0.03   7.05
σ_tr²       Time × Rater       0.20       0.04   7.25
σ_e²        Error              0.25       0.05   9.14
            Total              2.71              100.00

Note: SD stands for posterior standard deviation, which can be interpreted as a standard error.

Table 3. Correlations.

             DI       Planned DI   Pretest   Posttest
DI                    .02          .19*      .17*
Planned DI                         –.13*     –.15*
Pretest      0.22*    –0.16*                 .92*
Posttest     0.20*    –0.18*       0.98*

Note: Scores below the diagonal are correlations with aggregated (classroom-level) student pre- and posttest scores.
*Correlation is significant at the 0.01 level (2-tailed).


aggregated student test scores (classroom level), except that these correlations are somewhat higher (with regard to DI) or lower (with regard to planned DI).

The results of the multilevel analysis, shown in Table 4, indicate that in the empty model a large proportion of the variance is group-level variance (variance = 0.87, SD = 0.19), leading to a high ICC of 0.69. This high proportion of group-level variance is caused by combining the assessment results from two different grades (second- and fifth-grade students) in the model. The ICC drops to 0.13 in Model 1, after including student grade and student pretest scores. This result indicates that, as expected, most variance is due to differences between students and, to a lesser degree, to differences between groups.
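The reported ICCs follow directly from the variance components in Table 4 as the group-level share of total variance; a quick numerical check (the small discrepancy for the empty model reflects rounding of the reported components):

```python
def icc(var_group, var_student):
    """Intraclass correlation: share of total variance at the group level."""
    return var_group / (var_group + var_student)

icc_empty = icc(0.87, 0.37)  # Model 0: ~0.70, close to the reported 0.69
icc_m1 = icc(0.03, 0.20)     # Model 1: 0.13, matching the reported value
```

The sharp drop from Model 0 to Model 1 shows how much of the apparent between-group variance is absorbed once grade and pretest are controlled.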

Covariates were included in Model 1. Significant covariates were gender (β = −0.08, p < 0.05; significance indicates that the value zero is outside the 1% Bayesian credibility region), grade (β = 0.39, p < 0.05), and pretest (β = 0.76, p < 0.05). Similar effects for gender were also found in the subsequent models, indicating that boys' mathematics achievement was significantly higher than girls' mathematics achievement. As expected, the achievements of students in second grade were lower than the achievements of students in fifth grade. Furthermore, students with a high pretest score scored higher on the posttest than students with a lower pretest score. No significant effects of student weight were found.

Explanatory variables were included in the subsequent models. In Model 2, standardized planned DI scores were added. No significant positive effect was found for planned DI. This finding does not support Hypothesis 2: Students' outcomes are higher in the classrooms of teachers who plan DI more. In Model 3, the latent DI observation scores were included. Again, no significant positive effect was found. This finding does not support Hypothesis 1: Students' outcomes are higher in the classrooms of teachers who differentiate their instruction more.

Table 4. Multilevel analysis results (mean, with posterior SD in parentheses).

Predictors                        Model 0        Model 1        Model 2        Model 3        Model 4        Model 5
Fixed
Intercept                         –0.13 (0.13)   1.01* (0.20)   1.05* (0.19)   1.04* (0.20)   –0.40 (0.21)   –0.37 (0.22)
Gender (1 = girl)                                –0.08* (0.03)  –0.08* (0.03)  –0.08* (0.03)  –0.08* (0.03)  –0.07* (0.03)
Student weight (0 = no weight)                   –0.02 (0.04)   –0.02 (0.04)   –0.02 (0.04)   0.00 (0.04)    0.03 (0.04)
Grade (0 = second)                               0.39* (0.08)   0.38* (0.08)   0.38* (0.08)   0.91* (0.08)   0.89* (0.09)
Pretest                                          0.76* (0.03)   0.75* (0.03)   0.76* (0.03)   0.47* (0.03)   0.48* (0.03)
Planned DI                                                      –0.05 (0.03)   –0.05 (0.03)   –0.05* (0.03)  –0.05* (0.03)
Differentiated instruction (DI)                                                –0.02 (0.05)   0.00 (0.05)    0.05 (0.06)
Ability group
  Low                                                                                         –0.24* (0.04)  –0.22* (0.05)
  High                                                                                        0.41* (0.04)   0.41* (0.04)
Ability group × DI
  Low × DI                                                                                                   –0.20* (0.09)
  High × DI                                                                                                  –0.03 (0.06)
Random
Variance student level            0.37 (0.02)    0.20 (0.01)    0.20 (0.01)    0.20 (0.01)    0.17 (0.01)    0.16 (0.01)
Variance group level              0.87 (0.19)    0.03 (0.01)    0.03 (0.01)    0.03 (0.01)    0.03 (0.01)    0.03 (0.01)
ICC                               0.69 (0.04)    0.13 (0.03)    0.12 (0.03)    0.12 (0.03)    0.14 (0.04)    0.15 (0.04)

Note: SD stands for posterior standard deviation, which can be interpreted as a standard error.


Ability-group and interaction effects were added in Models 4 and 5. As expected, students in low-ability groups had significantly lower achievement scores than students in average-ability groups (β = −0.24, p < 0.05). Furthermore, students in high-ability groups had significantly higher achievement scores than students in average-ability groups (β = 0.41, p < 0.05). In Model 5, the interaction between ability group and DI was included. From the results, it follows that students in low-ability groups whose teachers had high DI scores had significantly lower posttest scores than students in average-ability and high-ability groups with teachers who also had high DI scores (β = −0.22, p < 0.05). This finding supports Hypothesis 3: Students from different ability groups do not benefit equally from teachers who differentiate their instruction. Our results suggest that, compared with students in high- or average-ability groups, students in low-ability groups profit significantly less from a teacher with high DI observation scores. No significant result was found for the interaction between high-ability-group students and DI.

Discussion and conclusions

DBDM-intervention effects on student achievement have been examined in several research projects (Carlson et al., 2011; Cordray et al., 2012; Konstantopoulos et al., 2013; May & Robinson, 2007; Van Geel et al., 2016; Van Kuijk et al., 2016); however, our knowledge of how DBDM affects student achievement is still very limited. The purpose of the present study was to investigate the relationship between DI and student achievement in a DBDM context.

First, the findings of the generalizability study showed that, even though most variance was explained by differences between teachers, there was much variability between the lessons of the same teacher. Such observation time effects were also found in a study by Praetorius, Pauli, Reusser, Rakoczy, and Klieme (2014), and they indicate that more research is needed on how valid and representative teacher observation scores can be obtained. Furthermore, our findings indicated that students from different ability groups do not profit from DI to the same extent. This finding is in line with previous research: Ability grouping can have a negative impact on the achievement of students in low-ability groups, is effective for students in average-ability groups, and has no impact on the achievement of students in high-ability groups (Lou et al., 1996; Saleh et al., 2005). In future research, it would be worth investigating whether lower teacher expectations, less stimulating learning materials, and a lack of self-regulation skills among low-performing students (Campbell, 2014; Hong et al., 2012; Nomi, 2009; Wiliam & Bartholomew, 2004) could explain the negative impact of DI on the achievement of students in low-ability groups. Furthermore, we expected that students taught by teachers who differentiate their instruction more, or who plan DI more, would have higher achievement levels. No such positive effects were found. Reverse causality between DI and student achievement (i.e., DI practices are executed more in classrooms with many low-performing students and a very diverse student population) might explain this finding (De Neve & Devos, 2016; Nomi, 2009). Another explanation might be the impact of DI on noncognitive outcomes such as students' feelings of competence (Carver & Scheier, 1990). Especially for students in low-ability groups, there might have been an


impact on noncognitive outcomes, and consequently on student achievement. These findings may also suggest that planning differentiation strategies in advance should always be combined with responsive, ad hoc classroom differentiation practices; a balance between preplanned instruction and responsive teaching may be most effective (Sawyer, 2004). Future studies should examine such effects to better explain how DBDM affects student achievement.

Even though no relation between DI (as a preplanned teaching approach) and achievement was found in the present study, the findings still contribute to the data collection procedures that can be used to measure DI. DI cannot be measured with observations alone, as it is necessary to know the rationale behind the differentiation approaches observed (Allen, Matthews, & Parsons, 2013). Variation in learning material, instruction time, and assignments between students is easily observed by means of classroom observations; however, it is difficult for the rater to judge whether this observed differentiation in instructional activities matches the instructional needs of students. The findings of this study indicate that teachers' differentiation practices in classrooms and their preplanned differentiation practices on paper are not always related. Therefore, we recommend that future DI effectiveness studies use additional measures aimed at determining the rationale behind differentiation activities, in order to assess the fit between DI and actual instructional needs. For example, to determine the rationale behind the observed DI and examine whether it indeed matched students' needs, students and teachers could be interviewed immediately after classroom observations. Research findings have already indicated that students are able to judge teachers' behavior, as students' perceptions of teacher behavior proved to be good predictors of student outcomes (Maulana, Helms-Lorenz, & Van de Grift, 2015).

Furthermore, the data analysis procedures used in this study contribute to the existing knowledge base. They solve many problems of studies with observations made by multiple raters, at multiple time points, using itemized observation instruments. First of all, generalizability theory as such is a tool for disentangling the variance components in classroom observation data, and for making an informed choice regarding the indices of reliability and agreement that best fit the purpose of the observation. Traditionally, however, the model is imposed on directly observed sum scores, which ignores measurement error at the item level. Using an IRT model as a measurement error model can reduce bias, for instance, bias caused by floor and ceiling effects. Furthermore, IRT models are much more flexible in handling missing item responses and better suited for the optimization of measurement designs (e.g., by optimal item selection). Another aspect of the innovative approach concerns the Bayesian framework combined with software such as OpenBUGS. The advantage here is that complicated and relatively unique models without dedicated software can be built and estimated in a relatively simple way, using scripts that are both transparent and easily shared with other researchers. The conclusion is that the combination of IRT and generalizability theory is a worthwhile and recommendable methodology for other studies involving variables measured at different levels (students, classrooms), with observations made by different raters at different time points, using itemized measurement instruments.

Aside from the above-mentioned contributions, the present study also has some shortcomings. One of them is that the relationship between DBDM and DI was not


examined. Based on the DBDM literature, it was assumed that DBDM could result in more data-based DI practices in classrooms and that, if this is the case, student achievement would consequently improve. If DBDM does not result in more data-based DI practices, then DI does not explain (potential) student achievement growth. Our findings would therefore have contributed more to our understanding of how DBDM influences achievement if the relationship between DBDM and DI could also have been examined. In addition, the Focus intervention was based on the DBDM literature and not on the DI literature. As a result, some effective DI practices unfortunately were not included in the intervention. Ability-grouping effects are, for example, stronger if student grouping is based on mixed data sources such as classroom observations, student interviews, and achievement data (Lou et al., 1996), whereas teachers in this study were trained to use achievement data in particular for their group compositions. Differentiated instruction requires more information than students' results on standardized assessments (National Research Council, 2001). Examples of such information sources are the more qualitative information that teachers collect on a daily basis in the classroom and the results of diagnostic tests, both of which can be used for matching instruction with what students need. Furthermore, ability-grouping effects are stronger if group composition is flexible and students do not always stay in the same group (Deunk et al., 2015; Slavin, 1987), whereas teachers in this study were trained to compose ability groups only once every half year in their instructional plan. However, teachers in the intervention were intensively trained in applying other characteristics of effective ability grouping, such as composing groups on the basis of specific subject-matter topics and varying learning materials and instruction time between ability groups (Deunk et al., 2015; Lou et al., 1996; Slavin, 1987).

A second shortcoming of the present study is that we were not able to account for effects of teachers' colleagues. Two (or sometimes even three) teachers sharing classes is quite common in Dutch primary education, due to the high percentage of part-time jobs. Even though the teachers who taught mathematics most were asked to participate, and whole school teams were trained, the high percentage of teacher colleagues in both grades still impacted the findings. It is important to take account of this in the interpretation of the study results. A third shortcoming is that teachers' DI practices observed in the classroom were measured with three items (variation between students' learning time, instruction activities, and students' assignments); more items will be needed to obtain a more validated measure of DI. Another limitation of this study is the validity of the instructional plan checklist. The low and negative correlations between planned DI on the one hand and DI practices and pretest and posttest scores on the other do not contribute to such validity, so the findings regarding the planning of DI should be interpreted carefully. However, the low correlation between planned DI and DI practices might also be the result of the fact that most participating schools had already finished the Focus intervention. Perhaps the teachers of those schools were, without the support of the trainer, no longer motivated to develop the instructional plan in the way they had learned during the intervention.

Based on the high number of classroom observations and raters, in combination with data analysis procedures based on generalizability theory and item response theory, our study revealed some valuable insights. Overall, the present study enhances our understanding of DI. The variables expected to be responsible for an achievement effect in a DBDM context were not confirmed by our findings, as no effects


of planned DI and of DI practices executed in the classroom were found. However, this study contributed to our knowledge of the effectiveness of ability grouping and points to the importance of further research into why DI and ability grouping seem to be least effective for students in low-ability groups. Based on our findings, we recommend valid measures of DI practices in classrooms that capture not only the variation in instruction but also the degree to which the observed instructional variation is appropriate for students' instructional needs.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes on contributors

Janke M. Faber is a PhD student at the University of Twente. Her research focuses on the evaluation of various manifestations of data-based decision making.

Cees A. W. Glas is professor at the University of Twente. His research focuses on the development and application of statistical models in educational measurement, psychological testing, and large-scale surveys.

Adrie J. Visscher is full professor at the University of Twente and also holds an endowed chair in data-based decision making at the University of Groningen. His research focuses on feedback about the features of teaching activities and feedback about teachers' impact on student achievement.

ORCID

Janke M. Faber http://orcid.org/0000-0001-8127-6831

References

Allen, M. H., Matthews, C. E., & Parsons, S. A. (2013). A second-grade teacher’s adaptive teachingduring an integrated science-literacy unit. Teaching and Teacher Education, 35, 114–125.doi:10.1016/j.tate.2013.06.002

Black, P., & Wiliam, D. (1998). Inside the black box: Raising standards through classroom assess-ment. Phi Delta Kappan, 80, 139–148.

Blanc, S., Christman, J. B., Liu, R., Mitchell, C., Travers, E., & Bulkley, K. E. (2010). Learning to learnfrom data: Benchmarks and instructional communities. Peabody Journal of Education, 85, 205–225. doi:10.1080/01619561003685379

Brennan, R. L. (1992). Generalizability theory. Educational Measurement: Issues and Practice, 11(4),27–34. doi:10.1111/j.1745-3992.1992.tb00260.x

Campbell, T. (2014). Stratified at seven: In-class ability grouping and the relative age effect. BritishEducational Research Journal, 40, 749–771. doi:10.1002/berj.3127

Carlson, D., Borman, G. D., & Robinson, M. (2011). A multistate district-level cluster randomized trialof the impact of data-driven reform on reading and mathematics achievement. EducationalEvaluation and Policy Analysis, 33, 378–398. doi:10.3102/0162373711412765

Carver, C. S., & Scheier, M. F. (1990). Origins and functions of positive and negative affect: A controlprocess view. Psychological Review, 97, 19–35.

Centraal Bureau voor de Statistiek. (2014). Onderwijs cijfers [Education statistics]. Retrieved fromhttp://www.cbs.nl/nl-NL/menu/themas/onderwijs/cijfers/default.htm

SCHOOL EFFECTIVENESS AND SCHOOL IMPROVEMENT 59

Page 19: Differentiated instruction in a data-based decision-making …...tion data, or parent survey data, in order to represent aspects of school functioning (Schildkamp & Lai, 2013). In

Cordray, D., Pion, G., Brandt, C., Molefe, A., & Toby, M. (2012). The impact of the Measures ofAcademic Progress (MAP) program on student reading achievement. Retrieved from files.eric.ed.gov/fulltext/ED537982.pdf

De Neve, D., & Devos, G. (2016). The role of environmental factors in beginning teachers’ profes-sional learning related to differentiated instruction. School Effectiveness and School Improvement,27, 357–379. doi:10.1080/09243453.2015.1122637

Deunk, M., Doolaard, S., Smale-Jacobse, A., & Bosker, R. J. (2015). Differentiation within and across class-rooms: A systematic review of studies into the cognitive effects of differentiation practices. Groningen, TheNetherlands: GION onderwijs/onderzoek. Retrieved from http://www.nro.nl/wp-content/uploads/2015/03/Roel-Bosker-Effectief-omgaan-met-verschillen-in-het-onderwijs-review.pdf

Ehren, M. C. M., & Swanborn, M. S. L. (2012). Strategic data use of schools in accountability systems.School Effectiveness and School Improvement, 23, 257–280. doi:10.1080/09243453.2011.652127

Hamilton, L., Halverson, R., Jackson, S. S., Mandinach, E., Supovitz, J. A., Wayman, J. C., . . . Steele, J.L. (2009). Using student achievement data to support instructional decision making. Retrievedfrom http://repository.upenn.edu/gse_pubs/279

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77, 81–112. doi:10.3102/003465430298487

Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacherobservation systems and a case for the generalizability study. Educational Researcher, 41, 56–64.doi:10.3102/0013189X12437203

Hong, G., Corter, C., Hong, Y., & Pelletier, J. (2012). Differential effects of literacy instruction timeand homogeneous ability grouping in kindergarten classrooms: Who will benefit? Who willsuffer? Educational Evaluation and Policy Analysis, 34, 69–88. doi:10.3102/0162373711424206

Houtveen, A. A. M., Booij, N., De Jong, R., & Van de Grift, W. J. C. M. (1999). Adaptive instruction andpupil achievement. School Effectiveness and School Improvement, 10, 172–192. doi:10.1076/sesi.10.2.172.3508

Ikemoto, G. S., & Marsh, J. A. (2007). Cutting through the “data-driven” mantra: Different concep-tions of data-driven decision making. Evidence and Decision Making: Yearbook of the NationalSocieity of Education, 106, 105–131. doi:10.1111/j.1744-7984.2007.00099.x

Inspectie van het Onderwijs. (2012). De staat van het onderwijs: Onderwijsverslag 2010/2011 [Thestate of education in The Netherlands: Annual report 2010/2011]. Utrecht, The Netherlands:Author. Retrieved from http://www.onderwijsinspectie.nl

Inspectie van het Onderwijs. (2015). De staat van het onderwijs: Onderwijsverslag 2013/2014 [Thestate of education in The Netherlands: Annual report 2013/2014]. Utrecht, The Netherlands:Author. Retrieved from http://www.onderwijsinspectie.nl

Kaufman, T. E., Graham, C. R., Picciano, A. G., Popham, J. A., & Wiley, D. (2014). Data-driven decisionmaking in the K-12 classroom. In J. M. Spector, M. D. Merrill, J. Elen, & M. J. Bishop (Eds.),Handbook of research on educational communications and technology (pp. 337–346). New York,NY: Springer. doi:10.1007/978-1-4614-3185-5

Keuning, T., & Van Geel, M. J. M. (2012, November). Focus projects II and III. The effects of a training in “achievement oriented work” for primary school teams. Poster presented at the International ICO Fall School, Girona, Spain.

Keuning, T., Van Geel, M., Visscher, A. J., Fox, J.-P., & Moolenaar, N. M. (2016). The transformation of schools’ social networks during a data-based decision making reform. Teachers College Record, 118(9), 1–33.

Konstantopoulos, S., Miller, S. R., & van der Ploeg, A. (2013). The impact of Indiana’s system of interim assessments on mathematics and reading achievement. Educational Evaluation and Policy Analysis, 35, 481–499. doi:10.3102/0162373713498930

Levin, J. A., & Datnow, A. (2012). The principal role in data-driven decision making: Using case-study data to develop multi-mediator models of educational reform. School Effectiveness and School Improvement, 23, 179–201. doi:10.1080/09243453.2011.599394

Locke, E. A., & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American Psychologist, 57, 705–717. doi:10.1037/0003-066X.57.9.705

60 J. M. FABER ET AL.


Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

Lou, Y., Abrami, P. C., Spence, J. C., Poulsen, C., Chambers, B., & Apollonia, S. (1996). Within-class grouping: A meta-analysis. Review of Educational Research, 66, 423–458. doi:10.3102/00346543066004423

Luyten, H., & Sammons, P. (2010). Multilevel modelling. In B. P. M. Creemers, L. Kyriakides, & P. Sammons (Eds.), Methodological advances in educational effectiveness research (pp. 246–276). Abingdon, UK: Routledge.

Mandinach, E. B. (2012). A perfect time for data use: Using data-driven decision making to inform practice. Educational Psychologist, 47, 71–85. doi:10.1080/00461520.2012.667064

Mandinach, E. B., & Gummer, E. S. (2013). A systemic view of implementing data literacy in educator preparation. Educational Researcher, 42, 30–37. doi:10.3102/0013189X12459803

Marsh, J. A. (2012). Interventions promoting educators’ use of data: Research insights and gaps. Teachers College Record, 114(11), 1–48.

Maulana, R., Helms-Lorenz, M., & Van de Grift, W. (2015). Development and evaluation of a questionnaire measuring pre-service teachers’ teaching behaviour: A Rasch modelling approach. School Effectiveness and School Improvement, 26, 169–194. doi:10.1080/09243453.2014.939198

May, H., & Robinson, M. A. (2007). A randomized evaluation of Ohio’s Personalized Assessment Reporting System (PARS). Retrieved from Consortium for Policy Research in Education website: http://www.cpre.org/randomized-evaluation-ohios-personalized-assessment-reporting-system-pars

Ministerie van Onderwijs, Cultuur en Wetenschap. (2007). Scholen voor morgen: Samen op weg naar duurzame kwaliteit in het primair onderwijs [Tomorrow schools: Together on the way to sustainable quality in primary education]. Den Haag, The Netherlands: Author. Retrieved from https://www.rijksoverheid.nl

National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: The National Academies Press.

Nomi, T. (2009). The effects of within-class ability grouping on academic achievement in early elementary years. Journal of Research on Educational Effectiveness, 3, 56–92. doi:10.1080/19345740903277601

Praetorius, A.-K., Pauli, C., Reusser, K., Rakoczy, K., & Klieme, E. (2014). One lesson is all you need? Stability of instructional quality across lessons. Learning and Instruction, 31, 2–12. doi:10.1016/j.learninstruc.2013.12.002

Reis, S. M., McCoach, D. B., Little, C. A., Muller, L. M., & Kaniskan, R. B. (2011). The effects of differentiated instruction and enrichment pedagogy on reading achievement in five elementary schools. American Educational Research Journal, 48, 462–501. doi:10.3102/0002831210382891

Roy, A., Guay, F., & Valois, P. (2013). Teaching to address diverse learning needs: Development and validation of a differentiated instruction scale. International Journal of Inclusive Education, 17, 1186–1204. doi:10.1080/13603116.2012.743604

Saleh, M., Lazonder, A. W., & De Jong, T. (2005). Effects of within-class ability grouping on social interaction, achievement, and motivation. Instructional Science, 33, 105–119. doi:10.1007/s11251-004-6405-z

Sawyer, R. K. (2004). Creative teaching: Collaborative discussion as disciplined improvisation. Educational Researcher, 33(2), 12–20. doi:10.3102/0013189X033002012

Schildkamp, K., Ehren, M., & Lai, M. K. (2012). Editorial article for the special issue on data-based decision making around the world: From policy to practice to results. School Effectiveness and School Improvement, 23, 123–131. doi:10.1080/09243453.2011.652122

Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform: Which data, what purposes, and promoting and hindering factors. Teaching and Teacher Education, 26, 482–496. doi:10.1016/j.tate.2009.06.007

Schildkamp, K., & Lai, M. K. (2013). Data-based decision making: Conclusions and a data use framework. In K. Schildkamp, M. K. Lai, & L. Earl (Eds.), Data-based decision making in education: Challenges and opportunities (pp. 177–191). Dordrecht, The Netherlands: Springer. doi:10.1007/978-94-007-4816-3

Schildkamp, K., Poortman, C. L., & Handelzalts, A. (2016). Data teams for school improvement. School Effectiveness and School Improvement, 27, 228–254. doi:10.1080/09243453.2015.1056192

Slavin, R. E. (1987). Ability grouping and student achievement in elementary schools: A best-evidence synthesis. Review of Educational Research, 57, 293–336. doi:10.3102/00346543057003293

Smit, R., & Humpert, W. (2012). Differentiated instruction in small schools. Teaching and Teacher Education, 28, 1152–1162. doi:10.1016/j.tate.2012.07.003

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London, UK: Sage.

Staman, L., Visscher, A. J., & Luyten, H. (2014). The effects of professional development on the attitudes, knowledge and skills for data-driven decision making. Studies in Educational Evaluation, 42, 79–90. doi:10.1016/j.stueduc.2013.11.002

Tomlinson, C. A., Brighton, C., Hertberg, H., Callahan, C. M., Moon, T. R., Brimijoin, K., . . . Reynolds, T. (2003). Differentiating instruction in response to student readiness, interest, and learning profile in academically diverse classrooms: A review of literature. Journal for the Education of the Gifted, 27, 119–145. doi:10.1177/016235320302700203

Van de Grift, W. (2007). Quality of teaching in four European countries: A review of the literature and application of an assessment instrument. Educational Research, 49, 127–152. doi:10.1080/00131880701369651

Van de Grift, W. J. C. M. (2014). Measuring teaching quality in several European countries. School Effectiveness and School Improvement, 25, 295–311. doi:10.1080/09243453.2013.794845

Van der Kleij, F. M., Vermeulen, J. A., Schildkamp, K., & Eggen, T. J. H. M. (2015). Integrating data-based decision making, assessment for learning and diagnostic testing in formative assessment. Assessment in Education: Principles, Policy & Practice, 22, 324–343. doi:10.1080/0969594X.2014.999024

Van der Scheer, E. A. (2016). Data-based decision making put to the test (Doctoral dissertation). Retrieved from http://doc.utwente.nl/101229/1/thesis_E_van_der_Scheer.pdf

Van Geel, M., Keuning, T., Visscher, A. J., & Fox, J.-P. (2016). Assessing the effects of a school-wide data-based decision-making intervention on student achievement growth in primary schools. American Educational Research Journal, 53, 360–394. doi:10.3102/0002831216637346

Van Kuijk, M. F., Deunk, M. I., Bosker, R. J., & Ritzema, E. S. (2016). Goals, data use, and instruction: The effect of a teacher professional development program on reading achievement. School Effectiveness and School Improvement, 27, 135–156. doi:10.1080/09243453.2015.1026268

Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21, 167–188. doi:10.1080/09243450903396005

Wayman, J. C., Shaw, S., & Cho, V. (2011). Second-year results from an efficacy study of the Acuity data system. Retrieved from http://www.waymandatause.com/wp-content/uploads/2013/11/Wayman_Shaw_and_Cho_Year_2_Acuity_report.pdf

Wayman, J. C., Stringfield, S., & Yakimowski, M. (2004). Software enabling school improvement through analysis of student data (Report No. 67). Retrieved from http://www.jhucsos.com/wp-content/uploads/2016/04/Report67.pdf

Wiliam, D., & Bartholomew, H. (2004). It’s not which school but which set you’re in that matters: The influence of ability grouping practices on student progress in mathematics. British Educational Research Journal, 30, 279–293. doi:10.1080/0141192042000195245

Appendix 1. Checklist instructional plan

A checklist was developed to measure the quality of DI as planned by teachers. This checklist includes four topics (instruction, learning goals, evaluation, and instruction for students with specific learning needs). These topics cover 18 subtopics, which are measured by means of 43 items. Raters scored each item as “present” (score 1) or “not present” (score 0) in the instructional plan. The 18 subtopics of the checklist are presented in this appendix. More information on the checklist and the translation of the other items can be obtained from the corresponding author.

1. Instruction

Ability-group composition

1.1. The composition of the ability groups is in line with the results of the analyses of students’ assessment results and students’ learning needs (2 items)

Description of instruction

1.2. The teacher(s) has/have described an instructional approach to address students’ learning needs (3 items)
1.3. This instructional approach differs between ability groups (3 items)
1.4. Students’ assignments, learning materials, and/or learning time differ between ability groups (3 items)
1.5. The teacher(s) has/have specified how assignments and/or learning materials will be used, and for which students (3 items)

2. Learning goals

2.1. The teacher(s) has/have formulated performance goals for each ability group (a minimum percentage of correct answers for curriculum-based, unstandardized assessments) (1 item)
2.2. The teacher(s) has/have formulated performance goals for each ability group (a minimum score growth on a standardized assessment) (3 items)

3. Evaluation

3.1. The teacher(s) has/have formulated how and when the results of the instructional plan will be evaluated (4 items)
3.2. The teacher(s) has/have formulated teacher actions in case the learning goals are not met (1 item)

4. Instruction for students with specific learning needs

4.1. Teachers’ selection of students with specific learning needs is supported by the results of the analyses of students’ assessments (curriculum-based and/or standardized assessments) and/or the results of an observation instrument (3 items)
4.2. Teachers’ description of students’ learning needs is supported by the results of the analyses of students’ assessment results (curriculum-based and/or standardized assessments) (2 items)
4.3. Formulated learning goals are in line with the description of students’ learning needs (2 items)
4.4. Learning goals are formulated in a SMART (specific, measurable, attainable, realistic, timely) way by the teacher(s) (3 items)
4.5. Summative assessment learning goals are formulated in a SMART way by the teacher(s) (3 items)
4.6. The teacher(s) has/have given a description of the specific instructional approach (1 item)
4.7. The teacher(s) has/have given a description of the learning strategies students should learn (1 item)
4.8. Learning materials are mentioned, and a description is given of how they will be used by the teacher(s) (1 item)
4.9. The teacher(s) has/have formulated the organization of instruction for students with specific learning needs (4 items)
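For readers who wish to work with ratings of this kind, the checklist structure above (4 topics, 18 subtopics, 43 binary items) can be captured in a small data structure. The sketch below is illustrative only: the item counts per subtopic come from this appendix, but the aggregation rule (a simple sum of the 0/1 ratings per topic) and all names in the code are assumptions, not the authors’ scoring procedure.

```python
# Illustrative sketch of the DI-checklist as a data structure.
# Per-subtopic item counts are taken from the appendix; the sum-scoring
# rule is an assumption, not the article's published procedure.

CHECKLIST = {
    "Instruction": {"1.1": 2, "1.2": 3, "1.3": 3, "1.4": 3, "1.5": 3},
    "Learning goals": {"2.1": 1, "2.2": 3},
    "Evaluation": {"3.1": 4, "3.2": 1},
    "Specific learning needs": {"4.1": 3, "4.2": 2, "4.3": 2, "4.4": 3,
                                "4.5": 3, "4.6": 1, "4.7": 1, "4.8": 1,
                                "4.9": 4},
}

def topic_scores(ratings):
    """Sum the 0/1 item ratings per topic.

    `ratings` maps a subtopic code (e.g., "1.1") to a list of 0/1 item
    scores; unrated subtopics default to all zeros ("not present").
    """
    scores = {}
    for topic, subtopics in CHECKLIST.items():
        total = 0
        for subtopic, n_items in subtopics.items():
            item_ratings = ratings.get(subtopic, [0] * n_items)
            if len(item_ratings) != n_items:
                raise ValueError(f"{subtopic} expects {n_items} item ratings")
            total += sum(item_ratings)
        scores[topic] = total
    return scores

# Sanity check: the subtopic item counts add up to the 43 items
# reported in the appendix.
assert sum(n for subs in CHECKLIST.values() for n in subs.values()) == 43
```

A plan rated only on subtopics 1.1 and 2.1, for example, would be scored as `topic_scores({"1.1": [1, 1], "2.1": [1]})`, with all other subtopics treated as not present.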
