
DRAFT – PLEASE DO NOT CITE OR DISTRIBUTE WITHOUT CONSENT

A “Jarring” Experience?

Exploring how Changes to Standardized Tests Impact Teacher Experience Effects

Mark Chin

Harvard Graduate School of Education

Author Note

Mark Chin, Harvard Graduate School of Education, Harvard University. Correspondence concerning this article should be addressed to Mark Chin, Center for Education Policy Research, 50 Church Street 4th Floor, Cambridge, MA 02138. E-mail: [email protected]


Abstract

Experience has long been used by states and districts to indicate a teacher’s quality. More time in

the classroom theoretically leads to the development of skills and the improved implementation

of instructional practices key to student learning. Empirical evidence links teacher experience to

student test performance. Test familiarity, and, subsequently, veteran teachers’ more effective

tailoring of instruction to the content and format of test items, may also contribute to this

relationship. If the teacher experience effect is in part explained by teacher test-experience, this

could lead to non-persistent student learning, and to misallocation of resources or misguided

personnel decisions. I used administrative data from Kentucky before and after the state switched

standardized tests to examine whether test experience factors into the teacher experience effect. I

found that the teacher experience effect on mathematics achievement attenuated following the change,

supporting this hypothesis, as both novice and more veteran teachers became test-inexperienced.

Keywords: teacher quality, teacher experience, test preparation, standardized tests


A “Jarring” Experience?

Exploring how Changes to Standardized Tests Impact Teacher Experience Effects

Experience has long been used by states and districts to indicate a teacher’s quality. As

such, teacher compensation contracts typically connect pay to years of experience (Schools and

Staffing Survey, 2012). Recent federal policies, however, have pushed policymakers to employ

other metrics to measure teacher effectiveness in their updated evaluation systems. One such

measure includes teacher impacts on students’ standardized test outcomes, or “value-added”

measures. Theory would suggest that students taught by more veteran teachers should

demonstrate higher test score growth. For example, the additional years of experience in the

classroom likely translate to better student outcomes through the development of key

proficiencies or the improved implementation of important instructional practices, such as better

understanding of student learning pathways or improved ability to minimize unproductive

classroom time (e.g., Leinhardt, 1989; Scribner & Akiba, 2010). This hypothesis has largely

played out in empirical analyses; most extant research suggests positive within-teacher returns to

experience, particularly in the earlier years of a teacher’s time in the classroom (e.g., Harris &

Sass, 2011; Papay & Kraft, 2015; Rice, 2013; Rockoff, 2004).

Another explanation as to why students taught by novice teachers perform worse than

expected on standardized tests, however, may be the lack of familiarity such teachers have with

the format or content of exam items. Under this paradigm, higher test performance demonstrated

by students taught by more veteran teachers would in part be explained by these teachers’

improved implementation of narrower, test-specific instructional practices gained from

familiarity. Such a finding might provide further evidence that test-based accountability systems

do not necessarily yield lasting learning outcomes (e.g., Corcoran, Jennings, & Beveridge, 2011;


Jacob, 2005), and, subsequently, would challenge the utility of teacher experience as an indicator

of quality or for decisions surrounding hiring, retention, and compensation. With many states

already having switched or contemplating a switch to standardized tests more aligned to

the Common Core (e.g., the Smarter Balanced assessments or PARCC), understanding how

testing regime changes impact this traditional indicator of teacher quality is essential.

In my analysis, I explored the possibility that the relationship between a student’s test

outcomes and the experience level of his or her teacher (henceforth referred to as “the teacher

experience effect” for simplicity) may in part be test-specific. To do so, I utilized administrative

data from the state of Kentucky, which changed its high-stakes standardized tests between the

2010-11 and 2011-12 academic years. If the teacher experience effect were to mainly capture the

development of test-independent teaching skills or improved implementation of other test-

independent instructional behaviors over time, I would expect the experience effect to be similar

to those observed in prior research, and to be consistent in the year before and after the change in

the standardized testing regime in the state. Alternatively, if the teacher experience effect were to

mainly capture familiarity with the state’s original standardized test, I would expect the

difference in value added between novice and more veteran teachers to be consistent with prior

research before the change, but to attenuate in the year immediately following the change. Such

attenuation would reflect the inexperience of both novice and veteran teachers in implementing

narrow, effective test-specific instruction the year following a change.

Results from standard value-added models including controls for teacher experience

provided evidence that would support the latter hypothesis. I found that this effect on

mathematics achievement in Kentucky significantly attenuated the year following the

standardized test change. This difference was particularly pronounced for the sample of schools


with the highest proportion of students eligible for free- or reduced-price lunch in the year prior

to the test change, as expected, given prior research documenting higher prevalence of test-

focused instruction in schools serving the most disadvantaged students. The difference observed

in the main analysis remained even when restricting the sample of teachers to those teaching in

both school years (i.e., to account for potential attrition of ineffective teachers from the sample).

Furthermore, analyses suggested that new teachers hired in the years before and after the test

change were not significantly different from one another on other characteristics that would

indicate their effectiveness, alleviating some concern regarding the influence of “vintage effects”

(see Murnane & Phillips, 1981) on my results. I found similar patterns for the teacher experience

effect on student English language arts (ELA) achievement, though the effects overall were

(unsurprisingly) smaller and differences were insignificant.

These initial findings suggest that the positive returns to experience for student test

outcomes observed in prior literature may in part result from increased exposure and familiarity

with standardized tests; if such teachers more capably implement test-specific teaching practices

to improve outcomes, the effect may result in impacts on student learning that fail to persist, or

“fade out”, over time (e.g., Kane & Staiger, 2008). Basing personnel decisions on teacher

experience may thus be misguided if persistent student learning is the goal of such policies.

In what follows, I describe the setting for my investigation, the data used in my analyses,

and my methodology for exploring the teacher experience effect on student test achievement. I

then present results from analyses, and conclude by discussing the practical implications of my

findings.

Setting


In March 2009, the governor of Kentucky signed Senate Bill 1, which called for a

comprehensive revision to the state’s academic standards for student learning in several subjects.

The revised standards were intended to be tougher than the original standards, with such

revisions targeting increased college and career readiness of the state’s students; Kentucky

officially adopted the Common Core State Standards (CCSS) in 2010. Senate Bill 1 also

mandated that new high-stakes standardized tests be designed to reflect and align with the

changes to academic standards, and that administration of these tests begin in the

academic year of 2011-12. Prior to this transition year, and starting in the 1998-99 academic

year, students in grades three to eight were tested using the Kentucky Core Content Tests

(KCCT). The state began administration in 2011-12 of the Kentucky Performance Report for

Educational Progress (K-PREP) tests. Though the bill suspended state school accountability

based on student KCCT performance starting in the 2008-09 academic year and through the first

administration of the K-PREP tests, to meet federal regulations stemming from No Child Left

Behind, the KCCT was still administered through the 2010-11 academic year. Notably, the

Kentucky Department of Education did not develop new items designed to assess students on the

standards tied to the KCCT, and the KCCT administered in 2010-11 included the same items as

the one administered in 2008-09 (Bynum & Thacker, 2011).

Data

In my analyses, I used Kentucky statewide student- and teacher-level data collected in the

2010-11 and 2011-12 academic years. Student-level data analyzed included: (a) student

demographic data, including gender, race or ethnicity, eligibility for free- or reduced-price lunch

(FRPL), eligibility for special education (SPED), a designation for limited English proficiency,

and other academic-level indicators (i.e., classification as “gifted”, being retained in a grade, or


receiving supplemental instruction in mathematics or ELA); (b) current and prior scaled score

performance on either the KCCT or K-PREP mathematics and ELA state standardized tests; and

(c) student links to teachers, classrooms, schools, and districts. For my analyses, I focused on

students in grades four through eight, as students in third grade did not have prior test scores—an

important covariate in my models exploring the teacher experience effect. Teacher-level data

analyzed included a variable for in-state teaching experience and whether or not the teacher held an

advanced degree within a given academic year.

To ensure the stability of results and avoid the misattribution of effects when estimating

the teacher experience effect (described in more detail below), I restricted the sample of students.

Specifically, students had to: (a) be reliably linked to a single teacher for primary mathematics

(or ELA) instruction (i.e., the course of instruction fit the typical course progression, or only one

link existed); (b) have data on all controls included in my analysis models; and (c) not be linked

to an atypical classroom (i.e., those containing fewer than five students, greater than 40 students,

greater than 50% of students missing prior achievement scores, or greater than 50% of students

being SPED). Following these restrictions, my final sample for mathematics contained 282,833

students taught by 4,517 teachers in 984 Kentucky schools across the two years. For ELA, this

final sample included 304,770 students taught by 5,239 teachers in 991 Kentucky schools across

the two years.
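To make these restrictions concrete, the sketch below shows how the classroom-level filters might be applied with pandas. It is a minimal illustration rather than the actual processing code; the column names (student_id, class_id, prior_score, sped) are assumptions.

```python
import pandas as pd

def restrict_classrooms(df: pd.DataFrame) -> pd.DataFrame:
    """Drop atypical classrooms as described above. Column names are illustrative
    assumptions; df has one row per student-year, already limited to reliably
    linked students with non-missing controls."""
    stats = df.groupby("class_id").agg(
        n_students=("student_id", "size"),
        share_missing_prior=("prior_score", lambda s: s.isna().mean()),
        share_sped=("sped", "mean"),
    )
    mask = (
        stats["n_students"].between(5, 40)              # drop <5 or >40 students
        & (stats["share_missing_prior"] <= 0.5)         # drop >50% missing prior scores
        & (stats["share_sped"] <= 0.5)                  # drop >50% SPED classrooms
    )
    keep = stats[mask].index
    return df[df["class_id"].isin(keep)]
```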

Methods

To explore whether the teacher experience effect on student test performance in

Kentucky changed after the state’s move from the KCCTs to the K-PREP tests, I estimated the

following student-level model (Equation 1) using OLS regression, clustering standard errors at

the school level:


$$y_{ijkgst} = Y_{i,t-1}\alpha + D_{it}\beta + P_{jkgst}\delta + C_{gst}\gamma + \kappa_{gt} + \eta_s + f(\mathrm{EXP}_{k}^{2010\text{-}11})\mu + f(\mathrm{EXP}_{k}^{2011\text{-}12})\nu + \varepsilon_{ijkgst} \quad (1)$$

The outcome in Equation 1, $y_{ijkgst}$, captures the performance of student i in class j taught by

teacher k in grade g in school s in the academic year t on either the mathematics or ELA test

from the KCCT or the K-PREP assessment.¹ This performance (i.e., the student’s scaled score)

was standardized within grade and year to have a mean of zero and standard deviation of one.

The model controlled for a vector of student baseline ability measures ($Y_{i,t-1}$), including

a cubic function of prior test achievement; a vector of the student demographic characteristics

described above ($D_{it}$); the aggregates of these two covariate vectors for a student’s classroom peers

($P_{jkgst}$); the aggregates of the two covariate vectors for a student’s grade-level cohort ($C_{gst}$); and

grade-by-year fixed effects ($\kappa_{gt}$).
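As a concrete illustration of the outcome construction, the brief sketch below performs the within-grade-and-year standardization described above; the column names (scaled_score, grade, year) are assumptions, and df is the restricted student-level data from the earlier sketch.

```python
# Standardize scaled scores to mean 0 and SD 1 within each grade-by-year cell,
# as described above. Column names are illustrative assumptions.
df["z_score"] = df.groupby(["grade", "year"])["scaled_score"].transform(
    lambda s: (s - s.mean()) / s.std()
)
```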

My coefficients of interest in Equation 1 are the effects (modeled using different

functional forms described below) of being taught by a teacher k with experience EXP on the

outcome in different years, captured by 𝜇 for 2010-11 (i.e., the teacher experience effect on

KCCT performance) and by 𝜈 for 2011-12 (i.e., the teacher experience effect on K-PREP

performance). If the teacher experience effect on outcomes is largely independent of the specific

test, I would not expect the difference between the coefficients 𝜇 and 𝜈 to be statistically

significant. This result would contradict the theory that the positive returns to teacher experience

on outcomes seen in extant literature might be caused by test familiarity (and, subsequently,

¹ Exploration into the distributions of scaled scores for students on standardized tests in 2010-11 showed a significant ceiling effect and a minor floor effect. To ensure that my results were not influenced by the loss of information regarding students’ actual ability levels at these extremes, I dropped students attaining the highest or lowest possible scale score on all tests from my analyses. Sensitivity checks suggested that inclusion of these students had minor impacts on estimates, with overall patterns remaining the same.


ability to teach to the test well) of more veteran teachers. However, if the effect of experience is

in part explained by the test familiarity of more veteran teachers, as I would hypothesize, I

expect the difference between the two coefficients to be statistically significant. Specifically, I

would expect the effect of additional experience to attenuate in 2011-12, as both novice and

more veteran teachers would be equally (un)familiar with the new K-PREP assessments.
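To make the comparison of 𝜇 and 𝜈 concrete, the sketch below estimates a simplified version of Equation 1 with statsmodels, using (for illustration) indicators for one year and for two-plus years of experience interacted with year, school and grade-by-year fixed effects, and school-clustered standard errors, and then tests whether the experience coefficients differ across years. The variable names are assumptions, and the control set is abbreviated relative to the full model.

```python
import statsmodels.formula.api as smf

# Experience dummies interacted with year; "experience", "year", and other
# column names are illustrative assumptions on the student-level DataFrame df.
df["exp1"] = (df["experience"] == 1).astype(int)
df["exp2plus"] = (df["experience"] >= 2).astype(int)
for yr in (2011, 2012):
    df[f"exp1_{yr}"] = df["exp1"] * (df["year"] == yr)
    df[f"exp2p_{yr}"] = df["exp2plus"] * (df["year"] == yr)

# Simplified Equation 1: abbreviated controls, school and grade-by-year fixed
# effects, standard errors clustered at the school level.
model = smf.ols(
    "z_score ~ exp1_2011 + exp1_2012 + exp2p_2011 + exp2p_2012"
    " + prior_z + I(prior_z**2) + I(prior_z**3)"
    " + C(grade):C(year) + C(school_id)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["school_id"]})

# Wald tests of the cross-year equality of the experience coefficients
# (the comparison of mu and nu described above).
print(model.f_test("exp1_2011 = exp1_2012"))
print(model.f_test("exp2p_2011 = exp2p_2012"))
```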

Key to my hypothesis is that the KCCT and the K-PREP assessments are sufficiently

different from one another as to cause a “drop” in test familiarity for more veteran teachers

following the standardized test change. Exploration into the alignment of items on the KCCT to

the CCSS (which were aligned to the K-PREP) suggested that, though items on the old

standardized tests did assess many of the new adopted standards, gaps still existed in terms of the

content and the depth of knowledge assessed (Taylor, Thacker, Koger, Koger, & Dickinson,

2010). Anecdotal evidence also indicated the K-PREP to be more difficult than the KCCT, which

was supported empirically by observations that far fewer students scored the highest possible

scaled score on the new exams.

Another key to my hypothesis is that evidence exists documenting the implementation of

narrow, test-specific instruction by teachers during the time period that the KCCT was

administered. Though anecdotal reports provide such evidence, some quantitative analyses

into the relationship between changes in school-level test performance across years and school-

level averages of FRPL in Kentucky have also supported this notion. Specifically, this

relationship, insignificant before the switch to the K-PREP, is negative and significant (i.e.,

schools with higher proportions of disadvantaged students worsen in their average performance

across years) in 2011-12 (Dickinson, Levinson, & Thacker, 2013). Though several reasons might

explain this observed relationship, extant research has found higher incidence of narrow, test-


specific instructional practices in schools with larger proportions of disadvantaged students (e.g.,

Herman & Golan, 1993). Thus, it stands that such schools would be “hurt” most in terms of

student achievement by the switch from the KCCT to the K-PREP assessments. These findings

informed supplemental analyses (described below) exploring how differences between the

teacher experience effect in 2010-11 and teacher experience effect in 2011-12 varied across

different types of schools.

A notable inclusion in the model represented in Equation 1 is the control for school fixed

effects ($\eta_s$). Though the appropriate specification for modeling teacher effects on student

achievement is still being debated (see Goldhaber & Theobald, 2012), I opted to include school

fixed effects, as other researchers investigating returns to teacher experience have done in the

past (e.g., Papay & Kraft, 2015), and because prior research has provided evidence for the

systematic sorting of teachers—in particular, inexperienced ones—to certain types of schools

(see Rice, 2013). Further, other literature has documented heterogeneous effects of experience

across schools (Kraft & Papay, 2014; Loeb, Kalogrides, & Béteille, 2012; Sass, Hannaway, Xu,

Figlio, & Feng, 2012).

A notable exclusion in the model represented in Equation 1 is a control for teacher fixed

effects. Many researchers interested in exploring the effect on student outcomes of being taught

by teachers with varying experience levels have (rightfully) noted that cross-sectional analyses

fail to account for certain effect-biasing factors. Specifically, cross-sectional investigations fail to

account for selection biases and vintage effects (Murnane & Phillips, 1981). The former factor

suggests that more experienced teachers may yield larger gains for students on test outcomes

because the least effective teachers leave the teaching profession altogether. The latter factor

argues that teachers from different hiring cohorts vary in their latent effectiveness such that the


effect of experience is confounded. Though several estimation methods exist that attempt to

address these issues (see Papay & Kraft, 2015), researchers most commonly include teacher

fixed effects in their models investigating the teacher experience effect, essentially comparing

more experienced teachers to their less-experienced selves. Inclusion of teacher fixed effects into

Equation 1, however, would result in an unidentifiable model, due to the inclusion of both

grade-by-year fixed effects, to account for year-to-year “shocks” to test performance, and my

variables of interest—experience-by-year interactions. Thus, I opted to exclude teacher fixed

effects from my model. In an attempt to alleviate some concern around selection bias and vintage

effects influencing my results, I conducted sensitivity analyses. First, because I am interested not in

the underlying trajectory of returns to teacher experience per se but rather in whether the teacher

experience effect differs between academic years, I restricted the sample of students to only those

taught by teachers who taught in both 2010-11 and 2011-12.²

Second, I explored whether the novice teachers in the 2010-11 and 2011-12 cohorts might have

differed significantly in other measures of teacher quality. Specifically, I explored whether

teachers in one year were more likely to hold an advanced degree than those in the other year. I

also used data from the Common Core of Data to estimate Equation 1 controlling for changes in

enrollment from the prior year in 2010-11 and 2011-12 within each district (see Murnane &

Phillips, 1981, for a similar analysis). By doing so, I explored whether or not the quality of

incoming teachers, proxied by changes in enrollment (i.e., in years with larger increases in

enrollment, the demand for teachers will increase so as to reduce the overall quality of newly

hired teachers), might have explained my results.

Results

² I include students taught by novice teachers in 2011-12, as such teachers did not instruct in the prior year.


First, in order to provide a sense of the distribution of experience of teachers across years

in Kentucky, I show histograms of this measure in 2010-11 and 2011-12. As Figure 1 shows, a

large proportion of teachers have fewer than 10 years of in-state teaching experience, and

districts hired a substantial number of novice teachers in both years.

[Insert Figure 1 here.]

The distribution of teacher experience informed my categorization of teachers into different

“buckets” of experience for initial model estimation of Equation 1 for mathematics and ELA

student outcomes. Specifically, I compared the performance of students taught by novice

teachers to those taught by teachers of each additional year of experience up to nine (i.e., I

included dummy indicators for each year of experience from one to nine) and to those taught by

teachers with 10 or more years of experience. I argue for this model simplification because, in

many studies, very experienced teachers (who are also not the main focus of my exploration) do

not demonstrate significantly larger effects on achievement than those with a few years of

experience (see Kraft & Papay, 2014, for a similar simplification). Furthermore, as noted earlier,

administration of the KCCT began in the 1998-99 academic year, which would suggest that very

experienced teachers might not be that different from relatively experienced teachers in “test-

specific experience”, the key variable in my hypothesis.
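The bucketing described above might be implemented as in the brief sketch below, which creates the experience indicators used in the exploratory estimation (novices are the omitted category); the column name "experience" is an assumption.

```python
# Dummy indicators for each year of experience from one to nine, plus a single
# indicator collapsing 10 or more years; novices (0 years) are the omitted group.
# The "experience" column name is an illustrative assumption.
for yr in range(1, 10):
    df[f"exp_{yr}"] = (df["experience"] == yr).astype(int)
df["exp_10plus"] = (df["experience"] >= 10).astype(int)
```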

Mathematics

Figure 2 presents a visual representation of 𝜇 and 𝜈 from estimation of Equation 1 for

student performance on Kentucky’s mathematics standardized tests for each individual year of

experience up to 10-plus years.

[Insert Figure 2 here.]


The two trajectories plotted in Figure 2 depict a few patterns. First, the effect on KCCT

mathematics performance (i.e., the solid line) of being taught by non-novice teachers is positive;

in fact, the actual “trajectory” of effects as teacher experience increases through 10-plus years is

very similar to the trajectory seen in Papay and Kraft (2015), despite those authors specifically

modeling within-teacher returns to experience. However, this trajectory is much flatter in the

year following Kentucky’s switch to the K-PREP tests.

I used these initial exploratory results to further refine the controls for teacher experience

in my model formally testing differences in the experience effect across years. Specifically, I

employed only indicators (interacted with year) for being taught by a teacher with one year of

experience and for being taught by a teacher with two-plus years of experience. I reduced the

number of experience controls in estimating Equation 1 to limit the number of cross-year

comparisons made, and because Figure 2 indicated that, following positive increases in

effects for the first two years of experience, the cross-sectional “returns” to additional experience

were relatively stable. Table 1 shows the results from my regression estimates with these specific

controls for experience.

[Insert Table 1 here.]

Column 1 in Table 1 shows the results from my base analyses. Being taught by a teacher

with one year of experience in 2011, as opposed to a novice teacher in that same year, is

associated with a 0.09 SD (p<0.01) relative increase in KCCT mathematics achievement.

Students taught by teachers with two or more years of experience in that year

experienced growth of 0.12 SD (p<0.01) more than their peers taught by a novice

teacher. Comparatively, in 2012, the first year of K-PREP administration, the effect of being

taught by a teacher with one year of experience over a novice teacher was smaller and


insignificant (𝛽 = 0.00, p>0.1). The effect of being taught by a teacher with two or more years

of experience over a novice teacher, though still significant, was smaller as well (𝛽 = 0.05,

p<0.05). Importantly, these differences across years in the effect of experience were statistically

significant (see the F-test rows in Column 1)—the pattern of attenuation matched my hypothesis.

I investigated whether differences in the teacher experience effect across years might

vary depending on school-level averages for student FRPL. Prior research has found in some

settings that schools with larger proportions of such students demonstrate higher incidence of

narrow, test-specific instructional practices. Thus, I hypothesized that schools with lower

FRPL-rates would witness smaller differences between the teacher experience effects across

years, as teachers in such schools would be less likely to enact the performance-enhancing, test-specific

practices affected most by a test change. Similarly, I hypothesized that schools with higher

FRPL-rates would witness larger differences. Columns 2 and 3 of Table 1 depict the results from

these analyses. In column 2, I looked at the teacher experience effect across years for schools in the

bottom tercile of FRPL-rates (i.e., between 2 and 65 percent of students in a school being

eligible for FRPL in 2010-11), and indeed saw smaller differences across years. In column 3, I

looked at the teacher experience effect across years for schools in the top tercile of FRPL-rates

(i.e., 80 percent or more of students in a school being eligible for FRPL in 2010-11), and again

found my hypothesis to be supported. Though standard errors on estimates have increased due to

the restricted sample in both analyses, affecting comparison tests, the patterns and magnitudes of

the observed effects matched what I expected.
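One way to form these school subsamples, sketched under assumed column names (frpl, school_id, year), is to compute each school's 2010-11 FRPL rate, cut it into terciles, and re-estimate the model separately on the bottom and top groups.

```python
import pandas as pd

# School-level FRPL rate in 2010-11 (column names are illustrative assumptions),
# cut into terciles; the bottom and top groups give the two subsamples.
school_frpl = df.loc[df["year"] == 2011].groupby("school_id")["frpl"].mean()
tercile = pd.qcut(school_frpl, q=3, labels=["low", "mid", "high"])

df_low = df[df["school_id"].isin(tercile[tercile == "low"].index)]
df_high = df[df["school_id"].isin(tercile[tercile == "high"].index)]
# Equation 1 would then be re-estimated on df_low and df_high separately.
```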

Sensitivity analyses. As noted above, the literature investigating the teacher experience

effect on student outcomes has raised several concerns regarding bias in estimated effects—

specifically, the bias caused by selection (of teachers out of the teaching profession) and vintage


effects. As such, I conducted two sets of sensitivity analyses that try to address the issues raised

by prior work investigating the teacher experience effect on student outcomes.

First, I explored the effects of being taught by a teacher with one or two-plus years of

experience (versus no experience) on mathematics achievement for the sample of students taught

by teachers who remained in the sample for both years or who were novices in 2011-12. This

restriction should help alleviate the worry that departure of less-effective teachers from the

sample between the 2010-11 and 2011-12 school years may have biased my results. In column 4

of Table 1, however, we see that this sample restriction does not impact the substantive

conclusion of my analyses.

Second, I followed Murnane and Phillips (1981) and attempted to account for vintage

effects by controlling for changes in student enrollment in my model. As noted earlier, this

control proxies for teacher effectiveness, as the expectation is that years with larger increases in

enrollment will increase the demand for teachers, and subsequently decrease the overall quality

of newly hired teachers. The quality of novice teachers might be contributing to the effects I

observed; for example, if new hires in 2011-12 were particularly strong relative to those in 2010-

11, this could also attenuate the relationship between increased experience and test scores

following the test change. However, inclusion of the control for changes in enrollment again did

not impact the substantive interpretation of my results (see column 5 of Table 1). Furthermore, a

paired t-test looking at whether or not novice teachers in 2011-12 were more likely to hold

advanced degrees than novice teachers—another often-used indicator for teacher quality in

school districts, also tied to teacher salary—in 2010-11 did not find significant differences

(results not shown). I acknowledge the limitation of using just these observable characteristics to

justify the equivalence of the quality of new hires across school years; however, I argue that


these sensitivity analyses at least yield some reassurance that vintage effects may not have

caused my main findings to emerge.

ELA

For investigations into the effect of teacher experience on students’ ELA test outcomes, I

followed the same analytic strategy used in my mathematics analyses. Figure 3 depicts the

results of my initial exploratory

estimation of Equation 1.

[Insert Figure 3 here.]

Compared to the teacher experience effect on mathematics depicted in Figure 2, Figure 3

suggests that the effect of teacher experience on student ELA test performance is much

smaller. The magnitude of these

results, however, matched those seen in prior work. Furthermore, the figure does show a flatter

trajectory of effects for experience in the year following Kentucky’s standardized test regime

change. When modeling the relationships formally (see Table 2), however, I was unable to reject

the null hypothesis that the coefficients for experience on ELA outcomes were the same across

years, despite the patterns across years arguably being similar to those seen in mathematics.

[Insert Table 2 here.]

Discussion

Teacher experience has been used for decades as an indicator for teacher effectiveness in

school districts across the country. Expectations are that increased time in the classroom allows

teachers to develop skills and refine instructional practices that increase their positive impact on

student learning. This hypothesis has been largely borne out in empirical studies linking teacher

experience and student test performance, one indicator for student learning. Perhaps


unsurprisingly, contracts typically tie teacher salaries to their experience and districts often base

personnel decisions on the time a teacher has been in the classroom.

The utility of experience as a proxy for a teacher’s effectiveness in generating persistent

student growth, however, depends in part on whether the skills and abilities gained with

experience are test-independent. Numerous studies have documented the narrowing of teachers’

instruction to focus on exam-specific item formats and content when high stakes are attached to

student performance on standardized tests. Thus, the observed teacher experience effect might in

reality reflect the development of effective test-specific instructional behaviors gained from

additional experience with standardized tests. This hypothesis could help explain the surprising

fade out of teacher impacts over time.

A switch from the KCCTs to the K-PREP assessments in Kentucky allowed me to

investigate the possibility that the effect of teacher experience on student outcomes might in part

be explained by test familiarity. I found that students taught by teachers with more experience

saw more growth in mathematics in the year before the standardized test switch than those taught

by teachers with the same amount of experience in the year after. Furthermore, the difference

was more pronounced in schools that served more disadvantaged populations; test-focused

instruction has been documented as more prevalent in such schools. These findings support

the hypothesis of a teacher test-experience effect contributing to the overall teacher experience

effect, as novices and more veteran teachers are both inexperienced with test item formats and

content following a testing change.

Two points about Kentucky’s switch from the KCCT to the K-PREP assessments are

necessary to put my findings in context. First, the items used in the final administration of the

KCCT (in the 2010-11 academic year) were the same as those used in the 2008-09


administration. This repetition in particular benefits those with more experience with the formats

and content of state test items, and results in even larger differentials between novice and more

veteran teachers in terms of test performance impacts; thus, the difference I observed between

teacher experience effects across years may be an upper bound. On the other hand, the state had

also suspended school test-based accountability in that final administration and officially adopted

the CCSS curriculum two years before the test switch. Had schools and teachers been under

typical accountability pressures, the state could have witnessed an even greater narrowing of

instruction, and the novice-veteran differential would have been larger. It is likely that both of these

contextual factors influence my estimates for the teacher experience effect on mathematics in

Kentucky.

What do my findings mean for policy? Teachers’ salaries in Kentucky, like the salaries of

those in other states, are tied in part to experience. If compensation is partly tied to experience

because of its expected relationship to student test outcomes, states and districts will need to

consider whether or not the potential test-specific nature of the teacher experience effect should

be rewarded, and to what extent. For example, I observed the difference in student test score

impacts between novice teachers and teachers with two years of experience in Kentucky to be

approximately 0.10 standard deviations in 2010-11, but only 0.05 standard deviations in 2011-12

(see Figure 2); policymakers may consider basing compensation policies on the latter

differential.

Replication in other settings is essential to corroborate my work and to help develop a

better sense of how much of the teacher experience effect is captured by teacher test-experience.

With many states now considering or already having recently moved from their original

standardized tests to those more aligned to the CCSS, replication should be a much more


practical endeavor. Analyses investigating the relationship of experience to other measures of

teacher quality can also help support my hypothesis; for example, even if the teacher experience

effect is in part test-specific, I would still expect differentials in terms of impact on student non-

test-score outcomes between novice and more veteran teachers to remain following a test change.

If experience yields positive effects on these outcomes and these effects persist, it may be that

states and districts should continue to reward teachers based on their experience.


References

Bynum, B. H., & Thacker, A. A. (2011). Third-party checking of calibration, scaling and

equating of the 2011 Kentucky Core Content Test. Alexandria, VA: Human Resources

Research Organization. Retrieved from

http://education.ky.gov/AA/KTS/Documents/FR-11-65%20KCCT%20Third-

Party%20Checking%202011.pdf

Corcoran, S. P., Jennings, J. L., & Beveridge, A. A. (2011). Teacher effectiveness on high-and

low-stakes tests. Paper presented at the meeting of the Society for Research on

Educational Effectiveness, Washington, DC.

Dickinson, E. R., Levinson, H., & Thacker, A. A. (2013). Exploring patterns in school

achievement from KCCT to K-PREP: The role of school-level socioeconomic status.

Alexandria, VA: Human Resources Research Organization. Retrieved from

http://education.ky.gov/aa/kts/documents/humrro%202013-036%20kcct%20to%20k-

prep%20school%20level%20ses.pdf

Goldhaber, D., & Theobald, R. (2012). Do different value-added models tell us the same

things? Seattle, WA: Center for Education Data & Research.

Harris, D. N., & Sass, T. R. (2011). Teacher training, teacher quality and student

achievement. Journal of Public Economics, 95(7), 798-812.

Herman, J. L., & Golan, S. (1993). The effects of standardized testing on teaching and

schools. Educational Measurement: Issues and Practice, 12(4), 20-25.

Jacob, B. A. (2005). Accountability, incentives and behavior: The impact of high-stakes testing

in the Chicago Public Schools. Journal of Public Economics, 89, 761-796.


Kane, T. J., & Staiger, D. O. (2008). Estimating teacher impacts on student achievement: An

experimental evaluation (No. w14607). Cambridge, MA: National Bureau of Economic Research.

Kraft, M. A., & Papay, J. P. (2014). Can professional environments in schools promote teacher

development? Explaining heterogeneity in returns to teaching experience. Educational

Evaluation and Policy Analysis, 36(4), 476-500.

Leinhardt, G. (1989). Math lessons: A contrast of novice and expert competence. Journal for

Research in Mathematics Education, 20(1), 52-75.

Loeb, S., Kalogrides, D., & Béteille, T. (2012). Effective schools: Teacher hiring, assignment,

development, and retention. Education Finance and Policy, 7(3), 269-304.

Papay, J. P., & Kraft, M. A. (2015). Productivity returns to experience in the teacher labor

market: Methodological challenges and new evidence on long-term career improvement.

Journal of Public Economics.

Murnane, R. J., & Phillips, B. R. (1981). Learning by doing, vintage, and selection: Three pieces

of the puzzle relating teaching experience and teaching performance. Economics of

Education Review, 1(4), 453-465.

Rice, J. K. (2013). Learning from experience? Evidence on the impact and distribution of teacher

experience and the implications for teacher policy. Education Finance and Policy, 8(3),

332-348.

Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from

panel data. American Economic Review, 94(2), 247-252.


Sass, T. R., Hannaway, J., Xu, Z., Figlio, D. N., & Feng, L. (2012). Value added of teachers in

high-poverty schools and lower poverty schools. Journal of Urban Economics, 72(2),

104-122.

Schools and Staffing Survey (SASS), National Center for Education Statistics, Institute of

Education Sciences, U.S. Department of Education. (2012). Table 2. Percentage of public

school districts that had salary schedules for teachers and among those that had salary

schedules, the average yearly teacher base salary, by various levels of degrees and

experience and state: 2011–12 [Data set]. Retrieved from

https://nces.ed.gov/surveys/sass/tables/sass1112_2013311_d1s_002.asp

Scribner, J. P., & Akiba, M. (2010). Exploring the relationship between prior career experience

and instructional quality among mathematics and science teachers in alternative teacher

certification programs. Educational Policy, 24(4), 602-607.

Taylor, L. R., Thacker, A. A., Koger, L. E., Koger, M. E., & Dickinson, E. (2010). Alignment of

the Kentucky Core Content Test (KCCT) items to the Common Core State Standards.

Alexandria, VA: Human Resources Research Organization. Retrieved from

http://education.ky.gov/AA/KTS/Documents/FR-10-

36%20KCCT%20Common%20Core%20Alignment%20to%20State%20Standards.pdf


Figure 1. Distribution of teacher experience in 2011 and 2012 in Kentucky.


Figure 2. Plotted regression coefficients of teachers’ experience on student mathematics achievement growth. Effect of teachers with more than 10 years of experience collapsed into the 10-year category.

[Figure 2 plots Student Achievement (y-axis, 0 to .2) against years of Experience (x-axis, 0 to 10), with separate series for 2011 and 2012.]


Table 1. Regression coefficients for teachers’ experience on students’ mathematics achievement growth

                                          (1)          (2)          (3)          (4)          (5)
1-year Exp. Effect in 2011            0.0856***    0.0739**     0.119**      0.0840***    0.0881***
                                      (0.0227)     (0.0351)     (0.0491)     (0.0255)     (0.0234)
1-year Exp. Effect in 2012           -0.000827     0.0282       0.0231       0.00621     -0.00163
                                      (0.0251)     (0.0403)     (0.0584)     (0.0298)     (0.0252)
2-plus-years Exp. Effect in 2011      0.120***     0.0863***    0.111***     0.120***     0.120***
                                      (0.0170)     (0.0190)     (0.0332)     (0.0195)     (0.0174)
2-plus-years Exp. Effect in 2012      0.0495**     0.0609**     0.0159       0.0445**     0.0514**
                                      (0.0208)     (0.0287)     (0.0482)     (0.0218)     (0.0207)

Controls
  Student Demographics                    x            x            x            x            x
  Prior Achievement                       x            x            x            x            x
  Grade-by-year Fixed Effects             x            x            x            x            x
  Cohort Aggregates                       x            x            x            x            x
  Peer Aggregates                         x            x            x            x            x
  School Fixed Effects                    x            x            x            x            x

Sensitivity Checks
  Low-FRPL Schools                                     x
  High-FRPL Schools                                                 x
  Selection Sample                                                               x
  Control for Change in Enrollment
    from Prior Year                                                                           x

Observations                           282,833      125,847       58,362      224,451      280,223
R-squared                                0.510        0.518        0.429        0.516        0.510

F-tests
  p-value: 2011 1-year Exp. Effect
    vs. 2012 1-year Exp. Effect          0.0171        0.419        0.224       0.0705       0.0148
  p-value: 2011 2-plus-years Exp. Effect
    vs. 2012 2-plus-years Exp. Effect    0.00948       0.466       0.0795      0.00893       0.0121

Note: School-level clustered standard errors reported in parentheses. The Selection Sample includes only students taught by teachers who teach in both 2011 and 2012 (or are novices in 2012). ***p<0.01, **p<0.05, *p<0.1


Figure 3. Plotted regression coefficients of teachers’ experience on student ELA achievement growth. Effect of teachers with more than 10 years of experience collapsed into the 10-year category.

[Figure 3 plots Student Achievement (y-axis, 0 to .2) against years of Experience (x-axis, 0 to 10), with separate series for 2011 and 2012.]


Table 2. Regression coefficients for teachers’ experience on students’ ELA achievement growth

1-year Exp. Effect in 2011             0.0473*
                                      (0.0248)
1-year Exp. Effect in 2012             0.0308*
                                      (0.0173)
2-year Exp. Effect in 2011             0.0152
                                      (0.0238)
2-year Exp. Effect in 2012             0.0319*
                                      (0.0179)
3-year Exp. Effect in 2011             0.0208
                                      (0.0230)
3-year Exp. Effect in 2012             0.0104
                                      (0.0180)
4-plus-years Exp. Effect in 2011       0.0525***
                                      (0.0164)
4-plus-years Exp. Effect in 2012       0.0271**
                                      (0.0134)

Controls
  Student Demographics                    x
  Prior Achievement                       x
  Grade-by-year Fixed Effects             x
  Cohort Aggregates                       x
  Peer Aggregates                         x
  School Fixed Effects                    x

Observations                           304,770
R-squared                                0.492

F-tests
  p-value: 2011 1-year Exp. Effect vs. 2012 1-year Exp. Effect                  0.578
  p-value: 2011 2-year Exp. Effect vs. 2012 2-year Exp. Effect                  0.576
  p-value: 2011 3-year Exp. Effect vs. 2012 3-year Exp. Effect                  0.712
  p-value: 2011 4-plus-years Exp. Effect vs. 2012 4-plus-years Exp. Effect      0.222

Note: School-level clustered standard errors reported in parentheses. ***p<0.01, **p<0.05, *p<0.1