the long-run effects of teacher cheating on student outcomes

The Long-Run Effects of Teacher Cheating on Student Outcomes
A Report for the Atlanta Public Schools
Tim R. Sass, Distinguished University Professor, Georgia State University
Jarod Apperson, Ph.D. student, Georgia State University
Carycruz Bueno, Ph.D. student, Georgia State University
May 5, 2015


  • Executive Summary

    The manipulation of student test scores by teachers and administrators in the Atlanta Public Schools (APS) has been carefully documented, and dozens of teachers and leaders have either accepted administrative sanctions, pleaded guilty to criminal charges or been convicted of crimes. Little is known, however, about how the falsification of test scores by teachers and administrators ("teacher cheating") has impacted students. Using a panel of individual-level data on students and teachers from APS, we investigated the effects of teacher cheating on subsequent student achievement, attendance and student behavior. Key findings include:

    - Of the 11,553 students who were in classrooms flagged by the Governor's Office of Student Achievement (GOSA) for high levels of wrong-to-right erasures on the 2009 CRCT exam, 5,888 (51 percent) were still enrolled in APS in fall 2014.

    - Not all students in classrooms where cheating occurred had their test scores manipulated equally. For both reading and English Language Arts (ELA), over one-third of students in flagged classrooms had two or fewer wrong-to-right (WTR) erasures and nearly one-fourth of students in flagged classrooms had two or fewer WTR erasures on their math CRCT exam.

    - There is strong evidence that teachers were selective in their manipulation of test scores. In particular, teachers were more likely to change answers for students of lower apparent ability.

    - Using the frequency of erasures in 2013 as a benchmark, of the 11,553 students in flagged classrooms in 2008/09 we estimate that 7,064 (61 percent) likely had their test answers manipulated in one or more subjects on the 2009 CRCT exam. Of these 7,064 students, 3,728 were still enrolled in APS in fall 2014.

    - There is relatively robust evidence that manipulation of students' test answers had negative consequences for later student performance in reading and English Language Arts (ELA). The estimated impacts are equivalent to a one-time reduction in achievement of roughly one-fourth to one-half of the average annual achievement gain for middle school students. Put differently, the estimated loss in achievement is one to two times the difference in achievement between having a rookie teacher rather than a teacher with five years of experience for a single year. In contrast, the effects of teacher cheating for subsequent achievement in math are mixed.

    - There is little or no evidence that teacher cheating had deleterious effects on subsequent student attendance or student behavior.


    1. Introduction

    Much attention has been paid to the teachers and administrators caught up in the Atlanta Public Schools (APS) cheating scandal. However, little is known about how the falsification of student test scores has impacted students. While illicit behavior by teachers and administrators (henceforth "teacher cheating") obviously boosted student test scores in the short term, it is unknown what effect this cheating had on subsequent student achievement and other outcomes such as attendance, discipline and high school completion. In this report we investigate the effects of test score manipulation on student outcomes to learn the extent to which students may have been harmed in the long run. We focus on the relationship between teacher cheating and post-cheating student test scores, attendance and behavior. In addition to measuring the impact of teacher cheating on student outcomes, we seek to quantify the number of students potentially affected by teacher cheating and determine if they are still enrolled within APS.

    Specifically, this report addresses the following questions:

    1. How many students were potentially impacted by teacher cheating in APS?
    2. Among potentially impacted students, what is the effect of teacher cheating on test score outcomes in the long run and does it vary across students and across subjects?
    3. Were students who received inflated test scores less likely to attend school after they learned that their prior scores were incorrect?
    4. How has the disciplinary behavior of potentially impacted students changed since teacher cheating was uncovered?
    5. How many students who were potentially impacted are still enrolled in APS?
    6. What proportion of students who were potentially impacted transferred from traditional to charter schools or left the Atlanta Public School system?

    2. Possible Mechanisms and Related Research

    The existing research related to teacher cheating has focused on the identification of cheating (Jacob and Levitt, 2003a, 2003b; van der Linden and Jeon, 2012; Kingston and Clark, 2014). There is no prior research that specifically considers the possible impacts of altered test scores on subsequent student performance. However, there are four related strands of literature (student self-esteem, grade inflation, teacher incentives and achievement-based interventions), each of which provides some possible mechanisms by which teacher cheating could impact students in the long run.


    A. Self-esteem

    If a student receives an unexpectedly high test score it is conceivable that this could increase their own perception of their academic ability and consequently boost their self-esteem. Of course, if it is later learned that the prior test score was false, any increase in self-esteem could be eliminated or even reversed. A number of studies have shown that self-esteem may have positive effects on both academic and employment outcomes. For example, Waddell (2006) finds that high school graduates with low self-esteem attain fewer years of post-secondary education, are more likely to be unemployed and have lower earnings than others in their high school graduation cohort. de Araujo and Lagos (2013) consider the simultaneous determination of self-esteem, educational attainment and earnings and find that positive self-esteem boosts wages by increasing educational attainment, but has no direct effect on wages.

    In the case of manipulated test scores, any impacts of inflated standardized test scores on self-esteem would likely be muted by signals of lower performance from course grades. If teacher cheating did in fact affect student self-esteem, we would expect that student outcomes would improve (beyond the direct effect of falsification on scores) until falsification became known and then they would decline, perhaps to levels below what would be expected prior to the initiation of cheating.

    B. Grade Inflation

    False test scores that result from manipulation by teachers and administrators are akin to so-called "grade inflation," where students receive grades that are greater than what might otherwise be justified by their academic performance. Babcock (2010) finds that increases in expected grades lead students to study less. Using survey data from college course evaluations, he finds average study time is reduced by half in a class where the average expected grade is an A, relative to a class in which the average expected grade is a C. Thus it is possible that, at least in the short term, students could react to inflated test scores by devoting less effort in school, believing that they can perform well with little effort. If this is the case, one would expect even lower scores once test answers were no longer manipulated. Any such effects on student performance may be tempered, however, if the achievement exams were considered to be less consequential than course grades by students.1

    1 The impact may also be dampened if achievement test results are not known until after the end of the academic year.

    C. Rewards, Motivation and Teacher Effort

    It is alleged that a primary cause of teacher cheating in APS was external pressure to demonstrate positive school performance. Two ways to increase test scores are to improve instruction or to cheat by providing inappropriate assistance to students before or during the exam and/or correcting wrong answers ex post. The opportunity to boost scores via cheating should reduce the payoff from improved teacher effectiveness (since otherwise low test scores would be manipulated). This in turn could result in a decrease in teacher effort and a reduction in true student learning. A key assumption for this to occur is that teachers adjust their effort to changes in incentives related to test scores. Indirect evidence on the relationship between teacher effort and incentives can be found in the literature on the effects of performance pay. While early experimental studies found little evidence that performance pay boosted teacher productivity (e.g. Springer, et al. 2010; Springer, et al. 2012; Fryer, 2013), more recent analysis by Dee and Wyckoff (2013) of a district-wide scheme operated at scale (Washington DC's IMPACT teacher accountability system) provides strong evidence that existing teachers will in fact adjust their teaching performance in response to significant incentives. Thus another mechanism by which test score manipulation could have affected student outcomes would be through a reduction in teacher effectiveness. Once teacher cheating stopped, instructional quality should have returned to pre-manipulation norms. However, if teaching quality has persistent effects there could be negative consequences for later student outcomes.

    D. Interventions Based on Student Achievement

    One of the main concerns with teacher cheating is that because of artificially inflated test scores, students are not identified to receive remedial services, such as intervention programs, summer school or retention. If remedial programs increase student achievement, denial of these services could potentially have lasting effects on the student. However, the evidence on the efficacy of retention and summer school placement for low-achieving students is somewhat mixed, with positive effects occurring primarily in elementary school. Most of the rigorous research studies conducted thus far exploit achievement-based rules for mandating retention and/or summer school and compare outcomes for students just below the intervention threshold (who receive services) with those whose scores are just above the threshold (and thus do not receive remediation). Jacob and Lefgren (2004) find that both summer school and retention have a positive (but small) impact on student achievement for third graders and no effect for sixth graders in Chicago. Mariano and Martorell (2012) analyze a two-tier cutoff system in New York City where students who score below the cutoff on their spring assessment are assigned to summer school; if these students fall below the summer evaluation cutoff they are retained. The authors find that summer school had a small positive impact on students' 6th grade ELA achievement if they attended for missing the ELA cutoff, but not for students who scored low on the math test. Using a same-year comparison, they find that retention in fifth grade has a large, significant and positive effect on student performance for the following two years. Matsudaira (2008) studies a mandatory summer school policy and finds positive and quantitatively substantial average effects of summer school on subsequent achievement in both math and reading. However, the estimated impacts vary substantially across grade-level and subject combinations. Winters and Greene (2012) evaluate the impacts of Florida's third grade retention policy. Not only were students who fell below the threshold retained, they also were assigned a high-quality teacher in the retention year and were required to attend summer school. This combination of remediation efforts had substantial positive effects on achievement in multiple subjects, though the effects diminish over time.


    3. Data and Background

    To answer the research questions outlined above, we constructed a longitudinal data set from APS administrative data. With the assistance of APS staff we were able to collect anonymous individual-level data on enrollment, attendance, discipline, program participation (e.g. special education, subsidized lunch, English as a second language), school types, student demographics and diploma receipt. APS also provided state test results from the Criterion-Referenced Competency Test (CRCT) in grades 1-8 and high school End-of-Course Tests (EOCTs).2 For both the CRCT and EOCTs we computed normalized scores for each grade (or course for the EOCTs) and year based on statewide means and standard deviations. All of the data were assembled into a panel covering all students enrolled in APS from the 2004/05 school year through fall of 2014.
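    The score normalization described above can be sketched as follows. This is a minimal illustration, assuming the statewide mean and standard deviation for each grade (or course) and year are available in a lookup table. The function name, the table layout and the numbers are hypothetical and are not taken from the report.

```python
def normalize_score(raw_score, grade, year, state_stats):
    """Convert a raw scale score to a z-score using statewide moments.

    state_stats maps (grade, year) -> (statewide mean, statewide std dev).
    The layout of this table is an assumption made for illustration.
    """
    mean, std = state_stats[(grade, year)]
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - mean) / std


# Hypothetical statewide moments, for illustration only.
state_stats = {(5, 2010): (820.0, 30.0)}

z = normalize_score(805.0, 5, 2010, state_stats)
print(z)  # -0.5
```

    A score of -0.5 here means the student scored half a statewide standard deviation below the statewide mean for that grade and year, which makes scores comparable across grades and years.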

    Another key data source is the Georgia Bureau of Investigation (GBI) report on teacher cheating in APS (Office of the Governor, 2011). The GBI investigation included information from the Governor's Office of Student Achievement's (GOSA's) analysis of erasures on the spring 2009 CRCT exam. Results of the erasure analysis were used by the GBI to select schools for detailed investigation, which included interviews with school personnel. Just over 60 percent of the district's elementary and middle schools received a detailed investigation. In over half of these schools educators confessed to cheating and investigators concluded that systemic misconduct occurred in over three-fourths of the schools that were investigated in detail. The investigation also revealed that cheating had been going on for some time, perhaps as far back as 2001 in some schools. Interview data from the report provided valuable contextual information about how teacher cheating was carried out.

    The data provided by APS include individual-level erasure data for the CRCT in school years 2008/09 through 2012/13. The erasure data from 2008/09 and 2009/10 only cover schools that were investigated by the GBI, based on high levels of wrong-to-right (WTR) erasures in one or more classrooms and other evidence of testing improprieties (henceforth "investigated schools"). Thus we have erasure data for nearly two-thirds of the district's elementary and middle schools in 2008/09, about one-fifth of elementary and middle schools in 2009/10 and then complete district-wide erasure data in 2010/11-2012/13.3 The erasure data provided to us contain the raw number of correct answers and the number of WTR erasures. The individual-level erasure data were combined with information on the criteria used by the GBI to flag classrooms suspected of cheating.4 Using the erasure data and the GOSA methodology, we identified both schools which were investigated by the GBI and flagged classrooms within the investigated schools for school years 2008/09 and 2009/10.

    2 The criterion-referenced exam was administered in grades 1-8 through the 2009/10 school year. In later years, the test was administered only in grades 3-8.
    3 Our 2008/2009 erasure data cover all schools initially targeted for investigation, including two schools that did not receive a detailed investigation because initial inquiries uncovered no evidence of improprieties.

    4. Research Design and Methods

    A. Identifying Cheating

    A key element in the analysis of the effects of teacher cheating is identifying which students had their test scores manipulated. There are three types of teacher cheating possible. First, teachers with advance knowledge of the exam questions and answers could have used actual test questions in their lessons and communicated the answers prior to the exam. We refer to this type of score manipulation as "ex-ante cheating." Second, teachers could have guided students to the correct answer during the exam or given students the correct answers outright during the exam. We call this "contemporaneous cheating." Third, teachers could have corrected wrong answers after students turned in their exams. We dub this "ex-post cheating."

    Interviews conducted during the GBI investigation uncovered evidence that all three types of cheating occurred in APS. Teachers and other school personnel admitted they had employed a variety of methods to manipulate test scores. These included reviewing the test questions prior to test administration and prepping student responses (ex-ante cheating); positioning low- and high-ability students next to each other and allowing students to copy answers from one another during the exam plus signaling the correct answers to students during the test (contemporaneous cheating); and filling in empty answers with correct responses or changing students' answers from wrong to right after the exam (ex-post cheating).

    There are three methods that have been used in prior research to identify teacher cheating. One method, developed by Jacob and Levitt (2003a, 2003b), is to look for unusual patterns of responses, either within a single student's answer sheet or across student answer sheets. We do not have access to the answer sheets of individual students, so this approach is not an option in the current analysis.

    A second approach employed by Jacob and Levitt (2003b) is based on unusual inter-temporal changes in test scores. Teacher cheating of any sort should lead to increases in test scores. Correspondingly, once cheating stops, test scores should drop to reflect students' true achievement levels. Jacob and Levitt identified students as being cheated if they experienced large increases in test scores in one year followed by modest increases or even declines in the following year. Given that cheating allegedly occurred over several years in APS, the run-up in test scores associated with cheating will not be observed for students whose test scores were always manipulated prior to the elimination of cheating. For example, if cheating was pervasive in a school and it began before a student entered first grade, then only manipulated scores would be observed prior to the end of cheating. Consequently, identifying teacher cheating based on unusual increases in test scores is problematic in our context.

    4 Classrooms were flagged when the number of WTR erasures was greater than three standard deviations above the state mean. An adjustment was made for class size by dividing the standard deviation by the square root of the class size. The GBI reports (Office of the Governor, 2011) refer to flagged "classrooms," though they were in fact groups of students who were administered a given test by a single proctor. The test score administrator was not necessarily the classroom teacher for the tested subject.
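    The classroom flagging rule in footnote 4 can be sketched in a few lines. This is a minimal illustration under one reading of the footnote: a classroom's average WTR count per student is compared with the statewide mean plus three statewide standard deviations, with the standard deviation divided by the square root of the class size. The function name, its parameters and the statewide values are hypothetical; the report does not give the exact computation.

```python
import math


def is_flagged(class_wtr_counts, state_mean, state_sd, n_sd=3.0):
    """Return True if a classroom's average WTR count is unusually high.

    class_wtr_counts: WTR erasure counts, one per student in the classroom.
    state_mean, state_sd: statewide mean and standard deviation of
        per-student WTR counts (hypothetical inputs).
    The standard deviation is divided by sqrt(class size), as in footnote 4.
    """
    n = len(class_wtr_counts)
    if n == 0:
        return False
    class_mean = sum(class_wtr_counts) / n
    threshold = state_mean + n_sd * state_sd / math.sqrt(n)
    return class_mean > threshold


# Hypothetical statewide moments, for illustration only.
STATE_MEAN, STATE_SD = 1.5, 2.0

typical_class = [0, 1, 2, 1, 3, 0, 2, 1, 2]   # average of about 1.33
suspect_class = [6, 8, 5, 9, 7, 4, 10, 6, 8]  # average of 7.0

print(is_flagged(typical_class, STATE_MEAN, STATE_SD))  # False
print(is_flagged(suspect_class, STATE_MEAN, STATE_SD))  # True
```

    With nine students the cutoff in this example is 1.5 + 3 x 2.0 / 3 = 3.5 erasures per student. Because the adjustment shrinks the standard deviation as the class grows, a larger class needs a smaller average excess to be flagged.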

    As illustrated in Figure 1, we do observe a drop in test scores after teacher cheating was uncovered in 2009. Students in flagged classrooms within investigated schools in 2009 had substantially higher normalized scores (on the order of 0.5 standard deviations) than did students in non-flagged classes in 2009 or students in formerly investigated schools in 2010. Using test score drops to identify cheating creates problems for our analysis of the impacts of cheating, however. As explained below, we rely on the first score a student receives after cheating ends to control for their true ability. Since test score drops are the difference between the last manipulated score and the first true score, including both a measure of test score drops (to identify having been cheated) and the first post-cheating test score (to measure student ability) in an analysis of future achievement would be akin to using manipulated scores to predict future (true) scores. Absent complete cheating, manipulated scores will be positively related to ability. Thus the relationship between being cheated (large test score drops) and future achievement would be biased upward. We confirmed this to be the case in preliminary analyses where we found a quantitatively large and statistically significant positive relationship between large test score drops from spring 2009 to spring 2010 and post-2010 test scores (holding constant spring 2010 test scores). We therefore do not employ test score drops to identify teacher cheating.

    Rather than unusual answer patterns or large year-to-year changes in test scores, we rely on the third method of identifying cheating by teachers, counts of wrong-to-right erasures, to determine which students had their scores manipulated. In the absence of cheating, erasures of any kind should be relatively infrequent. Also, if erasures are the result of student uncertainty between two possible answers we would expect wrong-to-right and right-to-wrong erasures to be about equally likely. One advantage of erasure analysis is that, in contrast to inter-temporal changes in test scores, high levels of wrong-to-right erasures would not result from students who are becoming sick, having a bad day or other random events that are unrelated to cheating.5 The major disadvantage of erasure analysis, however, is that it can only identify ex-post cheating. To the extent that ex-ante cheating or contemporaneous cheating occurred it would tend to reduce the number of WTR erasures and lead to under-identification of cheating based on erasure counts.

    5 One notable exception is cases where a student initially marks their answers in the wrong column (e.g. bubbles answer B rather than A, C rather than B, etc.), realizes their mistake, and erases their answers to correct the mistake.

    One way to gauge the extent of ex-ante and contemporaneous cheating (and hence the potential under-identification of cheating when WTR erasures are used to identify cheating) is to observe changes over time in initial test answers (i.e. answers given prior to erasures). If ex-ante and contemporaneous cheating occurred during the 2009 exam, then when state monitors were present in APS schools during the 2010 CRCT administration we would expect a reduction in the number of initially correct answers (prior to any erasures), relative to 2009. We approximate the initial number of correct answers by subtracting the number of WTR erasures from the total number of correct answers. We refer to this measure as the "initial right."6 The initial-right scores could also have risen in 2010 in the absence of ex-ante and contemporaneous cheating if the exam simply became easier. We can distinguish between these hypotheses by taking into account how the CRCTs were administered. In grades 1 and 2 the exam questions and possible answers were read to students while in grades 3-8 the students read the questions and possible answers independently. Having teachers read the questions and answers in the lower grades would make it easier for teachers to engage in ex-ante cheating in a number of ways. One method noted by the GBI is simply changing the inflexion of their voice when reading the correct answer.
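    The "initial right" proxy is simple arithmetic and can be sketched as below. The function name and the example numbers are hypothetical. As the report's footnote on this measure explains, the exact count of initially correct answers would also add back right-to-wrong erasures, which are not available for 2009, so this remains an approximation.

```python
def initial_right(total_correct, wtr_erasures):
    """Proxy for the number of answers that were correct before any erasures.

    Every WTR erasure ends in a correct answer, so the WTR count cannot
    exceed the total number of correct answers.
    """
    if not 0 <= wtr_erasures <= total_correct:
        raise ValueError("WTR erasures must be between 0 and total correct")
    return total_correct - wtr_erasures


# Hypothetical student: 42 correct answers after erasures, 9 of them WTR.
print(initial_right(42, 9))  # 33
```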

    Figure 2 provides evidence that ex-ante cheating did occur and was concentrated in grades 1 and 2. The graphs show the distribution of initial-right scores by grade for both 2009 and 2010. The sample is limited to schools that had a significant proportion of their classrooms flagged for high WTR erasure counts in both 2009 and 2010 (since we have erasure data for only such schools in those years). Thus the sample includes slightly less than 20 percent of district schools. Consistent with ex-ante cheating being easier to implement in grades 1 and 2, we see that the number of initially correct answers fell sharply in 2010 for grades 1 and 2 whereas the test score distributions in 2009 and 2010 are roughly similar for grades 3 and above.7 Thus, although we cannot reject the possibility that some ex-ante or contemporaneous cheating occurred in higher grades, we can be more confident that WTR erasure counts are a good measure of all forms of teacher cheating in grades 3 and above. Given the likelihood of more substantial ex-ante cheating in grades 1 and 2, when employing WTR erasure counts to identify cheating we conduct separate estimates of the impact of cheating on student outcomes for grades 3-8 and for grades 1-2.

    6 The actual number of initial correct answers will equal the total right after erasures, minus WTR erasures, plus right-to-wrong erasures. Unfortunately, we do not possess information on the number of right-to-wrong erasures in 2009. We assume that the number of right-to-wrong erasures, while not mean zero, is randomly distributed across students. Thus using the number correct less the number of WTR erasures will serve as a reasonable proxy for the number of initially correct answers for our purposes.
    7 The middle school distributions are based on only a few schools and thus are less precise.

    We base our erasure-count measure of cheating on the distribution of WTR erasures on the spring 2013 exam (when all evidence suggests that cheating no longer existed). Table 1 shows the 90th, 95th and 99th percentiles of the distribution of WTR erasure counts in both the last year of pervasive cheating, 2009, and the last year of available erasure data, 2013. A student was designated as having been cheated in 2009 if the number of WTR erasures on a given exam exceeded the number of WTR erasures corresponding to the 95th percentile in 2013. Thus, for example, if a student's CRCT reading exam in 2009 had more than four WTR erasures, they were classified as having been cheated.
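    The designation rule can be sketched as follows: take the 95th percentile of WTR counts in the benchmark year and mark any 2009 exam that exceeds it. This is an illustration only. The report does not say which percentile definition was used, so the nearest-rank definition here is an assumption, and the function names and both data lists are hypothetical.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest observed value such that at least pct percent of
    the observations are at or below it. (Assumed definition.)
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def classify_cheated(wtr_2009, benchmark_wtr_2013, pct=95):
    """Mark each 2009 exam whose WTR count exceeds the benchmark percentile."""
    threshold = percentile_nearest_rank(benchmark_wtr_2013, pct)
    return threshold, [count > threshold for count in wtr_2009]


# Hypothetical 2013 WTR counts for 20 students (benchmark year).
wtr_2013 = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 7]
# Hypothetical 2009 WTR counts for five students.
wtr_2009 = [1, 4, 5, 12, 0]

threshold, cheated = classify_cheated(wtr_2009, wtr_2013)
print(threshold)  # 4
print(cheated)    # [False, False, True, True, False]
```

    The hypothetical benchmark data were chosen so that the cutoff is four erasures, matching the reading example in the text: a student with exactly four WTR erasures is not classified as cheated, while one with five is.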


    B. Differential Cheating and Controlling for Student Ability

    Prior research on teacher cheating (e.g. Jacob and Levitt, 2003a, 2003b), as well as the GBI investigation, focused on identifying classrooms where cheating occurred in order to identify which teachers engaged in illicit behavior. In contrast, the focus of our analysis is on students rather than on teachers. The extent of teacher cheating can vary across students within classrooms, i.e. the impact of a cheating teacher will not be the same for all students in the class. For example, given the time costs involved, cheating teachers could choose to only correct answers for their weaker students, who would likely have more initial wrong answers and would thus produce the greatest gains in test scores when their answers are corrected. Similarly, even if a teacher reviewed the answers of all students, more able students would have fewer wrong answers to begin with and thus any manipulation of answers ex post would have a smaller impact on the student's score. Evidence of such within-classroom variation is provided in Table 2, which shows the distribution of WTR erasures by subject within GBI-flagged classrooms in 2008/09. For both reading and ELA, over one-third of students in flagged classrooms had two or fewer WTR erasures and over one-fourth of students had only one or no WTR erasures on their reading exam. In math, the proportion of students with few WTR erasures was smaller, but still substantial. Nearly one-fourth of students in flagged classrooms had two or fewer WTR erasures on their math CRCT exam.

    Because our interest is in evaluating how cheating impacted long-run student outcomes, we must evaluate the extent to which student ability played a role in selective cheating. If the students selected by a teacher for cheating were weaker students to begin with, a long-run achievement analysis which failed to account for such differences would be biased downward. Indeed, we find strong evidence that teachers selected which students to cheat based on their ability. Figure 3A shows the distribution of WTR erasure patterns in a cheating environment (2009 flagged classrooms) by quintile of student ability in math, where student ability is measured by 2010 CRCT scores. Figure 3B shows the distribution of WTR erasures in a normal environment (2013), broken down by measured student ability in math, where student ability is characterized by performance on the 2013 CRCT exam. In the cheating environment, the fewest erasures are for the highest-ability students (Q1) with the distributions of WTR erasures progressively skewed to the right for students of lower apparent ability. In contrast, in the non-cheating environment, the quintile graphs are nearly coincident, save for slightly fewer erasures amongst the most able group (Q1). This is to be expected since higher-ability students are likely to be more confident in their initial answers. Taken together, these results clearly suggest that teachers were more likely to change student answers for students of lower ability.

    The challenge imposed by selective cheating can be addressed by appropriately controlling for student ability. If cheating on the 2009 exam were an isolated incident, we could rely on test scores in prior years as a measure of student ability; however, as discussed above, APS experienced widespread, long-term cheating which renders pre-2009 results unreliable. Two other alternatives exist. One is to rely on our 2009 estimate of initial right answers. In the case of ex-post cheating, this method has the appeal of measuring ability immediately prior to the treatment. However, because we are aware that some ex-ante and contemporaneous cheating existed, reliance on the initial-right scores could lead to bias in our estimates.8 A second option is to rely on 2010 test scores as a measure of ability. The appeal of this option is that we have greater confidence in its accuracy as the test observers led to a significantly lower incidence of cheating in 2010. However, the downside to this measure is that it limits the mechanisms we are able to evaluate. Our analysis can still measure the effects of students realizing they were cheated. However, we can no longer evaluate the effect of potential self-esteem boosts occurring in the 2009/10 school year as a result of 2008/09 cheating.

    8 Another problem with using the initial right is that questions vary in difficulty and thus the raw number of initial right answers does not directly map into the true scale score for a student.

    C. Choosing a Comparison Group

    We are interested in the causal impact of teacher cheating on subsequent student outcomes. Put differently, how would a student have fared if their scores had not been manipulated? One possibility is to compare outcomes for a student before their scores were manipulated to their post-cheating outcomes. Such a within-student analysis would hold any time-invariant student characteristics constant. For outcomes that are observed in each period, such as attendance or disciplinary incidents, this could be accomplished by estimating models of student outcomes that include indicators for the cheated and post-cheated periods along with student fixed effects that create a separate baseline for each student. Unfortunately, as discussed above, in many instances teacher cheating occurred across multiple classrooms within a school over several years, meaning that for many students we never observe pre-cheating outcomes. Further, for one-time outcomes such as high school graduation, it is not possible to establish a within-student pre-cheating baseline, making this approach infeasible.

    The other alternative is to compare outcomes for students who were cheated with those who were not cheated. This is potentially problematic, however, since non-cheated students may be different from cheated students in ways that are correlated with student outcomes and thus we could falsely attribute differences in outcomes to cheating which might be caused by other factors. For example, many schools in APS were not investigated by the GBI because there were no classrooms flagged for having high levels of WTR erasures within the school or there was only an isolated flagged classroom with no corroborating evidence of teacher cheating. These non-investigated schools tended to serve much lower proportions of students from low-income families than did investigated schools. While there were a number of non-investigated schools serving student populations with similar demographics to those of investigated schools, the fact that no cheating was uncovered in these schools may be indicative of differences in school leadership, school culture, or average teacher quality. Within-school comparisons can be made by estimating models of student outcomes that include school fixed effects. However, even within investigated schools, sorting of students across teachers (either due to parental preferences for teachers, teacher preferences for students or ability tracking of students) could result in significant differences in the underlying characteristics of students who attended classrooms where teacher cheating occurred vis-à-vis students attending non-cheating classrooms in the same school. It is also possible to compare cheated and non-cheated students in the same classroom by including classroom fixed effects, though non-cheated students are likely to be of higher ability than cheated students since there is less for a teacher to gain from manipulating already-high scores. Also, within-classroom comparisons would not measure any cheating behavior that affects all students within a classroom equally, such as a reduction in teacher effort. Further, making within-classroom comparisons reduces the number of students being directly compared and will therefore diminish the precision of any estimated effects.

    For the cross-student comparisons we take a number of steps to minimize any bias associated with the non-random exposure of students to teacher cheating. First, we control for a variety of observable student characteristics, including gender, race/ethnicity, free/reduced-price lunch status, gifted status, limited English proficiency and disability status. In addition, we control for unmeasured ability by including the first not-cheated test score. For students who had been continuously cheated, the first non-manipulated score would be the revelation of true achievement for a student. As such, the score would not be influenced by potential loss in self-esteem. However, the first non-cheating score could be a downwardly biased measure of true ability if past teacher cheating led to either reduced teacher or student effort and achievement has persistent effects over time.

    D. Econometric Model

    The decisions described above regarding the issues of identifying cheating, differential cheating by teachers, controlling for student ability and constructing a valid comparison group lead us to the following empirical model of student achievement:

    $$A_{i,2009+t} = \beta_0 + \beta_1 A_{i,2010} + \delta\,\mathit{Cheat}_i + X_{i,t+1}\gamma + \varepsilon_{i,2009+t}$$

    where $A_{i,2009+t}$ is the achievement level for student $i$, $t$ years after 2009, $A_{i,2010}$ is the student's first post-cheating (2010) test score, $\mathit{Cheat}_i$ equals one if the individual was cheated in 2009 and zero otherwise, $X_{i,t+1}$ is a vector of student- and family-specific characteristics such as race, gender, lunch status, gifted status, Limited English Proficiency and Special Education status, and $\varepsilon_{i,2009+t}$ is the idiosyncratic error term. The main parameter of interest is $\delta$. We estimate three variants of the model, one with no fixed effects (as specified above), a second with school fixed effects that produce within-school comparisons (i.e. with an additional term $\mu_k$, where $k$ indexes schools) and a third with classroom-level fixed effects that produce within-classroom comparisons (i.e. with an additional term $\theta_j$, where $j$ indexes classrooms).
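    To make the specification concrete, the following is a minimal sketch of the variant with no fixed effects, estimated by ordinary least squares on synthetic data. It regresses a later score on a constant, the first post-cheating score, a cheating indicator and one student characteristic. The data, the variable names, the "true" coefficient values and the small solver are all illustrative assumptions; this is not the authors' estimation code, and the report does not state what software was used.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


# Synthetic students: (2010 normalized score, cheated in 2009, low income).
students = [(-1.0, 1, 1), (-0.5, 1, 0), (0.0, 0, 1), (0.5, 0, 0),
            (1.0, 0, 1), (-0.8, 1, 1), (0.3, 1, 0), (1.2, 0, 0)]

# Assumed "true" coefficients used only to generate noise-free outcomes:
# intercept, lagged score, cheating effect, low-income effect.
TRUE_BETA = [0.1, 0.8, -0.2, 0.05]

X = [[1.0, score, float(cheat), float(frl)] for score, cheat, frl in students]
y = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in X]

beta = ols(X, y)
print([round(b, 3) for b in beta])
```

    Because the synthetic outcomes are generated without noise, the estimates recover the assumed coefficients, and the third entry plays the role of the cheating effect. The school or classroom fixed-effects variants could be approximated in the same framework by adding one indicator column per school or classroom (omitting one to avoid collinearity with the constant).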


    5. Results

    A. Descriptive Statistics

    In Table 3 we provide descriptive statistics for students, broken down by school type of enrollment in 2008/09. Investigated schools enrolled over one-fourth of the student population. Relative to non-investigated schools, investigated schools serve a higher proportion of minority students, fewer gifted students and a larger fraction of students from low-income families (as measured by free and reduced-price lunch eligibility).

    In Table 4 we present a tabulation of the number of students who were enrolled in flagged classrooms in 2008/09 by their continued enrollment in APS in subsequent years, broken down by their grade level in 2008/09. It is clear that the vast majority of flagged classrooms were in elementary schools (grades 1-5). Further, while there was some attrition after the true scores were revealed in 2010, there was not a large exodus of students from the district. Few of the students who were in grades 7 and 8 in 2008/09 remain in APS in fall 2014 since their cohorts would have graduated (assuming normal progress and on-time graduation at the end of 12th grade). For students who were in grades 1-8 in 2008/09, about 50-60 percent remain in APS in fall 2014. The proportion remaining tends to be higher for younger cohorts (around 60 percent for the 2008/09 1st-grade cohort) than for older cohorts (slightly less than 50 percent for the 2008/09 6th-grade cohort), which is likely due to higher attrition as a result of dropouts in high school. A total of 5,888 students who were in classrooms flagged for extraordinarily high WTR erasures based on 2009 test results remain in APS in fall 2014.
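
    The retention shares quoted above can be recomputed from the first and last enrollment columns of Table 4. The snippet below is a small arithmetic check; the counts are copied from Table 4, while the variable names are illustrative.

```python
# Enrollment counts from Table 4, keyed by 2008/09 grade level.
enrolled_2008_09 = {1: 2068, 2: 2018, 3: 2001, 4: 1822, 5: 1611, 6: 706, 7: 588, 8: 739}
enrolled_fall_2014 = {1: 1210, 2: 1154, 3: 1167, 4: 1084, 5: 895, 6: 338, 7: 29, 8: 11}

# Share of each 2008/09 cohort still enrolled in APS in fall 2014.
retention = {g: enrolled_fall_2014[g] / enrolled_2008_09[g] for g in enrolled_2008_09}
overall = sum(enrolled_fall_2014.values()) / sum(enrolled_2008_09.values())

for grade, share in retention.items():
    print(f"grade {grade}: {share:.1%}")
print(f"all grades: {overall:.1%}")
```

    The grade 1 cohort comes out just under 60 percent, the grade 6 cohort just under 50 percent, and the overall share is about 51 percent (5,888 of 11,553).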

    Table 5 presents a breakdown of enrollment in APS traditional and charter schools by year for those students who were in a flagged classroom in 2008/09. There is some evidence that the cheating scandal led to a movement away from traditional public schools and into the charter sector. There are fairly large increases in enrollment in charter schools in the two years after the revelation of teacher cheating, 2009/10 and 2010/11. After the 2010/11 school year the absolute enrollment in charters stabilizes, though the relative proportion continues to increase.

    Table 6 provides a tabulation of the fall 2014 grade level for students who were in a flagged classroom in 2008/09 by their 2008/09 grade level. For each 2008/09 cohort, the majority of students are on track, enrolled in a grade that is six levels above their enrollment grade in 2008/09. Of the 5,888 students who were in classrooms flagged by the GBI in 2008/09 and are enrolled in an APS school in fall 2014, a bit less than half (2,477 of 5,888 or 42 percent) have not yet reached high school.

    Table 7 presents a tabulation of enrollment in APS by year by the level of WTR erasures on the 2009 CRCT. The first row indicates enrollment by students defined as cheated based on having more WTR erasures on the 2009 CRCT in any of three subjects (reading, math or ELA) than the 95th percentile of WTR erasures in a non-cheating environment (the 2013 CRCT). Based on this definition, 7,064 students had their exam answers manipulated, 3,728 of whom were still attending APS schools in fall 2014. The second and third rows provide student enrollment counts under alternative benchmarks: 5 or more WTR erasures on any of the three subject exams, or 10 or more erasures on any of the three subject exams.

    B. Estimated Effects of Teacher Cheating on Student Outcomes

    Our primary results are presented in Tables 8A-8C. The tables report estimates of the coefficient on Cheat in equation (1), which represents the relationship between being a cheated student (i.e. having a number of WTR erasures in 2009 that exceeds the 95th percentile for WTR erasures in 2013) and subsequent normed test scores, holding constant observable student characteristics and the student's 2010 normalized test score. Each row reports estimates from a specification with a different level of fixed effects. Each column reports the estimated effect on achievement in a different year. Results are presented for students in grades 1 and 2 in 2008/09 (where contemporaneous teacher cheating was likely more prevalent), grades 3-6 in 2008/09 and all grades 1-6 together.
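
    The "cheated" definition used here amounts to a simple threshold rule. The sketch below uses the 2013 95th-percentile erasure counts reported in Table 1 (4 for reading, 4 for ELA, 5 for math); the function names and the example student record are hypothetical.

```python
# 95th percentile of WTR erasures on the 2013 CRCT, from Table 1.
P95_2013 = {"reading": 4, "ela": 4, "math": 5}


def cheated_in(subject, wtr_2009):
    """True if the student's 2009 WTR count in `subject` exceeds the 2013 benchmark."""
    return wtr_2009[subject] > P95_2013[subject]


def cheated_any(wtr_2009):
    """True if the benchmark is exceeded in any of the three subjects."""
    return any(cheated_in(subject, wtr_2009) for subject in P95_2013)


# Hypothetical student: 2009 WTR erasure counts by subject.
student = {"reading": 4, "ela": 2, "math": 9}
print(cheated_in("reading", student))  # exactly at the benchmark, so not flagged
print(cheated_in("math", student))
print(cheated_any(student))
```

    A count equal to the benchmark does not qualify, since the definition requires exceeding the 95th percentile.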

    In Table 8A, which reports results for math, we find consistent negative effects of being cheated on future test scores for students who were in grades 1 and 2 in 2008/09. The estimated reductions range from 0.06 to 0.10 standard deviations and are fairly constant over time. This is on par with the effect of having a rookie teacher rather than one with five years of experience (Clotfelter, et al., 2006). Alternatively, the effect is equivalent to 18 to 29 percent of the average annual learning gain in math for middle school students of 0.34 standard deviations (Lipsey, et al., 2012). The model that includes classroom fixed effects, that is, the one that compares students within a classroom, produces the largest effects. However, the results do not change dramatically with school fixed effects or even with no fixed effects at all. Results for students who were in grades 3-6 in 2008/09 are quite different. Save for one positive and significant coefficient, the remaining 11 estimates are all insignificantly different from zero. When all students are grouped together, the point estimates are all negative, but only the estimates for 2013/14 are significantly different from zero across all three specifications. For 2013/14 the negative effect of having been cheated in 2008/09 is estimated to be in the range of 0.04 to 0.07 standard deviations in magnitude, or about 12 to 21 percent of annual achievement gains for middle school students in math.
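
    The conversion from effect sizes to shares of an annual learning gain is a simple ratio. The sketch below reproduces the percentages quoted in the text, using the annual gains the report attributes to Lipsey, et al. (2012): 0.34 for math and 0.27 for reading. The function name is illustrative.

```python
# Average annual learning gains for middle school students, as quoted in the report.
ANNUAL_GAIN = {"math": 0.34, "reading": 0.27}


def share_of_annual_gain(effect_sd, subject):
    """Magnitude of an effect (in standard deviations) as a fraction of one year's gain."""
    return abs(effect_sd) / ANNUAL_GAIN[subject]


for subject, effects in {"math": (0.06, 0.10, 0.04, 0.07), "reading": (0.06, 0.14)}.items():
    for effect in effects:
        print(f"{subject}: {effect:.2f} SD = {share_of_annual_gain(effect, subject):.0%} of annual gain")
```

    For math, 0.06 and 0.10 map to roughly 18 and 29 percent, and 0.04 and 0.07 to roughly 12 and 21 percent; for reading, 0.06 and 0.14 map to roughly 22 and 52 percent.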

    In contrast to math, results for reading achievement, which are displayed in Table 8B, are almost uniformly negative and statistically significant. The estimated magnitudes of the effect of being cheated for students who were in grades 1 and 2 are substantially larger than those for students who were in grades 3-6 in 2008/09. When all grades are combined, the estimated negative effect of being cheated on reading test scores in future years falls in the range of 0.06 to 0.14 standard deviations, which is equivalent to 22 to 52 percent of annual learning gains in reading for middle school students (which equal 0.27). The largest estimates once again come from the models which make within-classroom comparisons, i.e. the models with classroom fixed effects.

    Estimates for ELA, presented in Table 8C, are similar to those for reading. Estimated impacts of being cheated are almost always negative and statistically significant. Results for grades 1 and 2 are larger in magnitude than for students who were in grades 3-6 during the last year of widespread teacher cheating, 2008/09. Combining all grades, the estimated negative impact of being cheated on student achievement is in the range of 0.06 to 0.12 standard deviations.

    In addition to estimating effects on later student achievement, we also estimated versions of equation (1) that replaced future test scores with measures of attendance and behavior. For these alternative outcomes we define being cheated as having unusually high WTR erasures (greater than the 95th percentile of the 2013 distribution) in either math, reading or ELA. Similarly, rather than include only a single-subject 2010 test score as a control for student ability, we include 2010 scores in math, reading and ELA. We also include 2009 values of the outcome variable (attendance or behavior) as a control. Finally, in the equation estimating the number of disciplinary incidents we include grade-level indicators as controls since disciplinary incidents tend to be much higher in middle and high school than in elementary school. Otherwise, the specifications are identical to those for the analysis of test scores.

    Table 9 presents estimates of the impact of being cheated on the percent of days in attendance in each of the years 2010/11 through 2013/14. In 2010/11 (the first year following the revelation of true scores) there appears to be no effect; all of the estimated coefficients are positive and statistically insignificant. However, in later years the point estimates are almost always negative and often are significantly different from zero. It is important to remember that in the later years students who were in tested grades (1-8) in 2008/09 are mostly in middle and high school, where absenteeism may be more directly a matter of student choice. However, even if we focus on the results that are statistically significantly different from zero, the estimates are quite small in magnitude, on the order of 0.3 to 0.8 percentage points, or about 1 school day out of the typical 180-day school year.
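
    The "about 1 school day" figure follows from applying the percentage-point estimates to a 180-day year. A minimal check, with an illustrative function name:

```python
SCHOOL_YEAR_DAYS = 180  # typical school year length cited in the text


def points_to_days(percentage_points):
    """Convert a change in percent attendance into school days per year."""
    return percentage_points / 100 * SCHOOL_YEAR_DAYS


low, high = points_to_days(0.3), points_to_days(0.8)
print(f"{low:.2f} to {high:.2f} days")
```

    The range works out to roughly half a day to one and a half days per year.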

    Finally, in Table 10, we present estimates of the impact of being cheated in 2008/09 on the number of disciplinary incidents in years 2010/11 through 2013/14. There appears to be little in the way of effects of teacher cheating on subsequent student behavior. While point estimates are mostly positive, in only one case are they significantly different from zero at standard confidence levels.
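
    The statement that only one estimate is significant can be checked approximately from the coefficients and standard errors in Table 10. The sketch below assumes a two-sided test under a normal approximation, which may differ slightly from the authors' procedure, and it uses absolute values, so it does not depend on the sign of each estimate. The dictionary layout and function name are illustrative.

```python
from statistics import NormalDist

# (estimate, clustered standard error) from Table 10, by model and year.
table10 = {
    ("No Fixed Effects", "2010/11"): (0.0070, 0.0408),
    ("No Fixed Effects", "2011/12"): (0.0098, 0.0634),
    ("No Fixed Effects", "2012/13"): (0.1149, 0.0549),
    ("No Fixed Effects", "2013/14"): (0.0677, 0.0539),
    ("School Fixed Effects", "2010/11"): (0.0138, 0.0382),
    ("School Fixed Effects", "2011/12"): (0.0847, 0.0597),
    ("School Fixed Effects", "2012/13"): (0.0038, 0.0561),
    ("School Fixed Effects", "2013/14"): (0.0441, 0.0583),
    ("Classroom Fixed Effects", "2010/11"): (0.0054, 0.0463),
    ("Classroom Fixed Effects", "2011/12"): (0.0629, 0.0770),
    ("Classroom Fixed Effects", "2012/13"): (0.0505, 0.0733),
    ("Classroom Fixed Effects", "2013/14"): (0.0232, 0.0749),
}


def two_sided_p(estimate, std_error):
    """Two-sided p-value for estimate / std_error under a standard normal."""
    z = abs(estimate) / std_error
    return 2 * (1 - NormalDist().cdf(z))


significant = [key for key, (est, se) in table10.items() if two_sided_p(est, se) < 0.05]
print(significant)
```

    Under this approximation only the 2012/13 estimate from the model without fixed effects clears the 5 percent threshold, matching the single starred entry in the table.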

    6. Summary and Conclusions

    Much effort has been devoted to identifying the teachers and administrators responsible for manipulating test scores in APS and bringing those responsible to justice. At the same time little is known about the victims of the cheating scandal. This report represents the first attempt to rigorously analyze the impact of teacher cheating on the long-run outcomes of students.

    In conducting the analysis we faced several challenges, including identifying which students were cheated, deriving a measure of the students' true ability and coming up with a reasonable counterfactual group for comparison. Given the available data and the history of test manipulation, we chose to identify cheated students as those who had high numbers of wrong-to-right erasures on the 2009 CRCT exam and used their 2010 CRCT scores as a measure of their true ability. Based on these choices, we estimated the effect of being cheated on later test scores, attendance and behavior of students. Comparisons were made between cheated students and non-cheated students generally, between cheated and non-cheated students who attended the same school in 2008/09 and between cheated and non-cheated students within the same classroom in 2008/09.

    Our results indicate that being cheated had negative consequences for later student performance in both reading and ELA. The estimated impacts are in the range of one-fourth to one-half of the average annual achievement gain for a middle school student. This is equivalent to one to two times the difference between having a rookie teacher and one with 5 or more years of experience in a single year. In contrast, the effects of teacher cheating for subsequent achievement in math are mixed. There is little evidence that teacher cheating had any deleterious effects on subsequent student attendance or student behavior. Any impacts that may have occurred were very small.


    References

    Babcock, Philip (2010). "Real Costs of Nominal Grade Inflation? New Evidence from Student Course Evaluations," Economic Inquiry, 48(4): 983-996.

    de Araujo, Pedro and Stephen Lagos (2013). "Self-Esteem, Education and Wages Revisited," Journal of Economic Psychology, 34: 120-132.

    Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor (2006). "Teacher-Student Matching and the Assessment of Teacher Effectiveness," The Journal of Human Resources, 41(4): 778-820.

    Dee, Thomas and James Wyckoff (2013). "Incentives, Selection and Teacher Performance: Evidence from IMPACT." Washington, DC: National Bureau of Economic Research, NBER Working Paper No. 19529.

    Fryer, Roland G. (2013). "Teacher Incentives and Student Achievement: Evidence from New York City Public Schools," Journal of Labor Economics, 31(2): 373-407.

    Jacob, Brian A. and Steven D. Levitt (2003a). "Catching Cheating Teachers: The Results of an Unusual Experiment in Implementing Theory," Brookings-Wharton Papers on Urban Affairs, 2003: 185-220.

    Jacob, Brian A. and Steven D. Levitt (2003b). "Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating," Quarterly Journal of Economics, 118(3): 843-877.

    Jacob, Brian A. and Lars Lefgren (2004). "Remedial Education and Student Achievement: A Regression-Discontinuity Analysis," Review of Economics and Statistics, 86(1): 226-244.

    Kingston, Neal M. and Amy K. Clark (eds.). Test Fraud: Statistical Detection and Methodology. New York: Routledge, 2014.

    Lipsey, Mark W., Kelly Puzio, Cathy Yun, Michael A. Hebert, Kasia Steinka-Fry, Mikel W. Cole, Megan Roberts, Karen S. Anthony, and Matthew D. Busick (2012). "Translating the Statistical Representation of the Effects of Education Interventions into More Readily Interpretable Forms." Washington, DC: National Center for Special Education Research, Institute of Education Sciences, U.S. Department of Education.

    Mariano, Louis T. and Paco Martorell (2012). "The Academic Effects of Summer Instruction and Retention in New York City," Educational Evaluation and Policy Analysis, 35(1): 96-117.

    Matsudaira, Jordan D. (2008). "Mandatory Summer School and Student Achievement," Journal of Econometrics, 142(2): 829-850.

    Springer, Matthew, J.R. Lockwood, Dale Ballou, Daniel F. McCaffrey, Laura Hamilton, Matthew Pepper, Vi-Nhuan Le and Brian Stecher (2010). "Teacher Pay for Performance: Experimental Evidence from the Project on Incentives in Teaching." Nashville, TN: National Center on Performance Incentives.

    Springer, Matthew G., John F. Pane, Vi-Nhuan Le, Daniel F. McCaffrey, Susan Freeman Burns, Laura S. Hamilton and Brian Stecher (2012). "Team Pay for Performance: Experimental Evidence From the Round Rock Pilot Project on Team Incentives," Educational Evaluation and Policy Analysis, 34(4): 367-390.

    van der Linden, Wim J. and Minjeong Jeon (2012). "Modeling Answer Changes on Test Items," Journal of Educational and Behavioral Statistics, 37(1): 180-199.

    Waddell, Glen R. (2006). "Labor Market Consequences of Poor Attitude and Low Self-Esteem in Youth," Economic Inquiry, 44(1): 69-97.

    Winters, Marcus and Jay P. Greene (2012). "The Medium-Run Effects of Florida's Test-Based Promotion Policy," Education Finance and Policy, 7(3): 305-330.

    Figure 1: Distribution of Normalized Scores by Subject for 2009 Investigated Schools

    [Figure: three density panels (Math Normed Score, Reading Normed Scores, ELA Normed Score); x-axis: Standard Deviation; series: 2009 Flagged Class, 2009 Non-Flagged Class, 2010 Class.]

    Figure 2: Distributions of the Number of Initial Right Answers by Grade, 2009 and 2010

    [Figure: eight density panels, one for each of Grades 1 through 8, titled "Initial Right, 2009 and 2010" for schools investigated in both 2009 and 2010; x-axis: No. of Questions Initially Correct; y-axis: Density; series: 2009 and 2010.]

    Figure 3A: Distribution of 2009 Wrong-to-Right Erasures by 2010 Achievement Quintile in Math (Students in a flagged classroom in 2008/09 and not in a flagged classroom in 2009/10)

    Figure 3B: Distribution of 2013 Wrong-to-Right Erasures by 2013 Achievement Quintile in Math (Students who attended a school that was flagged in 2008/09)

    [Figures: each panel plots the distribution of Wrong To Right Erasures (x-axis, 0 to 40) separately for achievement quintiles Q1 through Q5.]

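    Table 1. 90th, 95th and 99th Percentiles of the Wrong-to-Right Erasure Count on 2009 and 2013 CRCT Exams in Reading, ELA and Math

    | Percentile | 2009 CRCT Reading | 2009 CRCT ELA | 2009 CRCT Math | 2013 CRCT Reading | 2013 CRCT ELA | 2013 CRCT Math |
    |---|---|---|---|---|---|---|
    | 90th Percentile | 8 | 8 | 12 | 3 | 3 | 4 |
    | 95th Percentile | 12 | 12 | 16 | 4 | 4 | 5 |
    | 99th Percentile | 19 | 19 | 26 | 6 | 8 | 9 |

    Table 2. Wrong-to-Right Erasure Count Frequencies for CRCT Reading, ELA and Math in Flagged Classrooms in Investigated Schools in 2008/09

    | No. of WTR Erasures | Reading Number | Reading Percent | ELA Number | ELA Percent | Math Number | Math Percent |
    |---|---|---|---|---|---|---|
    | 0 | 1,038 | 12.23 | 865 | 10.50 | 659 | 7.28 |
    | 1 | 1,113 | 13.11 | 945 | 11.47 | 791 | 8.73 |
    | 2 | 1,061 | 12.50 | 978 | 11.87 | 800 | 8.83 |
    | 3 | 917 | 10.80 | 876 | 10.63 | 857 | 9.46 |
    | 4 | 741 | 8.73 | 796 | 9.66 | 757 | 8.36 |
    | 5 | 621 | 7.32 | 663 | 8.05 | 725 | 8.01 |
    | 6 | 523 | 6.16 | 583 | 7.08 | 577 | 6.37 |
    | 7 | 412 | 4.85 | 449 | 5.45 | 523 | 5.78 |
    | 8 | 349 | 4.11 | 374 | 4.54 | 492 | 5.43 |
    | 9 | 281 | 3.31 | 313 | 3.80 | 419 | 4.63 |
    | 10 | 243 | 2.86 | 259 | 3.14 | 332 | 3.67 |
    | 11 | 214 | 2.52 | 192 | 2.33 | 286 | 3.16 |
    | 12 | 188 | 2.22 | 165 | 2.00 | 252 | 2.78 |
    | 13 | 150 | 1.77 | 146 | 1.77 | 208 | 2.30 |
    | 14 | 118 | 1.39 | 127 | 1.54 | 190 | 2.10 |
    | 15 | 99 | 1.17 | 99 | 1.20 | 167 | 1.84 |
    | More than 15 | 419 | 4.94 | 410 | 4.98 | 1,021 | 11.27 |
    | Total | 8,487 | 100.00 | 8,240 | 100.00 | 9,056 | 100.00 |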

    Table 3. Summary Statistics for APS Schools by Investigated School Status in 2008/09

    | | All Schools Mean | All Schools S.D. | Investigated Schools Mean | Investigated Schools S.D. | Non-Investigated Schools Mean | Non-Investigated Schools S.D. |
    |---|---|---|---|---|---|---|
    | Black | 0.83 | 0.37 | 0.95 | 0.22 | 0.77 | 0.42 |
    | White | 0.10 | 0.29 | 0.01 | 0.07 | 0.14 | 0.35 |
    | Hispanic | 0.05 | 0.21 | 0.03 | 0.18 | 0.05 | 0.23 |
    | Female | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
    | Special Education | 0.10 | 0.30 | 0.09 | 0.29 | 0.10 | 0.30 |
    | Gifted | 0.08 | 0.28 | 0.06 | 0.23 | 0.10 | 0.29 |
    | Gifted Served | 0.08 | 0.27 | 0.06 | 0.23 | 0.09 | 0.28 |
    | Early Intervention Program | 0.15 | 0.35 | 0.25 | 0.43 | 0.10 | 0.30 |
    | Limited English Proficiency | 0.00 | 0.07 | 0.00 | 0.07 | 0.00 | 0.07 |
    | Free and Reduced Lunch | 0.58 | 0.49 | 0.77 | 0.42 | 0.49 | 0.50 |
    | Observations | 54,356 | | 18,542 | | 35,814 | |

    Table 4. Enrollment in APS by Year by 2008/09 Grade Level for Students in One or More Flagged Classrooms in 2008/09

    | Grade Level in 2008/09 | 2008/09 | 2009/10 | 2010/11 | 2011/12 | 2012/13 | 2013/14 | Fall 2014 |
    |---|---|---|---|---|---|---|---|
    | 1 | 2068 | 1901 | 1709 | 1581 | 1465 | 1310 | 1210 |
    | 2 | 2018 | 1837 | 1647 | 1516 | 1332 | 1238 | 1154 |
    | 3 | 2001 | 1829 | 1642 | 1401 | 1358 | 1303 | 1167 |
    | 4 | 1822 | 1685 | 1476 | 1375 | 1307 | 1232 | 1084 |
    | 5 | 1611 | 1383 | 1278 | 1197 | 1106 | 1046 | 895 |
    | 6 | 706 | 642 | 582 | 535 | 486 | 412 | 338 |
    | 7 | 588 | 538 | 465 | 423 | 370 | 297 | 29 |
    | 8 | 739 | 664 | 603 | 525 | 454 | 46 | 11 |
    | Total | 11553 | 10479 | 9402 | 8553 | 7878 | 6884 | 5888 |

    Table 5. Enrollment in APS by Year and School Type for Students in One or More Flagged Classrooms in 2008/09

    | School Type | 2008/09 | 2009/10 | 2010/11 | 2011/12 | 2012/13 | 2013/14 | Fall 2014 |
    |---|---|---|---|---|---|---|---|
    | Traditional | 11285 | 10038 | 8796 | 7898 | 7187 | 6203 | 5274 |
    | Charter | 268 | 441 | 606 | 655 | 691 | 681 | 614 |
    | Total | 11553 | 10479 | 9402 | 8553 | 7878 | 6884 | 5888 |

    Table 6. Enrollment in APS by Grade Level in Fall 2014 by 2008/09 Grade Level for Students in One or More Flagged Classrooms in 2008/09

    | Grade Level in 2008/09 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 | Total |
    |---|---|---|---|---|---|---|---|---|---|
    | 1 | 11 | 197 | 1000 | 1 | 1 | 0 | 0 | 0 | 1210 |
    | 2 | 0 | 8 | 153 | 992 | 1 | 0 | 0 | 0 | 1154 |
    | 3 | 0 | 0 | 5 | 101 | 1058 | 3 | 0 | 0 | 1167 |
    | 4 | 0 | 0 | 1 | 7 | 491 | 580 | 5 | 0 | 1084 |
    | 5 | 0 | 0 | 0 | 1 | 120 | 264 | 508 | 2 | 895 |
    | 6 | 0 | 0 | 0 | 0 | 13 | 30 | 69 | 226 | 338 |
    | 7 | 0 | 0 | 0 | 0 | 1 | 3 | 9 | 16 | 29 |
    | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 10 | 11 |
    | Total | 11 | 205 | 1159 | 1102 | 1685 | 880 | 592 | 254 | 5888 |

    Table 7. Enrollment in APS by Year by Erasure Counts in 2009 for Students in Flagged Classrooms in 2008/09

    | 2009 WTR Erasure Measure | 2008/09 | 2009/10 | 2010/11 | 2011/12 | 2012/13 | 2013/14 | Fall 2014 |
    |---|---|---|---|---|---|---|---|
    | >95th Percentile of WTR Erasures in 2013 in Any Subject | 7064 | 6437 | 5810 | 5268 | 4872 | 4340 | 3728 |
    | 5 or More WTR Erasures in Any Subject | 7125 | 6485 | 5852 | 5313 | 4901 | 4348 | 3728 |
    | 10 or More WTR Erasures in Any Subject | 3473 | 3150 | 2877 | 2611 | 2421 | 2156 | 1825 |
    | All Students in Flagged Classrooms | 11553 | 10479 | 9402 | 8553 | 7878 | 6884 | 5888 |

    Table 8A: Estimated Effect of Being Cheated in 2008/09 on Normed Math Test Scores in 2010/11-2013/14 by 2008/09 Grade Level

    Grades 1 & 2

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0686** (0.0206) | 0.0680** (0.0218) | 0.0657** (0.0241) | 0.0830** (0.0233) |
    | School Fixed Effects | 0.0610** (0.0195) | 0.0648** (0.0212) | 0.0612* (0.0243) | 0.0688** (0.0224) |
    | Classroom Fixed Effects | 0.0882** (0.0234) | 0.0923** (0.0256) | 0.0622* (0.0307) | 0.0997** (0.0296) |

    Grades 3-6

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0053 (0.0179) | 0.0160 (0.0168) | 0.0290 (0.0229) | 0.0111 (0.0341) |
    | School Fixed Effects | 0.0002 (0.0150) | 0.0204 (0.0153) | 0.0433* (0.0219) | 0.0061 (0.0328) |
    | Classroom Fixed Effects | 0.0088 (0.0211) | 0.0096 (0.0209) | 0.0056 (0.0297) | 0.0188 (0.0446) |

    Grades 1-6

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0192 (0.0137) | 0.0209 (0.0134) | 0.0184 (0.0167) | 0.0555** (0.0195) |
    | School Fixed Effects | 0.0154 (0.0123) | 0.0146 (0.0125) | 0.0110 (0.0161) | 0.0405* (0.0184) |
    | Classroom Fixed Effects | 0.0405* (0.0159) | 0.0364* (0.0163) | 0.0351 (0.0215) | 0.0717** (0.0247) |

    Standard errors clustered at the classroom level in parentheses. * Significant at the 5% level, ** significant at the 1% level. Note: 2012, 2013, and 2014 results are restricted to grades 1-5, 1-4, and 1-3, respectively.

    Table 8B: Estimated Effect of Being Cheated in 2008/09 on Normed Reading Test Scores in 2010/11-2013/14 by 2008/09 Grade Level

    Grades 1 & 2

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.1419** (0.0269) | 0.1955** (0.0323) | 0.1858** (0.0371) | 0.1863** (0.0401) |
    | School Fixed Effects | 0.1449** (0.0273) | 0.1919** (0.0315) | 0.1775** (0.0370) | 0.1705** (0.0399) |
    | Classroom Fixed Effects | 0.1888** (0.0337) | 0.2195** (0.0377) | 0.2536** (0.0462) | 0.2164** (0.0515) |

    Grades 3-6

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0701** (0.0169) | 0.0762** (0.0196) | 0.0491* (0.0239) | 0.0374 (0.0373) |
    | School Fixed Effects | 0.0681** (0.0166) | 0.0685** (0.0193) | 0.0440 (0.0235) | 0.0355 (0.0365) |
    | Classroom Fixed Effects | 0.0977** (0.0229) | 0.1045** (0.0266) | 0.0664* (0.0315) | 0.0326 (0.0527) |

    Grades 1-6

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0798** (0.0142) | 0.0936** (0.0165) | 0.0612** (0.0205) | 0.1076** (0.0268) |
    | School Fixed Effects | 0.0782** (0.0142) | 0.0978** (0.0164) | 0.0633** (0.0204) | 0.0932** (0.0268) |
    | Classroom Fixed Effects | 0.1263** (0.0190) | 0.1409** (0.0217) | 0.1386** (0.0271) | 0.1334** (0.0370) |

    Standard errors clustered at the classroom level in parentheses. * Significant at the 5% level, ** significant at the 1% level. Note: 2012, 2013, and 2014 results are restricted to grades 1-5, 1-4, and 1-3, respectively.

    Table 8C: Estimated Effect of Being Cheated in 2008/09 on Normed ELA Test Scores in 2010/11-2013/14 by 2008/09 Grade Level

    Grades 1 & 2

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.1119** (0.0210) | 0.1053** (0.0252) | 0.1078** (0.0260) | 0.1326** (0.0258) |
    | School Fixed Effects | 0.1051** (0.0205) | 0.1013** (0.0241) | 0.0984** (0.0250) | 0.1224** (0.0252) |
    | Classroom Fixed Effects | 0.1421** (0.0268) | 0.1191** (0.0300) | 0.1407** (0.0351) | 0.1503** (0.0316) |

    Grades 3-6

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0380* (0.0167) | 0.0707** (0.0186) | 0.0365 (0.0236) | 0.0578 (0.0338) |
    | School Fixed Effects | 0.0406** (0.0159) | 0.0669** (0.0184) | 0.0260 (0.0233) | 0.0644 (0.0335) |
    | Classroom Fixed Effects | 0.0426* (0.0210) | 0.0862** (0.0236) | 0.0319 (0.0309) | 0.0567 (0.0425) |

    Grades 1-6

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0668** (0.0131) | 0.0887** (0.0150) | 0.0664** (0.0176) | 0.1145** (0.0204) |
    | School Fixed Effects | 0.0645** (0.0127) | 0.0828** (0.0145) | 0.0603** (0.0173) | 0.1090** (0.0201) |
    | Classroom Fixed Effects | 0.0828** (0.0165) | 0.1020** (0.0185) | 0.0824** (0.0237) | 0.1190** (0.0253) |

    Standard errors clustered at the classroom level in parentheses. * Significant at the 5% level, ** significant at the 1% level. Note: 2012, 2013, and 2014 results are restricted to grades 1-5, 1-4, and 1-3, respectively.

    Table 9. Estimated Effect of Being Cheated in 2008/09 on Percent Attendance 2010/11-2013/14

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0781 (0.1124) | 0.2632 (0.1835) | 0.3145 (0.2299) | 0.7764** (0.2627) |
    | School Fixed Effects | 0.0021 (0.0916) | 0.4200** (0.1482) | 0.3878* (0.1959) | 0.6880** (0.2468) |
    | Classroom Fixed Effects | 0.0200 (0.1045) | 0.0781 (0.1816) | 0.0603 (0.2166) | 0.2992 (0.2850) |

    Standard errors clustered at the classroom level in parentheses. * Significant at the 5% level, ** significant at the 1% level.

    Table 10. Estimated Effect of Being Cheated in 2008/09 on Discipline Incidents 2010/11-2013/14

    | Model | 2010/11 | 2011/12 | 2012/13 | 2013/14 |
    |---|---|---|---|---|
    | No Fixed Effects | 0.0070 (0.0408) | 0.0098 (0.0634) | 0.1149* (0.0549) | 0.0677 (0.0539) |
    | School Fixed Effects | 0.0138 (0.0382) | 0.0847 (0.0597) | 0.0038 (0.0561) | 0.0441 (0.0583) |
    | Classroom Fixed Effects | 0.0054 (0.0463) | 0.0629 (0.0770) | 0.0505 (0.0733) | 0.0232 (0.0749) |

    Standard errors clustered at the classroom level in parentheses. * Significant at the 5% level, ** significant at the 1% level.