common%binding%by%redundant%group%b% sox$proteins$is ... · 3!!...

63
Common binding by redundant group B Sox proteins is evolutionarily conserved in Drosophila Sarah H. Carl and Steven Russell Department of Genetics and Cambridge Systems Biology Centre University of Cambridge Downing Street Cambridge CB2 3EH UK Sarah H. Carl <[email protected] > Steven Russell <[email protected] > . CC-BY-ND 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted December 17, 2014. . https://doi.org/10.1101/012872 doi: bioRxiv preprint

Upload: others

Post on 21-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

CommonbindingbyredundantgroupBSox$proteins$is$evolutionarily$conserved$in$Drosophila

SarahHCarlandStevenRussell

DepartmentofGeneticsandCambridgeSystemsBiologyCentre

UniversityofCambridge

DowningStreet

Cambridge

CB23EH

UK

SarahHCarlltscarlgencamacukgt

StevenRussellltsrussellgencamacukgt

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

1

Abstract

Background

GroupBSoxproteinsareahighlyconservedgroupoftranscriptionfactorsthatactextensivelyto

coordinatenervoussystemdevelopmentinhighermetazoanswhileshowingbothcoKexpressionand

functionalredundancyacrossabroadgroupoftaxaInDrosophilamelanogasterthetwogroupB

SoxproteinsDichaeteandSoxNeuroshowwidespreadcommonbindingacrossthegenomeWhile

someinstancesoffunctionalcompensationhavebeenobservedinDrosophilathefunctionof

commonbindingandtheextentofitsevolutionaryconservationisnotknown

Results

WeusedDamIDKseqtoexaminethegenomeKwidebindingpatternsofDichaeteandSoxNeuroin

fourspeciesofDrosophilaThroughaquantitativecomparisonofDichaetebindingweevaluatedthe

rateofbindingsiteturnoveracrossthegenomeaswellasatspecificfunctionalsitesWealso

examinedthepresenceofSoxmotifswithinbindingintervalsandthecorrelationbetweensequence

conservationandbindingconservationTodeterminewhethercommonbindingbetweenDichaete

andSoxNeuroisconservedweperformedadetailedanalysisofthebindingpatternsofbothfactors

intwospecies

Conclusion

WefindthatwhiletheregulatorynetworksdrivenbyDichaeteandSoxNeuroarelargelyconserved

acrossthedrosophilidsstudiedbindingsiteturnoveriswidespreadandcorrelatedwith

phylogeneticdistanceNonethelessbindingispreferentiallyconservedatknowncis8regulatory

modulesandcoreindependentlyverifiedbindingsitesWeobservedthestrongestbinding

conservationatsitesthatarecommonlyboundbyDichaeteandSoxNeurosuggestingthatthese

sitesarefunctionallyimportantOuranalysisprovidesinsightsintotheevolutionofgroupBSox

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

2

functionhighlightingthespecificconservationofsharedbindingsitesandsuggestingalternative

sourcesofneofunctionalisationbetweenparalogousfamilymembers

KeywordsSoxDichaeteSoxNeuroDrosophilaredundancytranscriptionfactorbinding

conservation

Background

TheimportanceofregulatoryDNAindevelopmentevolutionanddiseasehasbecomewidely

recognizedinrecentyearsasthenumberoffunctionalgenomicstudiesmappingregulatory

elementsinthegenomesofhumansandmodelorganismshasgrown[1ndash5]Whilesignificant

progresshasbeenmadeonquestionssuchastheinvivobindingspecificityoftranscriptionfactors

(TFs)andthecombinatorialpatternsofTFbindingthatdrivespecificgeneexpressioninmanycases

thelargenumberofbindingeventsobservedinvivosuggeststhatTFfunctioniscontextKdependent

andinfluencedbyfactorsinadditiontotheirintrinsicsequencespecificity[6ndash12]Furthermorethe

mostcommonmethodsofassayinginvivogenomeKwidebindingarenoisyandmaysufferproblems

ofbiasandfalsepositives[13ndash17]Onewaytocutthroughthisnoiseandassessthefunctionalityof

TFbindingistoleveragetheeffectofnaturalselectionwhichshouldtendtopreservefunctionally

importantbindingeventsduringevolution[618ndash20]Herewehaveusedacomparative

evolutionaryapproachinfourspeciesofDrosophilatostudythebindingandfunctionsoftwogroup

BSoxproteinsafamilyofTFsthatisbothdeeplyconservedthroughoutanimalevolutionand

displaysevidenceoffunctionalredundancy

Sox(SRYKrelatedhighKmobilitygroupbox)genesencodeahighlyconservedfamilyofTFsthatserve

asbroaddevelopmentalregulatorsinmetazoaTheyarethoughttohaveevolvedinconjunction

withtheoriginofmulticellularanimallifeastheyarepresentinallanimalgenomesinwhichthey

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

3

havebeensearchedforincludingbasalmemberssuchasspongesandplacozoa[21ndash25]AllSox

proteinscontainasingleHMG(highKmobilitygroup)DNAbindingdomainandtheyaresubdivided

intotengroupsAthroughJbasedonHMGsequenceandfullKlengthproteinstructure[2426ndash28]

Invertebratesmembersofeachgroupareoftenexpressedinoverlappingpatternsinparticular

tissuesduringdevelopmentandplayimportantrolesindirectingthedifferentiationofcellsinthose

tissuesForexamplegroupBgenesareexpressedinthedevelopingcentralnervoussystem(CNS)

andeye[29ndash32]groupCgenesareexpressedinthekidneyandpancreas[33ndash35]andgroupF

genesareexpressedinthedevelopingvascularandlymphaticsystem[3637]Interestinglynotonly

aregroupmemberscoKexpressedinmanycasestheyshowapatternofphenotypicredundancy

whereseveredevelopmentaldefectsareonlyobservedwhenmultiplemembersofthesamegroup

areremoved[2737ndash43]Whilemammaliangenomescontainmultipleparaloguesformostgroups

invertebratestypicallyhavefewerSoxgenesSequencedinsectgenomesincludingthatofthefruit

flyDrosophilamelanogastertypicallycontainonegeneineachofgroupsCDEandFandfour

genesingroupBalthoughoccasionalextragenesareobservedincertainlineages[2426]

GroupBSoxgenesareofparticularevolutionaryinterestbecauseinadditiontobeingthemost

closelyrelatedSoxgenestoSrytheyaresomeofthebestcharacterizedmembersoftheSoxfamily

andappeartohavehighlyconservedfunctionsthroughoutevolution[4445]InmammalsgroupB

SoxgenesareinvolvedinstemcellpluripotencyandselfKrenewalectodermformationneural

inductionCNSdevelopmentplacodeformationandgametogenesis[27]VertebrategroupBSox

genesaredividedintotwosubgroupsgroupB1whichincludesSox1Sox2andSox3[44]andgroup

B2whichincludesSox14andSox21[46]GroupB1andB2genesplayopposingrolesinthe

developingvertebrateCNSwithgroupB1proteinsprimarilyactingastranscriptionalactivatorsto

conveyearlyneuroectodermalcompetenceandmaintainneuralprecursorswhilegroupB2genes

primarilyactastranscriptionalrepressorstopromoteneuronaldifferentiation[4347ndash49]Although

functionaldatasuggeststhattheB1KB2divisionmaynotbeanappropriateclassificationininsects

[455051]aroleforgroupBSoxgenesinneuraldevelopmentappearstobeconservedmaking

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

4

DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely

[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro

(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN

WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill

debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates

andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween

thesetwosetsofSoxgenes[3132455052ndash60]

AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping

patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso

showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland

midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline

[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued

withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects

thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand

intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost

strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand

expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly

similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly

boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera

hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)

andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral

patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel

(Kr)andPOUdomainprotein2(pdm2)[51676970]

NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof

compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

5

andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits

placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto

resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances

DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone

anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot

affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe

resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles

whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess

ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding

patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat

observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame

Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat

functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross

manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman

evolutionaryperspective

Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary

dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological

andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous

comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave

focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid

(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal

regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites

duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof

divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF

bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial

bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

6

77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis

generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes

DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand

formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy

theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD

simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland

evolutionarydynamicsofgroupBSoxbinding

Results

Comparative+DamIDseq+for+group+B+Sox+proteins+

WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh

confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin

Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional

compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD

melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin

fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof

bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates

ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment

betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda

differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly

boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand

mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding

intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

7

viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth

DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51

67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween

17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD

yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding

intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere

noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological

replicates

TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome

coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences

betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure

2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN

bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]

Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated

witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately

9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD

simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete

andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000

geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD

pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously

identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof

DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology

BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof

transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem

development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough

thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 2: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

1

Abstract

Background

GroupBSoxproteinsareahighlyconservedgroupoftranscriptionfactorsthatactextensivelyto

coordinatenervoussystemdevelopmentinhighermetazoanswhileshowingbothcoKexpressionand

functionalredundancyacrossabroadgroupoftaxaInDrosophilamelanogasterthetwogroupB

SoxproteinsDichaeteandSoxNeuroshowwidespreadcommonbindingacrossthegenomeWhile

someinstancesoffunctionalcompensationhavebeenobservedinDrosophilathefunctionof

commonbindingandtheextentofitsevolutionaryconservationisnotknown

Results

WeusedDamIDKseqtoexaminethegenomeKwidebindingpatternsofDichaeteandSoxNeuroin

fourspeciesofDrosophilaThroughaquantitativecomparisonofDichaetebindingweevaluatedthe

rateofbindingsiteturnoveracrossthegenomeaswellasatspecificfunctionalsitesWealso

examinedthepresenceofSoxmotifswithinbindingintervalsandthecorrelationbetweensequence

conservationandbindingconservationTodeterminewhethercommonbindingbetweenDichaete

andSoxNeuroisconservedweperformedadetailedanalysisofthebindingpatternsofbothfactors

intwospecies

Conclusion

WefindthatwhiletheregulatorynetworksdrivenbyDichaeteandSoxNeuroarelargelyconserved

acrossthedrosophilidsstudiedbindingsiteturnoveriswidespreadandcorrelatedwith

phylogeneticdistanceNonethelessbindingispreferentiallyconservedatknowncis8regulatory

modulesandcoreindependentlyverifiedbindingsitesWeobservedthestrongestbinding

conservationatsitesthatarecommonlyboundbyDichaeteandSoxNeurosuggestingthatthese

sitesarefunctionallyimportantOuranalysisprovidesinsightsintotheevolutionofgroupBSox

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

2

functionhighlightingthespecificconservationofsharedbindingsitesandsuggestingalternative

sourcesofneofunctionalisationbetweenparalogousfamilymembers

KeywordsSoxDichaeteSoxNeuroDrosophilaredundancytranscriptionfactorbinding

conservation

Background

TheimportanceofregulatoryDNAindevelopmentevolutionanddiseasehasbecomewidely

recognizedinrecentyearsasthenumberoffunctionalgenomicstudiesmappingregulatory

elementsinthegenomesofhumansandmodelorganismshasgrown[1ndash5]Whilesignificant

progresshasbeenmadeonquestionssuchastheinvivobindingspecificityoftranscriptionfactors

(TFs)andthecombinatorialpatternsofTFbindingthatdrivespecificgeneexpressioninmanycases

thelargenumberofbindingeventsobservedinvivosuggeststhatTFfunctioniscontextKdependent

andinfluencedbyfactorsinadditiontotheirintrinsicsequencespecificity[6ndash12]Furthermorethe

mostcommonmethodsofassayinginvivogenomeKwidebindingarenoisyandmaysufferproblems

ofbiasandfalsepositives[13ndash17]Onewaytocutthroughthisnoiseandassessthefunctionalityof

TFbindingistoleveragetheeffectofnaturalselectionwhichshouldtendtopreservefunctionally

importantbindingeventsduringevolution[618ndash20]Herewehaveusedacomparative

evolutionaryapproachinfourspeciesofDrosophilatostudythebindingandfunctionsoftwogroup

BSoxproteinsafamilyofTFsthatisbothdeeplyconservedthroughoutanimalevolutionand

displaysevidenceoffunctionalredundancy

Sox(SRYKrelatedhighKmobilitygroupbox)genesencodeahighlyconservedfamilyofTFsthatserve

asbroaddevelopmentalregulatorsinmetazoaTheyarethoughttohaveevolvedinconjunction

withtheoriginofmulticellularanimallifeastheyarepresentinallanimalgenomesinwhichthey

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

3

havebeensearchedforincludingbasalmemberssuchasspongesandplacozoa[21ndash25]AllSox

proteinscontainasingleHMG(highKmobilitygroup)DNAbindingdomainandtheyaresubdivided

intotengroupsAthroughJbasedonHMGsequenceandfullKlengthproteinstructure[2426ndash28]

Invertebratesmembersofeachgroupareoftenexpressedinoverlappingpatternsinparticular

tissuesduringdevelopmentandplayimportantrolesindirectingthedifferentiationofcellsinthose

tissuesForexamplegroupBgenesareexpressedinthedevelopingcentralnervoussystem(CNS)

andeye[29ndash32]groupCgenesareexpressedinthekidneyandpancreas[33ndash35]andgroupF

genesareexpressedinthedevelopingvascularandlymphaticsystem[3637]Interestinglynotonly

aregroupmemberscoKexpressedinmanycasestheyshowapatternofphenotypicredundancy

whereseveredevelopmentaldefectsareonlyobservedwhenmultiplemembersofthesamegroup

areremoved[2737ndash43]Whilemammaliangenomescontainmultipleparaloguesformostgroups

invertebratestypicallyhavefewerSoxgenesSequencedinsectgenomesincludingthatofthefruit

flyDrosophilamelanogastertypicallycontainonegeneineachofgroupsCDEandFandfour

genesingroupBalthoughoccasionalextragenesareobservedincertainlineages[2426]

GroupBSoxgenesareofparticularevolutionaryinterestbecauseinadditiontobeingthemost

closelyrelatedSoxgenestoSrytheyaresomeofthebestcharacterizedmembersoftheSoxfamily

andappeartohavehighlyconservedfunctionsthroughoutevolution[4445]InmammalsgroupB

SoxgenesareinvolvedinstemcellpluripotencyandselfKrenewalectodermformationneural

inductionCNSdevelopmentplacodeformationandgametogenesis[27]VertebrategroupBSox

genesaredividedintotwosubgroupsgroupB1whichincludesSox1Sox2andSox3[44]andgroup

B2whichincludesSox14andSox21[46]GroupB1andB2genesplayopposingrolesinthe

developingvertebrateCNSwithgroupB1proteinsprimarilyactingastranscriptionalactivatorsto

conveyearlyneuroectodermalcompetenceandmaintainneuralprecursorswhilegroupB2genes

primarilyactastranscriptionalrepressorstopromoteneuronaldifferentiation[4347ndash49]Although

functionaldatasuggeststhattheB1KB2divisionmaynotbeanappropriateclassificationininsects

[455051]aroleforgroupBSoxgenesinneuraldevelopmentappearstobeconservedmaking

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

4

DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely

[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro

(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN

WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill

debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates

andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween

thesetwosetsofSoxgenes[3132455052ndash60]

AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping

patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso

showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland

midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline

[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued

withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects

thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand

intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost

strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand

expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly

similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly

boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera

hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)

andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral

patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel

(Kr)andPOUdomainprotein2(pdm2)[51676970]

NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof

compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

5

andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits

placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto

resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances

DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone

anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot

affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe

resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles

whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess

ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding

patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat

observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame

Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat

functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross

manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman

evolutionaryperspective

Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary

dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological

andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous

comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave

focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid

(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal

regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites

duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof

divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF

bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial

bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

6

77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis

generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes

DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand

formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy

theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD

simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland

evolutionarydynamicsofgroupBSoxbinding

Results

Comparative+DamIDseq+for+group+B+Sox+proteins+

WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh

confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin

Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional

compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD

melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin

fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof

bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates

ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment

betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda

differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly

boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand

mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding

intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

7

viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth

DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51

67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween

17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD

yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding

intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere

noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological

replicates

TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome

coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences

betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure

2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN

bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]

Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated

witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately

9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD

simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete

andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000

geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD

pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously

identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof

DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology

BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof

transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem

development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough

thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 3: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

2

functionhighlightingthespecificconservationofsharedbindingsitesandsuggestingalternative

sourcesofneofunctionalisationbetweenparalogousfamilymembers

KeywordsSoxDichaeteSoxNeuroDrosophilaredundancytranscriptionfactorbinding

conservation

Background

TheimportanceofregulatoryDNAindevelopmentevolutionanddiseasehasbecomewidely

recognizedinrecentyearsasthenumberoffunctionalgenomicstudiesmappingregulatory

elementsinthegenomesofhumansandmodelorganismshasgrown[1ndash5]Whilesignificant

progresshasbeenmadeonquestionssuchastheinvivobindingspecificityoftranscriptionfactors

(TFs)andthecombinatorialpatternsofTFbindingthatdrivespecificgeneexpressioninmanycases

thelargenumberofbindingeventsobservedinvivosuggeststhatTFfunctioniscontextKdependent

andinfluencedbyfactorsinadditiontotheirintrinsicsequencespecificity[6ndash12]Furthermorethe

mostcommonmethodsofassayinginvivogenomeKwidebindingarenoisyandmaysufferproblems

ofbiasandfalsepositives[13ndash17]Onewaytocutthroughthisnoiseandassessthefunctionalityof

TFbindingistoleveragetheeffectofnaturalselectionwhichshouldtendtopreservefunctionally

importantbindingeventsduringevolution[618ndash20]Herewehaveusedacomparative

evolutionaryapproachinfourspeciesofDrosophilatostudythebindingandfunctionsoftwogroup

BSoxproteinsafamilyofTFsthatisbothdeeplyconservedthroughoutanimalevolutionand

displaysevidenceoffunctionalredundancy

Sox(SRYKrelatedhighKmobilitygroupbox)genesencodeahighlyconservedfamilyofTFsthatserve

asbroaddevelopmentalregulatorsinmetazoaTheyarethoughttohaveevolvedinconjunction

withtheoriginofmulticellularanimallifeastheyarepresentinallanimalgenomesinwhichthey

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

3

havebeensearchedforincludingbasalmemberssuchasspongesandplacozoa[21ndash25]AllSox

proteinscontainasingleHMG(highKmobilitygroup)DNAbindingdomainandtheyaresubdivided

intotengroupsAthroughJbasedonHMGsequenceandfullKlengthproteinstructure[2426ndash28]

Invertebratesmembersofeachgroupareoftenexpressedinoverlappingpatternsinparticular

tissuesduringdevelopmentandplayimportantrolesindirectingthedifferentiationofcellsinthose

tissuesForexamplegroupBgenesareexpressedinthedevelopingcentralnervoussystem(CNS)

andeye[29ndash32]groupCgenesareexpressedinthekidneyandpancreas[33ndash35]andgroupF

genesareexpressedinthedevelopingvascularandlymphaticsystem[3637]Interestinglynotonly

aregroupmemberscoKexpressedinmanycasestheyshowapatternofphenotypicredundancy

whereseveredevelopmentaldefectsareonlyobservedwhenmultiplemembersofthesamegroup

areremoved[2737ndash43]Whilemammaliangenomescontainmultipleparaloguesformostgroups

invertebratestypicallyhavefewerSoxgenesSequencedinsectgenomesincludingthatofthefruit

flyDrosophilamelanogastertypicallycontainonegeneineachofgroupsCDEandFandfour

genesingroupBalthoughoccasionalextragenesareobservedincertainlineages[2426]

GroupBSoxgenesareofparticularevolutionaryinterestbecauseinadditiontobeingthemost

closelyrelatedSoxgenestoSrytheyaresomeofthebestcharacterizedmembersoftheSoxfamily

andappeartohavehighlyconservedfunctionsthroughoutevolution[4445]InmammalsgroupB

SoxgenesareinvolvedinstemcellpluripotencyandselfKrenewalectodermformationneural

inductionCNSdevelopmentplacodeformationandgametogenesis[27]VertebrategroupBSox

genesaredividedintotwosubgroupsgroupB1whichincludesSox1Sox2andSox3[44]andgroup

B2whichincludesSox14andSox21[46]GroupB1andB2genesplayopposingrolesinthe

developingvertebrateCNSwithgroupB1proteinsprimarilyactingastranscriptionalactivatorsto

conveyearlyneuroectodermalcompetenceandmaintainneuralprecursorswhilegroupB2genes

primarilyactastranscriptionalrepressorstopromoteneuronaldifferentiation[4347ndash49]Although

functionaldatasuggeststhattheB1KB2divisionmaynotbeanappropriateclassificationininsects

[455051]aroleforgroupBSoxgenesinneuraldevelopmentappearstobeconservedmaking

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

4

DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely

[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro

(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN

WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill

debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates

andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween

thesetwosetsofSoxgenes[3132455052ndash60]

AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping

patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso

showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland

midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline

[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued

withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects

thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand

intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost

strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand

expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly

similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly

boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera

hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)

andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral

patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel

(Kr)andPOUdomainprotein2(pdm2)[51676970]

NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof

compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

5

andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits

placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto

resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances

DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone

anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot

affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe

resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles

whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess

ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding

patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat

observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame

Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat

functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross

manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman

evolutionaryperspective

Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary

dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological

andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous

comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave

focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid

(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal

regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites

duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof

divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF

bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial

bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

6

77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis

generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes

DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand

formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy

theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD

simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland

evolutionarydynamicsofgroupBSoxbinding

Results

Comparative+DamIDseq+for+group+B+Sox+proteins+

WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh

confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin

Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional

compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD

melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin

fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof

bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates

ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment

betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda

differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly

boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand

mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding

intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

7

viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth

DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51

67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween

17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD

yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding

intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere

noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological

replicates

TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome

coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences

betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure

2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN

bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]

Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated

witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately

9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD

simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete

andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000

geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD

pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously

identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof

DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology

BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof

transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem

development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough

thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 4: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

3

havebeensearchedforincludingbasalmemberssuchasspongesandplacozoa[21ndash25]AllSox

proteinscontainasingleHMG(highKmobilitygroup)DNAbindingdomainandtheyaresubdivided

intotengroupsAthroughJbasedonHMGsequenceandfullKlengthproteinstructure[2426ndash28]

Invertebratesmembersofeachgroupareoftenexpressedinoverlappingpatternsinparticular

tissuesduringdevelopmentandplayimportantrolesindirectingthedifferentiationofcellsinthose

tissuesForexamplegroupBgenesareexpressedinthedevelopingcentralnervoussystem(CNS)

andeye[29ndash32]groupCgenesareexpressedinthekidneyandpancreas[33ndash35]andgroupF

genesareexpressedinthedevelopingvascularandlymphaticsystem[3637]Interestinglynotonly

aregroupmemberscoKexpressedinmanycasestheyshowapatternofphenotypicredundancy

whereseveredevelopmentaldefectsareonlyobservedwhenmultiplemembersofthesamegroup

areremoved[2737ndash43]Whilemammaliangenomescontainmultipleparaloguesformostgroups

invertebratestypicallyhavefewerSoxgenesSequencedinsectgenomesincludingthatofthefruit

flyDrosophilamelanogastertypicallycontainonegeneineachofgroupsCDEandFandfour

genesingroupBalthoughoccasionalextragenesareobservedincertainlineages[2426]

GroupBSoxgenesareofparticularevolutionaryinterestbecauseinadditiontobeingthemost

closelyrelatedSoxgenestoSrytheyaresomeofthebestcharacterizedmembersoftheSoxfamily

andappeartohavehighlyconservedfunctionsthroughoutevolution[4445]InmammalsgroupB

SoxgenesareinvolvedinstemcellpluripotencyandselfKrenewalectodermformationneural

inductionCNSdevelopmentplacodeformationandgametogenesis[27]VertebrategroupBSox

genesaredividedintotwosubgroupsgroupB1whichincludesSox1Sox2andSox3[44]andgroup

B2whichincludesSox14andSox21[46]GroupB1andB2genesplayopposingrolesinthe

developingvertebrateCNSwithgroupB1proteinsprimarilyactingastranscriptionalactivatorsto

conveyearlyneuroectodermalcompetenceandmaintainneuralprecursorswhilegroupB2genes

primarilyactastranscriptionalrepressorstopromoteneuronaldifferentiation[4347ndash49]Although

functionaldatasuggeststhattheB1KB2divisionmaynotbeanappropriateclassificationininsects

[455051]aroleforgroupBSoxgenesinneuraldevelopmentappearstobeconservedmaking

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

4

DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely

[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro

(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN

WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill

debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates

andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween

thesetwosetsofSoxgenes[3132455052ndash60]

AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping

patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso

showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland

midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline

[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued

withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects

thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand

intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost

strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand

expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly

similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly

boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera

hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)

andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral

patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel

(Kr)andPOUdomainprotein2(pdm2)[51676970]

NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof

compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

5

andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits

placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto

resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances

DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone

anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot

affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe

resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles

whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess

ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding

patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat

observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame

Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat

functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross

manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman

evolutionaryperspective

Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary

dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological

andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous

comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave

focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid

(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal

regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites

duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof

divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF

bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial

bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

6

77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis

generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes

DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand

formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy

theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD

simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland

evolutionarydynamicsofgroupBSoxbinding

Results

Comparative+DamIDseq+for+group+B+Sox+proteins+

WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh

confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin

Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional

compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD

melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin

fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof

bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates

ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment

betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda

differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly

boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand

mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding

intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

7

viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth

DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51

67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween

17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD

yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding

intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere

noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological

replicates

TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome

coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences

betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure

2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN

bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]

Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated

witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately

9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD

simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete

andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000

geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD

pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously

identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof

DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology

BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof

transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem

development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough

thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 5: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

4

DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely

[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro

(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN

WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill

debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates

andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween

thesetwosetsofSoxgenes[3132455052ndash60]

AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping

patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso

showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland

midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline

[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued

withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects

thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand

intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost

strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand

expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly

similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly

boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera

hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)

andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral

patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel

(Kr)andPOUdomainprotein2(pdm2)[51676970]

NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof

compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

5

andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits

placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto

resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances

DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone

anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot

affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe

resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles

whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess

ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding

patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat

observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame

Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat

functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross

manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman

evolutionaryperspective

Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary

dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological

andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous

comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave

focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid

(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal

regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites

duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof

divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF

bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial

bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

6

77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis

generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes

DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand

formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy

theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD

simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland

evolutionarydynamicsofgroupBSoxbinding

Results

Comparative+DamIDseq+for+group+B+Sox+proteins+

WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh

confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin

Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional

compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD

melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin

fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof

bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates

ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment

betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda

differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly

boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand

mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding

intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

7

viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth

DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51

67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween

17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD

yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding

intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere

noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological

replicates

TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome

coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences

betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure

2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN

bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]

Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated

witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately

9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD

simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete

andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000

geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD

pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously

identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof

DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology

BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof

transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem

development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough

thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 6: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

5

andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits

placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto

resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances

DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone

anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot

affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe

resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles

whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess

ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding

patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat

observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame

Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat

functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross

manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman

evolutionaryperspective

Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary

dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological

andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous

comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave

focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid

(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal

regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites

duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof

divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF

bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial

bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

6

77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis

generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes

DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand

formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy

theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD

simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland

evolutionarydynamicsofgroupBSoxbinding

Results

Comparative+DamIDseq+for+group+B+Sox+proteins+

WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh

confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin

Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional

compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD

melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin

fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof

bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates

ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment

betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda

differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly

boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand

mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding

intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

7

viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth

DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51

67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween

17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD

yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding

intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere

noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological

replicates

TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome

coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences

betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure

2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN

bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]

Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated

witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately

9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD

simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete

andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000

geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD

pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously

identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof

DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology

BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof

transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem

development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough

thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 7: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

6

77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis

generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes

DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand

formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy

theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD

simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland

evolutionarydynamicsofgroupBSoxbinding

Results

Comparative+DamIDseq+for+group+B+Sox+proteins+

WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh

confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin

Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional

compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD

melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin

fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof

bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates

ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment

betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda

differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly

boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand

mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding

intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

7

viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth

DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51

67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween

17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD

yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding

intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere

noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological

replicates

TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome

coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences

betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure

2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN

bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]

Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated

witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately

9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD

simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete

andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000

geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD

pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously

identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof

DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology

BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof

transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem

development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough

thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 8: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

7

viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth

DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51

67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween

17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD

yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding

intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere

noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological

replicates

TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome

coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences

betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure

2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN

bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]

Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated

witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately

9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD

simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete

andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000

geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD

pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously

identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof

DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology

BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof

transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem

development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough

thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 9: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

8

enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare

highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional

network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso

usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto

transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar

distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe

remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions

Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach

originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe

knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6

motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin

eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand

SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs

identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN

(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa

strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach

speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand

SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe

samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset

ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown

interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation

factorsKnirps(Kni)andRunt(Run)

FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory

modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand

FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 10: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

9

DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof

CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas

foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed

slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs

AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin

manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof

DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese

observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge

setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor

intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot

changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere

Turnover+of+Dichaete+binding+sites+during+evolution+

InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution

wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD

yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread

turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding

intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone

specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)

species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe

threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom

092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=

064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach

speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 11: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

10

inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila

phylogeny

WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam

bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat

showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons

betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof

preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880

differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while

only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof

quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows

thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock

mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila

[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies

butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent

inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand

Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein

thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein

anotherspeciesfrom96inDsimulansto204inDyakuba

Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat

orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe

samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto

asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents

weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany

intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare

annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 12: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

11

positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite

turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster

andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat

therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding

sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction

withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq

assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD

melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto

takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand

identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation

relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster

Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs

thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC

compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD

yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding

intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests

thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare

functionallyandpositionallyconservedinbothspecies

Binding+conservation+and+regulatory+function

FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe

assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas

knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs

inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator

Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 13: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

12

85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved

inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table

3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile

276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults

werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM

hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p

=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway

conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat

CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=

3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates

ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare

likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent

withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]

OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand

highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat

DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink

betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat

coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified

inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals

dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore

intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086

df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For

bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore

bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays

thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe

functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 14: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

13

directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore

interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)

(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare

identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan

functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome

targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene

andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers

andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese

functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof

conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore

intervalswhicharerobustlydetectedbyindependentbindingassays

Sequence+conservation+of+Sox+motifs+and+binding+intervals+

Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding

conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn

ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified

allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame

withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome

[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin

controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon

DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD

pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain

significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD

melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 15: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

14

showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding

conservation(Figure5A)

PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide

conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies

[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa

phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous

sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation

andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof

regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions

thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide

conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD

melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater

sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)

WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand

calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast

tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation

showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound

inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly

shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand

searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage

ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals

thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin

intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol

motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 16: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

15

WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous

locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs

inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto

16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference

(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally

conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)

Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow

fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD

melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank

sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare

boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose

thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely

toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID

bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround

thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP

intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation

andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell

asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain

functionalbindingevents

High+conservation+of+common+binding+by+Dichaete+and+SoxN+

HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch

asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould

yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave

shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 17: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

16

andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]

HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila

evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese

questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD

simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive

commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat

surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD

simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere

thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)

Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals

thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan

intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD

melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin

DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD

simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant

differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])

PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore

strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD

melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound

intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger

thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore

intervals

Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe

identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans

(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 18: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

17

SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP

termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation

ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests

thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof

genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding

isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan

expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins

Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese

proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist

of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof

Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved

orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis

consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof

DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound

tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete

andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute

equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart

oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere

isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand

commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary

conservationofSoxfunctionsintheCNS

WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso

identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to

analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD

simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand

SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 19: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

18

Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen

whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof

thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN

inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand

thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential

Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381

uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional

Files6and7)

AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome

propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo

setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto

morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8

and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique

andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential

targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant

pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals

uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron

morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS

developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith

orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our

resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand

Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe

brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously

beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63

67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds

toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 20: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

19

resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey

differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression

patternsasignatureofneofunctionalisation

OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina

comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox

functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite

turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite

turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith

phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites

whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals

inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals

wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin

twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby

bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound

conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN

functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant

bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution

Discussion

InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand

SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin

multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall

conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso

uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified

highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 21: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

20

conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon

bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding

patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon

bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor

alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof

theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare

conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3

whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe

possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers

WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD

melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether

allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan

underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe

foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing

comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound

regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese

correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding

profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso

identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent

compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis

modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise

comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies

hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother

speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed

turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively

stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 22: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

21

CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof

CRMsaltogetherwhichhighlightstheirfunctionalrole

IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution

thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding

intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins

containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding

shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach

speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq

havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding

intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant

correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding

intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly

chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional

bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved

bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand

thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs

withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies

thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot

determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional

bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults

suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved

correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation

OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir

extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns

Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 23: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

22

compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary

significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential

conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo

speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof

divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby

substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding

divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin

vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall

highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared

touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD

melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding

patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences

Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding

intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear

environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup

BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in

theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap

withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand

thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith

eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon

bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand

SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare

conservedfrominsectstovertebrates

ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand

manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 24: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

23

commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof

earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand

SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional

compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective

(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome

oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or

prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly

regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially

alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby

bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily

performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta

complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth

functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare

evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand

SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat

canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith

thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins

AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso

explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe

majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe

HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos

regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey

showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso

expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain

andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina

specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 25: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

24

genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial

expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese

domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot

beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe

HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral

veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK

specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar

interaction

UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely

specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators

atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn

additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin

specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB

Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe

broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof

variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch

partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo

competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication

eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras

theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare

debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen

retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor

acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat

theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas

beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent

lineages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 26: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

25

Conclusions

ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox

proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe

patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete

andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory

networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof

otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites

includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly

boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat

alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey

representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution

wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual

paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor

bindingpartners

Methods

Flyhusbandryandembryocollection

ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD

melanogasterw1118Dsimulansw[501](referencestrainK

httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115

[111]Dpseudoobscurapseudoobscura(referencestrainK

httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD

simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 27: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

26

stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g

yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia

powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented

withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith

waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC

Generationoftransgeniclines

ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam

constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith

upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST

vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan

Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD

melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS

sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector

(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle

vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)

TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand

clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas

microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha

piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe

DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere

backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin

femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP

expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 28: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

27

IsolationofDamIDDNAandsequencing

EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere

collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof

Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof

approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight

genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml

ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle

AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet

wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe

supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer

300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse

thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen

purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas

precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin

50K150microlTEbufferdependingonthestartingamountofembryos

30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved

contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads

(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes

ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich

wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf

bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or

14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC

sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe

DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe

averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 29: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

28

10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit

toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh

SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured

usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto

BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary

protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2

samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas

150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads

Sequencingdataanalysis

Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads

weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault

settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC

droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005

(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC

dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All

referencegenomesweredownloadedfromtheUCSCGenomeBrowser

(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing

SAMtools[119]

ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe

HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach

samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop

utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing

theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma

counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 30: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

29

eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using

RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein

samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas

differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted

andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals

BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl

[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK

DamIDKanalysisKpipeline

BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD

melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome

Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe

minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs

werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID

datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK

convertedtoSAMformatusingacustomperlscript(bed2sampl

httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat

usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized

togetherinDiffBindusingtheDESeq2normalizationmethod

BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl

scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere

performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand

plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere

performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone

usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 31: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

30

FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor

genelengtheffect

Evolutionarysequenceanalysisofbindingmotifs

DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster

andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD

melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly

usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding

intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving

strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould

beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe

guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites

withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring

denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]

FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits

wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso

usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK

DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground

fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt

00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding

intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose

speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]

andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098

Dataaccess

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 32: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

31

AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene

ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333

(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)

Competinginterests

Theauthorsdeclarethattheyhavenocompetinginterests

Authorscontributions

SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe

dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript

Descriptionofadditionalfiles

SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)

MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba

andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD

simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare

highlightedinred

SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals

Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand

intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures

(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)

PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)

PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 33: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

32

ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba

DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura

DichaeteKDamintervalsannotatedtoeachfeatureclass

SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved

(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway

bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless

likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)

SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding

conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe

uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals

thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless

likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding

intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)

SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa

directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa

coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD

melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD

melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding

intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation

betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD

melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare

annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare

lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect

isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 34: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

33

targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD

melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor

coreintervals

Additionalfile1

Fileformatxls

TitleGenesannotatedtoSoxDamIDbindingintervals

DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall

speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled

fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom

NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile2

Fileformatxls

TitleGeneontologytermsenrichedinSoxtargetgenelists

DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes

inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile3

Fileformatxls

TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals

DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde

novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 35: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

34

columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe

consensussequenceofthemotifandthefourthcolumnlistsitspKvalue

Additionalfile4

Fileformatxls

TitleTargetgenesannotatedtoconservedcommonlyboundintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby

DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers

fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted

Additionalfile5

Fileformatxls

TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD

simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe

correctedpKvalue

Additionalfile6

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete

andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols

andFlybaseidentifiersforeachgenearelisted

Additionalfile7

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 36: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

35

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe

firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Additionalfile8

Fileformatxls

TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals

DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand

conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand

Flybaseidentifiersforeachgenearelisted

Additionalfile9

Fileformatxls

TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes

DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat

areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst

columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue

Acknowledgements

ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas

TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis

decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 37: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

36

pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank

ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac

transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto

BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis

References

1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74

2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432

3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90

4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797

5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216

6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626

7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979

8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392

9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 38: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

37

10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34

11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101

12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70

13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421

14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614

15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335

16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223

17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506

18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330

19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567

20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014

21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477

22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667

23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173

24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464

25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 39: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

38

GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960

26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255

27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018

28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170

29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464

30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532

31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36

32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201

33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799

34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644

35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409

36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324

37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526

38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12

39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 40: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

39

40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781

41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936

42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255

43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588

44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520

45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626

46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937

47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428

48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120

49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771

50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996

51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74

52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329

53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440

54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013

55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 41: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

40

56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605

57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635

58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846

59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120

60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570

61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203

62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542

63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321

64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131

65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003

66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228

67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861

68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115

69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511

70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36

71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 42: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

41

72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266

73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373

74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665

75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556

76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343

77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420

78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80

79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748

80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732

81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040

82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540

83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014

84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123

85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 43: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

42

ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013

86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012

87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364

88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567

89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018

90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842

91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635

92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562

93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91

94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588

95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478

96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818

97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835

98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150

99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676

100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 44: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

43

101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218

102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232

103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488

104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188

105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497

106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014

107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676

108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813

109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014

110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686

111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26

112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009

113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637

114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10

115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 45: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

44

116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195

117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079

118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18

119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079

120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589

121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014

122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61

123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014

124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237

125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731

126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129

127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 46: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

45

Figurelegends

Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach

speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam

fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach

specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe

dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof

fliesarefromNGompel(httpwwwibdmlunivK

mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel

DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila

pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora

DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof

bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith

DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof

adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom

bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe

genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding

profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds

representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)

Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles

plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion

infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere

scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom

0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 47: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

46

translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom

eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor

visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof

chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding

intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding

profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly

controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding

intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe

fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies

showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans

yakDrosophilayakubapseDrosophilapseudoobscura

Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall

Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall

threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD

melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread

counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation

coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher

correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological

replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven

betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity

scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal

component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal

component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 48: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

47

Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin

DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat

arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange

lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster

andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe

distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach

sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen

correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare

preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD

melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD

melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows

differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby

bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof

intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare

identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and

Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2

foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment

BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved

arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba

thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst

andfourthintrons

Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding

conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies

(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 49: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

48

meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour

speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals

thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour

species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies

thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)

randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=

162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD

melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave

moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross

allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple

alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation

(highlightedinpurple)

Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram

showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe

singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals

(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD

melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe

differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK

16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot

showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and

SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences

inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo

species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein

bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 50: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

49

pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF

criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals

Tables

Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals

A Bindingintervals(plt001)

Bindingintervals(plt005)

DmelDKDam 17530 20848 DmelSoxNK

Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK

Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK

Dam NA 2951

B Translatedbindingintervals

OverlapswithD)melbindingintervals

ofD)melintervalsoverlapping

ofnonVD)melintervalsoverlapping

DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644

C Genesannotatedtobindingintervals

GenescommontoDichaeteandSoxN

Directtargetsannotatedtobindingintervals

DmelDKDam 9400 844511731373(854)

DmelSoxNKDam 9528 8445 434544(800)

DsimDKDam 9383 7524

11111373(809)

DsimSoxNKDam 8948 7524 412544(757)

DyakDKDam 12192 NA

12491373(910)

DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 51: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

50

Dataset CRMsindataset

CRMsoverlappingDichaetebinding

CRMsoverlappingSoxNbinding

CRMsoverlappingDichaeteandSoxNbinding

REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)

Table3ConservationofSoxbindingintervalsatfunctionalsites

Factor Condition Percentofbindingintervalsconserved

PercentofbindingintervalsuniquetoD)mel

Dichaete REDFly 644 96

NotREDFly 448 276

SoxN REDFly 790 210

NotREDFly 465 535

Dichaete FlyLight 547 167

NotFlyLight 442 286

SoxN FlyLight 653 347

NotFlyLight 451 549

Dichaete Coreinterval 704 77

Notcoreinterval 385 324

SoxN Coreinterval 757 243

Notcoreinterval 447 553

Dichaete Directtarget 537 204

Notdirecttarget 431 290

SoxN Directtarget 533 476

Notdirecttarget 473 527

Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN

Species BindingpatternPercentofbindingintervalsconserved

Percentofbindingintervalsnotconserved

Dmelanogaster Common 765 235

Dichaeteunique 401 599

SoxNunique 337 663

Dsimulans Common 941 59

Dichaeteunique 628 372

SoxNunique 701 299

Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 52: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

51

Mouseprotein

OrthologuesofmousetargetsinD)melanogaster

OverlapwithcommonDichaeteSoxNtargets

OverlapwithcoreSoxNtargets

OverlapwithcoreDichaetetargets

Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 53: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

Figure 1

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 54: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

dpp

Figure 2

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 55: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

Figure 3

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 56: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

Figure 4

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 57: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

Figure 5

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 58: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

ʹ

ʹ

Figure 6

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 59: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

Additional files provided with this submission

Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 60: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

A

B

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

D melanogasterD simulans

D yakubaD pseudoobscura

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 61: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

A

B

C

D

E

F

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 62: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files
Page 63: Common%binding%by%redundant%group%B% Sox$proteins$is ... · 3!! have!been!searched!for,!including!basal!members!such!as!sponges!and!placozoa![21–25].!All!Sox! proteins!contain!a!single!HMG!(highKmobility!group)!DNA!binding

A

B

C

D

CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint

  • Start of article
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Additional files