common%binding%by%redundant%group%b% sox$proteins$is ... · 3!!...
TRANSCRIPT
CommonbindingbyredundantgroupBSox$proteins$is$evolutionarily$conserved$in$Drosophila
SarahHCarlandStevenRussell
DepartmentofGeneticsandCambridgeSystemsBiologyCentre
UniversityofCambridge
DowningStreet
Cambridge
CB23EH
UK
SarahHCarlltscarlgencamacukgt
StevenRussellltsrussellgencamacukgt
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
1
Abstract
Background
GroupBSoxproteinsareahighlyconservedgroupoftranscriptionfactorsthatactextensivelyto
coordinatenervoussystemdevelopmentinhighermetazoanswhileshowingbothcoKexpressionand
functionalredundancyacrossabroadgroupoftaxaInDrosophilamelanogasterthetwogroupB
SoxproteinsDichaeteandSoxNeuroshowwidespreadcommonbindingacrossthegenomeWhile
someinstancesoffunctionalcompensationhavebeenobservedinDrosophilathefunctionof
commonbindingandtheextentofitsevolutionaryconservationisnotknown
Results
WeusedDamIDKseqtoexaminethegenomeKwidebindingpatternsofDichaeteandSoxNeuroin
fourspeciesofDrosophilaThroughaquantitativecomparisonofDichaetebindingweevaluatedthe
rateofbindingsiteturnoveracrossthegenomeaswellasatspecificfunctionalsitesWealso
examinedthepresenceofSoxmotifswithinbindingintervalsandthecorrelationbetweensequence
conservationandbindingconservationTodeterminewhethercommonbindingbetweenDichaete
andSoxNeuroisconservedweperformedadetailedanalysisofthebindingpatternsofbothfactors
intwospecies
Conclusion
WefindthatwhiletheregulatorynetworksdrivenbyDichaeteandSoxNeuroarelargelyconserved
acrossthedrosophilidsstudiedbindingsiteturnoveriswidespreadandcorrelatedwith
phylogeneticdistanceNonethelessbindingispreferentiallyconservedatknowncis8regulatory
modulesandcoreindependentlyverifiedbindingsitesWeobservedthestrongestbinding
conservationatsitesthatarecommonlyboundbyDichaeteandSoxNeurosuggestingthatthese
sitesarefunctionallyimportantOuranalysisprovidesinsightsintotheevolutionofgroupBSox
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
2
functionhighlightingthespecificconservationofsharedbindingsitesandsuggestingalternative
sourcesofneofunctionalisationbetweenparalogousfamilymembers
KeywordsSoxDichaeteSoxNeuroDrosophilaredundancytranscriptionfactorbinding
conservation
Background
TheimportanceofregulatoryDNAindevelopmentevolutionanddiseasehasbecomewidely
recognizedinrecentyearsasthenumberoffunctionalgenomicstudiesmappingregulatory
elementsinthegenomesofhumansandmodelorganismshasgrown[1ndash5]Whilesignificant
progresshasbeenmadeonquestionssuchastheinvivobindingspecificityoftranscriptionfactors
(TFs)andthecombinatorialpatternsofTFbindingthatdrivespecificgeneexpressioninmanycases
thelargenumberofbindingeventsobservedinvivosuggeststhatTFfunctioniscontextKdependent
andinfluencedbyfactorsinadditiontotheirintrinsicsequencespecificity[6ndash12]Furthermorethe
mostcommonmethodsofassayinginvivogenomeKwidebindingarenoisyandmaysufferproblems
ofbiasandfalsepositives[13ndash17]Onewaytocutthroughthisnoiseandassessthefunctionalityof
TFbindingistoleveragetheeffectofnaturalselectionwhichshouldtendtopreservefunctionally
importantbindingeventsduringevolution[618ndash20]Herewehaveusedacomparative
evolutionaryapproachinfourspeciesofDrosophilatostudythebindingandfunctionsoftwogroup
BSoxproteinsafamilyofTFsthatisbothdeeplyconservedthroughoutanimalevolutionand
displaysevidenceoffunctionalredundancy
Sox(SRYKrelatedhighKmobilitygroupbox)genesencodeahighlyconservedfamilyofTFsthatserve
asbroaddevelopmentalregulatorsinmetazoaTheyarethoughttohaveevolvedinconjunction
withtheoriginofmulticellularanimallifeastheyarepresentinallanimalgenomesinwhichthey
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
3
havebeensearchedforincludingbasalmemberssuchasspongesandplacozoa[21ndash25]AllSox
proteinscontainasingleHMG(highKmobilitygroup)DNAbindingdomainandtheyaresubdivided
intotengroupsAthroughJbasedonHMGsequenceandfullKlengthproteinstructure[2426ndash28]
Invertebratesmembersofeachgroupareoftenexpressedinoverlappingpatternsinparticular
tissuesduringdevelopmentandplayimportantrolesindirectingthedifferentiationofcellsinthose
tissuesForexamplegroupBgenesareexpressedinthedevelopingcentralnervoussystem(CNS)
andeye[29ndash32]groupCgenesareexpressedinthekidneyandpancreas[33ndash35]andgroupF
genesareexpressedinthedevelopingvascularandlymphaticsystem[3637]Interestinglynotonly
aregroupmemberscoKexpressedinmanycasestheyshowapatternofphenotypicredundancy
whereseveredevelopmentaldefectsareonlyobservedwhenmultiplemembersofthesamegroup
areremoved[2737ndash43]Whilemammaliangenomescontainmultipleparaloguesformostgroups
invertebratestypicallyhavefewerSoxgenesSequencedinsectgenomesincludingthatofthefruit
flyDrosophilamelanogastertypicallycontainonegeneineachofgroupsCDEandFandfour
genesingroupBalthoughoccasionalextragenesareobservedincertainlineages[2426]
GroupBSoxgenesareofparticularevolutionaryinterestbecauseinadditiontobeingthemost
closelyrelatedSoxgenestoSrytheyaresomeofthebestcharacterizedmembersoftheSoxfamily
andappeartohavehighlyconservedfunctionsthroughoutevolution[4445]InmammalsgroupB
SoxgenesareinvolvedinstemcellpluripotencyandselfKrenewalectodermformationneural
inductionCNSdevelopmentplacodeformationandgametogenesis[27]VertebrategroupBSox
genesaredividedintotwosubgroupsgroupB1whichincludesSox1Sox2andSox3[44]andgroup
B2whichincludesSox14andSox21[46]GroupB1andB2genesplayopposingrolesinthe
developingvertebrateCNSwithgroupB1proteinsprimarilyactingastranscriptionalactivatorsto
conveyearlyneuroectodermalcompetenceandmaintainneuralprecursorswhilegroupB2genes
primarilyactastranscriptionalrepressorstopromoteneuronaldifferentiation[4347ndash49]Although
functionaldatasuggeststhattheB1KB2divisionmaynotbeanappropriateclassificationininsects
[455051]aroleforgroupBSoxgenesinneuraldevelopmentappearstobeconservedmaking
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
4
DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely
[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro
(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN
WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill
debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates
andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween
thesetwosetsofSoxgenes[3132455052ndash60]
AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping
patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso
showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland
midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline
[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued
withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects
thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand
intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost
strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand
expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly
similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly
boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera
hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)
andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral
patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel
(Kr)andPOUdomainprotein2(pdm2)[51676970]
NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof
compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
5
andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits
placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto
resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances
DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone
anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot
affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe
resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles
whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess
ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding
patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat
observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame
Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat
functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross
manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman
evolutionaryperspective
Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary
dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological
andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous
comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave
focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid
(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal
regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites
duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof
divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF
bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial
bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
6
77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis
generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes
DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand
formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy
theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD
simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland
evolutionarydynamicsofgroupBSoxbinding
Results
Comparative+DamIDseq+for+group+B+Sox+proteins+
WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh
confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin
Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional
compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD
melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin
fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof
bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates
ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment
betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda
differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly
boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand
mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding
intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
7
viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth
DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51
67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween
17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD
yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding
intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere
noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological
replicates
TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome
coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences
betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure
2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN
bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]
Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated
witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately
9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD
simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete
andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000
geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD
pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously
identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof
DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology
BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof
transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem
development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough
thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
1
Abstract
Background
GroupBSoxproteinsareahighlyconservedgroupoftranscriptionfactorsthatactextensivelyto
coordinatenervoussystemdevelopmentinhighermetazoanswhileshowingbothcoKexpressionand
functionalredundancyacrossabroadgroupoftaxaInDrosophilamelanogasterthetwogroupB
SoxproteinsDichaeteandSoxNeuroshowwidespreadcommonbindingacrossthegenomeWhile
someinstancesoffunctionalcompensationhavebeenobservedinDrosophilathefunctionof
commonbindingandtheextentofitsevolutionaryconservationisnotknown
Results
WeusedDamIDKseqtoexaminethegenomeKwidebindingpatternsofDichaeteandSoxNeuroin
fourspeciesofDrosophilaThroughaquantitativecomparisonofDichaetebindingweevaluatedthe
rateofbindingsiteturnoveracrossthegenomeaswellasatspecificfunctionalsitesWealso
examinedthepresenceofSoxmotifswithinbindingintervalsandthecorrelationbetweensequence
conservationandbindingconservationTodeterminewhethercommonbindingbetweenDichaete
andSoxNeuroisconservedweperformedadetailedanalysisofthebindingpatternsofbothfactors
intwospecies
Conclusion
WefindthatwhiletheregulatorynetworksdrivenbyDichaeteandSoxNeuroarelargelyconserved
acrossthedrosophilidsstudiedbindingsiteturnoveriswidespreadandcorrelatedwith
phylogeneticdistanceNonethelessbindingispreferentiallyconservedatknowncis8regulatory
modulesandcoreindependentlyverifiedbindingsitesWeobservedthestrongestbinding
conservationatsitesthatarecommonlyboundbyDichaeteandSoxNeurosuggestingthatthese
sitesarefunctionallyimportantOuranalysisprovidesinsightsintotheevolutionofgroupBSox
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
2
functionhighlightingthespecificconservationofsharedbindingsitesandsuggestingalternative
sourcesofneofunctionalisationbetweenparalogousfamilymembers
KeywordsSoxDichaeteSoxNeuroDrosophilaredundancytranscriptionfactorbinding
conservation
Background
TheimportanceofregulatoryDNAindevelopmentevolutionanddiseasehasbecomewidely
recognizedinrecentyearsasthenumberoffunctionalgenomicstudiesmappingregulatory
elementsinthegenomesofhumansandmodelorganismshasgrown[1ndash5]Whilesignificant
progresshasbeenmadeonquestionssuchastheinvivobindingspecificityoftranscriptionfactors
(TFs)andthecombinatorialpatternsofTFbindingthatdrivespecificgeneexpressioninmanycases
thelargenumberofbindingeventsobservedinvivosuggeststhatTFfunctioniscontextKdependent
andinfluencedbyfactorsinadditiontotheirintrinsicsequencespecificity[6ndash12]Furthermorethe
mostcommonmethodsofassayinginvivogenomeKwidebindingarenoisyandmaysufferproblems
ofbiasandfalsepositives[13ndash17]Onewaytocutthroughthisnoiseandassessthefunctionalityof
TFbindingistoleveragetheeffectofnaturalselectionwhichshouldtendtopreservefunctionally
importantbindingeventsduringevolution[618ndash20]Herewehaveusedacomparative
evolutionaryapproachinfourspeciesofDrosophilatostudythebindingandfunctionsoftwogroup
BSoxproteinsafamilyofTFsthatisbothdeeplyconservedthroughoutanimalevolutionand
displaysevidenceoffunctionalredundancy
Sox(SRYKrelatedhighKmobilitygroupbox)genesencodeahighlyconservedfamilyofTFsthatserve
asbroaddevelopmentalregulatorsinmetazoaTheyarethoughttohaveevolvedinconjunction
withtheoriginofmulticellularanimallifeastheyarepresentinallanimalgenomesinwhichthey
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
3
havebeensearchedforincludingbasalmemberssuchasspongesandplacozoa[21ndash25]AllSox
proteinscontainasingleHMG(highKmobilitygroup)DNAbindingdomainandtheyaresubdivided
intotengroupsAthroughJbasedonHMGsequenceandfullKlengthproteinstructure[2426ndash28]
Invertebratesmembersofeachgroupareoftenexpressedinoverlappingpatternsinparticular
tissuesduringdevelopmentandplayimportantrolesindirectingthedifferentiationofcellsinthose
tissuesForexamplegroupBgenesareexpressedinthedevelopingcentralnervoussystem(CNS)
andeye[29ndash32]groupCgenesareexpressedinthekidneyandpancreas[33ndash35]andgroupF
genesareexpressedinthedevelopingvascularandlymphaticsystem[3637]Interestinglynotonly
aregroupmemberscoKexpressedinmanycasestheyshowapatternofphenotypicredundancy
whereseveredevelopmentaldefectsareonlyobservedwhenmultiplemembersofthesamegroup
areremoved[2737ndash43]Whilemammaliangenomescontainmultipleparaloguesformostgroups
invertebratestypicallyhavefewerSoxgenesSequencedinsectgenomesincludingthatofthefruit
flyDrosophilamelanogastertypicallycontainonegeneineachofgroupsCDEandFandfour
genesingroupBalthoughoccasionalextragenesareobservedincertainlineages[2426]
GroupBSoxgenesareofparticularevolutionaryinterestbecauseinadditiontobeingthemost
closelyrelatedSoxgenestoSrytheyaresomeofthebestcharacterizedmembersoftheSoxfamily
andappeartohavehighlyconservedfunctionsthroughoutevolution[4445]InmammalsgroupB
SoxgenesareinvolvedinstemcellpluripotencyandselfKrenewalectodermformationneural
inductionCNSdevelopmentplacodeformationandgametogenesis[27]VertebrategroupBSox
genesaredividedintotwosubgroupsgroupB1whichincludesSox1Sox2andSox3[44]andgroup
B2whichincludesSox14andSox21[46]GroupB1andB2genesplayopposingrolesinthe
developingvertebrateCNSwithgroupB1proteinsprimarilyactingastranscriptionalactivatorsto
conveyearlyneuroectodermalcompetenceandmaintainneuralprecursorswhilegroupB2genes
primarilyactastranscriptionalrepressorstopromoteneuronaldifferentiation[4347ndash49]Although
functionaldatasuggeststhattheB1KB2divisionmaynotbeanappropriateclassificationininsects
[455051]aroleforgroupBSoxgenesinneuraldevelopmentappearstobeconservedmaking
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
4
DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely
[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro
(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN
WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill
debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates
andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween
thesetwosetsofSoxgenes[3132455052ndash60]
AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping
patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso
showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland
midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline
[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued
withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects
thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand
intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost
strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand
expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly
similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly
boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera
hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)
andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral
patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel
(Kr)andPOUdomainprotein2(pdm2)[51676970]
NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof
compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
5
andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits
placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto
resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances
DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone
anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot
affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe
resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles
whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess
ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding
patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat
observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame
Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat
functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross
manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman
evolutionaryperspective
Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary
dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological
andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous
comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave
focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid
(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal
regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites
duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof
divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF
bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial
bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
6
77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis
generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes
DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand
formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy
theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD
simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland
evolutionarydynamicsofgroupBSoxbinding
Results
Comparative+DamIDseq+for+group+B+Sox+proteins+
WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh
confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin
Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional
compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD
melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin
fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof
bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates
ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment
betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda
differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly
boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand
mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding
intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
7
viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth
DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51
67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween
17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD
yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding
intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere
noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological
replicates
TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome
coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences
betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure
2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN
bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]
Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated
witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately
9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD
simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete
andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000
geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD
pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously
identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof
DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology
BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof
transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem
development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough
thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
2
functionhighlightingthespecificconservationofsharedbindingsitesandsuggestingalternative
sourcesofneofunctionalisationbetweenparalogousfamilymembers
KeywordsSoxDichaeteSoxNeuroDrosophilaredundancytranscriptionfactorbinding
conservation
Background
TheimportanceofregulatoryDNAindevelopmentevolutionanddiseasehasbecomewidely
recognizedinrecentyearsasthenumberoffunctionalgenomicstudiesmappingregulatory
elementsinthegenomesofhumansandmodelorganismshasgrown[1ndash5]Whilesignificant
progresshasbeenmadeonquestionssuchastheinvivobindingspecificityoftranscriptionfactors
(TFs)andthecombinatorialpatternsofTFbindingthatdrivespecificgeneexpressioninmanycases
thelargenumberofbindingeventsobservedinvivosuggeststhatTFfunctioniscontextKdependent
andinfluencedbyfactorsinadditiontotheirintrinsicsequencespecificity[6ndash12]Furthermorethe
mostcommonmethodsofassayinginvivogenomeKwidebindingarenoisyandmaysufferproblems
ofbiasandfalsepositives[13ndash17]Onewaytocutthroughthisnoiseandassessthefunctionalityof
TFbindingistoleveragetheeffectofnaturalselectionwhichshouldtendtopreservefunctionally
importantbindingeventsduringevolution[618ndash20]Herewehaveusedacomparative
evolutionaryapproachinfourspeciesofDrosophilatostudythebindingandfunctionsoftwogroup
BSoxproteinsafamilyofTFsthatisbothdeeplyconservedthroughoutanimalevolutionand
displaysevidenceoffunctionalredundancy
Sox(SRYKrelatedhighKmobilitygroupbox)genesencodeahighlyconservedfamilyofTFsthatserve
asbroaddevelopmentalregulatorsinmetazoaTheyarethoughttohaveevolvedinconjunction
withtheoriginofmulticellularanimallifeastheyarepresentinallanimalgenomesinwhichthey
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
3
havebeensearchedforincludingbasalmemberssuchasspongesandplacozoa[21ndash25]AllSox
proteinscontainasingleHMG(highKmobilitygroup)DNAbindingdomainandtheyaresubdivided
intotengroupsAthroughJbasedonHMGsequenceandfullKlengthproteinstructure[2426ndash28]
Invertebratesmembersofeachgroupareoftenexpressedinoverlappingpatternsinparticular
tissuesduringdevelopmentandplayimportantrolesindirectingthedifferentiationofcellsinthose
tissuesForexamplegroupBgenesareexpressedinthedevelopingcentralnervoussystem(CNS)
andeye[29ndash32]groupCgenesareexpressedinthekidneyandpancreas[33ndash35]andgroupF
genesareexpressedinthedevelopingvascularandlymphaticsystem[3637]Interestinglynotonly
aregroupmemberscoKexpressedinmanycasestheyshowapatternofphenotypicredundancy
whereseveredevelopmentaldefectsareonlyobservedwhenmultiplemembersofthesamegroup
areremoved[2737ndash43]Whilemammaliangenomescontainmultipleparaloguesformostgroups
invertebratestypicallyhavefewerSoxgenesSequencedinsectgenomesincludingthatofthefruit
flyDrosophilamelanogastertypicallycontainonegeneineachofgroupsCDEandFandfour
genesingroupBalthoughoccasionalextragenesareobservedincertainlineages[2426]
GroupBSoxgenesareofparticularevolutionaryinterestbecauseinadditiontobeingthemost
closelyrelatedSoxgenestoSrytheyaresomeofthebestcharacterizedmembersoftheSoxfamily
andappeartohavehighlyconservedfunctionsthroughoutevolution[4445]InmammalsgroupB
SoxgenesareinvolvedinstemcellpluripotencyandselfKrenewalectodermformationneural
inductionCNSdevelopmentplacodeformationandgametogenesis[27]VertebrategroupBSox
genesaredividedintotwosubgroupsgroupB1whichincludesSox1Sox2andSox3[44]andgroup
B2whichincludesSox14andSox21[46]GroupB1andB2genesplayopposingrolesinthe
developingvertebrateCNSwithgroupB1proteinsprimarilyactingastranscriptionalactivatorsto
conveyearlyneuroectodermalcompetenceandmaintainneuralprecursorswhilegroupB2genes
primarilyactastranscriptionalrepressorstopromoteneuronaldifferentiation[4347ndash49]Although
functionaldatasuggeststhattheB1KB2divisionmaynotbeanappropriateclassificationininsects
[455051]aroleforgroupBSoxgenesinneuraldevelopmentappearstobeconservedmaking
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
4
DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely
[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro
(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN
WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill
debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates
andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween
thesetwosetsofSoxgenes[3132455052ndash60]
AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping
patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso
showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland
midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline
[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued
withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects
thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand
intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost
strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand
expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly
similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly
boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera
hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)
andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral
patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel
(Kr)andPOUdomainprotein2(pdm2)[51676970]
NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof
compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
5
andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits
placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto
resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances
DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone
anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot
affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe
resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles
whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess
ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding
patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat
observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame
Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat
functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross
manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman
evolutionaryperspective
Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary
dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological
andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous
comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave
focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid
(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal
regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites
duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof
divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF
bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial
bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
6
77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis
generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes
DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand
formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy
theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD
simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland
evolutionarydynamicsofgroupBSoxbinding
Results
Comparative+DamIDseq+for+group+B+Sox+proteins+
WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh
confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin
Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional
compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD
melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin
fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof
bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates
ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment
betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda
differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly
boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand
mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding
intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
7
viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth
DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51
67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween
17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD
yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding
intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere
noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological
replicates
TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome
coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences
betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure
2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN
bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]
Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated
witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately
9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD
simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete
andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000
geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD
pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously
identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof
DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology
BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof
transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem
development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough
thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
3
havebeensearchedforincludingbasalmemberssuchasspongesandplacozoa[21ndash25]AllSox
proteinscontainasingleHMG(highKmobilitygroup)DNAbindingdomainandtheyaresubdivided
intotengroupsAthroughJbasedonHMGsequenceandfullKlengthproteinstructure[2426ndash28]
Invertebratesmembersofeachgroupareoftenexpressedinoverlappingpatternsinparticular
tissuesduringdevelopmentandplayimportantrolesindirectingthedifferentiationofcellsinthose
tissuesForexamplegroupBgenesareexpressedinthedevelopingcentralnervoussystem(CNS)
andeye[29ndash32]groupCgenesareexpressedinthekidneyandpancreas[33ndash35]andgroupF
genesareexpressedinthedevelopingvascularandlymphaticsystem[3637]Interestinglynotonly
aregroupmemberscoKexpressedinmanycasestheyshowapatternofphenotypicredundancy
whereseveredevelopmentaldefectsareonlyobservedwhenmultiplemembersofthesamegroup
areremoved[2737ndash43]Whilemammaliangenomescontainmultipleparaloguesformostgroups
invertebratestypicallyhavefewerSoxgenesSequencedinsectgenomesincludingthatofthefruit
flyDrosophilamelanogastertypicallycontainonegeneineachofgroupsCDEandFandfour
genesingroupBalthoughoccasionalextragenesareobservedincertainlineages[2426]
GroupBSoxgenesareofparticularevolutionaryinterestbecauseinadditiontobeingthemost
closelyrelatedSoxgenestoSrytheyaresomeofthebestcharacterizedmembersoftheSoxfamily
andappeartohavehighlyconservedfunctionsthroughoutevolution[4445]InmammalsgroupB
SoxgenesareinvolvedinstemcellpluripotencyandselfKrenewalectodermformationneural
inductionCNSdevelopmentplacodeformationandgametogenesis[27]VertebrategroupBSox
genesaredividedintotwosubgroupsgroupB1whichincludesSox1Sox2andSox3[44]andgroup
B2whichincludesSox14andSox21[46]GroupB1andB2genesplayopposingrolesinthe
developingvertebrateCNSwithgroupB1proteinsprimarilyactingastranscriptionalactivatorsto
conveyearlyneuroectodermalcompetenceandmaintainneuralprecursorswhilegroupB2genes
primarilyactastranscriptionalrepressorstopromoteneuronaldifferentiation[4347ndash49]Although
functionaldatasuggeststhattheB1KB2divisionmaynotbeanappropriateclassificationininsects
[455051]aroleforgroupBSoxgenesinneuraldevelopmentappearstobeconservedmaking
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
4
DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely
[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro
(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN
WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill
debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates
andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween
thesetwosetsofSoxgenes[3132455052ndash60]
AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping
patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso
showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland
midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline
[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued
withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects
thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand
intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost
strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand
expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly
similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly
boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera
hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)
andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral
patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel
(Kr)andPOUdomainprotein2(pdm2)[51676970]
NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof
compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
5
andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits
placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto
resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances
DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone
anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot
affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe
resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles
whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess
ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding
patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat
observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame
Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat
functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross
manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman
evolutionaryperspective
Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary
dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological
andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous
comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave
focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid
(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal
regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites
duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof
divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF
bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial
bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
6
77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis
generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes
DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand
formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy
theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD
simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland
evolutionarydynamicsofgroupBSoxbinding
Results
Comparative+DamIDseq+for+group+B+Sox+proteins+
WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh
confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin
Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional
compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD
melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin
fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof
bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates
ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment
betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda
differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly
boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand
mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding
intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
7
viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth
DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51
67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween
17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD
yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding
intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere
noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological
replicates
TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome
coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences
betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure
2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN
bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]
Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated
witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately
9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD
simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete
andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000
geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD
pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously
identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof
DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology
BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof
transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem
development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough
thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
4
DrosophilaanattractivesysteminwhichtostudygroupBSoxfunctionandevolutionmoreclosely
[313243]TherearefourgroupBSoxgenesintheDrosophilamelanogastergenomeSoxNeuro
(SoxN)DichaeteSox21aandSox21bOfthesethebeststudiedtodateareDichaeteandSoxN
WhiletheexactorthologyrelationshipsbetweenvertebrateandinsectgroupBSoxgenesarestill
debatedsimilaritiesintheexpressionpatternsandfunctionsofSox1Sox2andSox3invertebrates
andSoxNandDichaeteininsectssuggestadeeplyconservedyetcomplexrelationshipbetween
thesetwosetsofSoxgenes[3132455052ndash60]
AswithSox1Sox2andSox3invertebratesbothDichaeteandSoxNareexpressedinoverlapping
patternsinthedevelopingDrosophilaCNSandarenecessaryforitsnormaldevelopmenttheyalso
showevidenceoffunctionalredundancy[295561ndash64]Dichaetemutantembryosshowaxonaland
midlinedefectswhichcanberescuedbyexpressingDichaeteormammalianSox2inthemidline
[63]SoxNmutantembryosalsoshowaxonaldefectsandlossoflateralneuronsthatcanberescued
withmammalianSox1[65]SoxNDichaetedoublemutantsshowmuchmoresevereCNSdefects
thaneithersinglemutantinparticularanincreasedlossofneuroblastsinthemedialand
intermediatecolumnsoftheneuroectodermwhereSoxNandDichaeteexpressionoverlapsmost
strongly[6166]Inadditiontocompensationatthelevelofneuralphenotypesinvivobindingand
expressionstudiesofDichaeteandSoxNinDmelanogasterhaveshownthattheyhavehighly
similargenomeKwidebindingpatternsandsharealargenumberoftargetgenes[5167]Commonly
boundtargetscovermanyofthecorefunctionsofbothDichaeteandSoxNincludingovera
hundredotherTFsactiveintheCNStheproneuralgenesoftheachaete8scutecomplexDrop(Dr)
andventralnervoussystemdefective(vnd)whichencodeTFsinvolvedinCNSdorsoKventral
patterning[68]andtheneuroblasttemporalidentitygenesseven8up(svp)hunchback(hb)Kruppel
(Kr)andPOUdomainprotein2(pdm2)[51676970]
NotonlydoDichaeteandSoxNsharemanytargetstheyalsodisplayacomplexpatternof
compensatorybindingineachotherrsquosabsenceStudiesmeasuringSoxNbindinginDichaetemutants
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
5
andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits
placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto
resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances
DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone
anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot
affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe
resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles
whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess
ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding
patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat
observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame
Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat
functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross
manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman
evolutionaryperspective
Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary
dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological
andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous
comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave
focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid
(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal
regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites
duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof
divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF
bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial
bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
6
77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis
generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes
DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand
formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy
theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD
simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland
evolutionarydynamicsofgroupBSoxbinding
Results
Comparative+DamIDseq+for+group+B+Sox+proteins+
WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh
confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin
Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional
compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD
melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin
fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof
bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates
ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment
betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda
differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly
boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand
mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding
intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
7
viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth
DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51
67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween
17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD
yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding
intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere
noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological
replicates
TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome
coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences
betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure
2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN
bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]
Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated
witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately
9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD
simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete
andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000
geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD
pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously
identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof
DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology
BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof
transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem
development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough
thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
5
andviceversaidentifiedlociwhereoneTFcancompensatefortheotherrsquosabsencebybindinginits
placeorincreasingitsownbindingHoweveratotherlocithelossofoneSoxproteinappearsto
resultinalossofbindingbytheotherTheseobservationssuggestthatinsomecircumstances
DichaeteandSoxNcancompensateforoneanotherbutinothercasetheyaredependentonone
anotherFurthermoreatcertainlocitheirbindingisindependentwiththelossofoneTFnot
affectingthebindingoftheotherIthasbeensuggestedthatthiscomplexinterplaymaybethe
resultofpartialneofunctionalisationbetweenthetwoparalogousTFsleadingtotheiruniqueroles
whilemaintainingsomedegreeofredundancyinordertoproviderobustnesstothecriticalprocess
ofearlyneuraldevelopment[5171ndash73]Howeveritisnotknownwhethertheoverlapinbinding
patternsandtargetsobservedbetweenDichaeteandSoxNwhichissubstantiallygreaterthanthat
observedbetweenmanyotherparalogousgenessuchasthecis8paralogousmembersofthesame
Hoxclusters[78117475]hasbeenspecificallymaintainedduringevolutionGiventhefactthat
functionalredundancyamongSoxgenesfromthesamegroupseemstobeacommonthemeacross
manydifferenttaxawewerecurioustoexaminethequestionofgroupBSoxbindingfroman
evolutionaryperspective
Comparativestudiesoftranscriptionfactorbindingcanfacilitateananalysisoftheevolutionary
dynamicsofregulatoryDNAaswelltheuseofnaturalselectionasafiltertoreducethebiological
andtechnicalnoiseinherenttoinvivogenomeKwidebindingassays[61576ndash78]Previous
comparativestudiesinDrosophilahaveprimarilyusedChIPKseqtomeasurebindingandhave
focusedondevelopmentalregulatorssuchastheanteriorKposterior(AP)patterningfactorsBicoid
(Bcd)Hunchback(Hb)Kruppel(Kr)Giant(Gt)Knirps(Kni)andCaudal(Cad)andthemesodermal
regulatorTwist(Twi)[767779]Thesestudiesmeasuredbothquantitativeturnoverofbindingsites
duringevolutionaswellasqualitativechangesinbindingstrengthfindingbroadlysimilarratesof
divergencefordifferentfactorsTheyalsousedtheevolutionarydatatodiscoverfeaturesofTF
bindingthatwereassociatedwithhigherconservationsuchasclusteredbindingcombinatorial
bindingwithotherTFsandthepresenceofspecificsequencemotifsinornearbindingintervals[76
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
6
77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis
generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes
DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand
formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy
theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD
simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland
evolutionarydynamicsofgroupBSoxbinding
Results
Comparative+DamIDseq+for+group+B+Sox+proteins+
WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh
confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin
Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional
compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD
melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin
fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof
bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates
ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment
betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda
differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly
boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand
mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding
intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
7
viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth
DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51
67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween
17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD
yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding
intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere
noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological
replicates
TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome
coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences
betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure
2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN
bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]
Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated
witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately
9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD
simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete
andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000
geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD
pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously
identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof
DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology
BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof
transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem
development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough
thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
6
77]ThedegreeofconservationofbindingeventsbetweendifferentDrosophilaspecieswhichis
generallygreaterthanthatbetweenequallydistantvertebratespecies[2080ndash82]makes
DrosophilaaparticularlywellKsuitedmodelsystemforstudyingtheevolutionofregulatoryDNAand
formakinginterKspeciescomparisonsofTFbindingWiththisinmindweusedDamIDKseqtostudy
theinvivobindingpatternsofDichaeteandSoxNinfourspeciesofDrosophilaDmelanogasterD
simulansDyakubaandDpseudoobscurawiththegoalofsheddingnewlightonthefunctionaland
evolutionarydynamicsofgroupBSoxbinding
Results
Comparative+DamIDseq+for+group+B+Sox+proteins+
WehaverecentlyusedcombinationsofgenomeKwideChIPandDamIDdatatodefinehigh
confidenceinvivobindingprofilesandcorebindingintervalsforbothDichaeteandSoxNin
Drosophilamelanogaster[5167]InordertobetterunderstandaspectsofgroupBSoxfunctional
compensationatagenomiclevelweexpandedonthisworkperformingDamIDusingD
melanogasterSoxKDamfusionproteinsandhighKthroughputsequencingtomapDichaetebindingin
fourDrosophilaspeciesandSoxNbindingintwospecies(Figure1)Theaminoacidsequencesof
bothproteinsarehighlyconservedacrossthefourspeciesthusourexperimentaldesignfacilitates
ananalysisofbindingattributabletodifferencesinthegenomesequenceornuclearenvironment
betweenspeciesratherthantotheSoxproteinsthemselves(SupplementaryFigure1)Weuseda
differentialenrichmentapproachtoidentifyGATCfragmentsineachgenomethatweresignificantly
boundbyeachSoxprotein(DichaeteKDamorSoxNKDam)incomparisontoDamKonlycontrolsand
mergedneighbouringfragmentstogeneratebindingintervals(Table1A)Whilethesebinding
intervalsdonotpreciselycorrespondtoChIPKseqpeaksorDamIDpeakswepreviouslydetermined
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
7
viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth
DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51
67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween
17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD
yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding
intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere
noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological
replicates
TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome
coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences
betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure
2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN
bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]
Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated
witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately
9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD
simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete
andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000
geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD
pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously
identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof
DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology
BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof
transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem
development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough
thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
7
viatilingmicroarraysweneverthelessfoundthatinagreementwithpreviousstudiesboth
DichaeteandSoxNshowedwidespreadbindingatthousandsofsitesacrosseachflygenome[51
67]Afternormalisationandcorrectionformultiplehypothesistestingweidentifiedbetween
17000ndash26000bindingintervals(plt005)forDichaeteinDmelanogasterDsimulansandD
yakubaandforSoxNinDmelanogasterandDsimulansApproximately3000Dichaetebinding
intervalswereidentifiedinDpseudoobscurareflectingthefactthatthebindingprofileswere
noisierinthisspeciescomparedtotheotherthreespecieswithlessreproduciblebiological
replicates
TranslatingsequencereadsfromeachnonKmelanogasterspeciestotheDmelanogastergenome
coordinatesandcallingbindingintervalsonthetranslateddatashowedthatwhiledifferences
betweenspecieswereobservedmanybindingintervalswereconservedacrossthespecies(Figure
2AK2BTable1B)AdditionallyandinagreementwithourpreviousworktheDichaeteandSoxN
bindingprofilesshowedahighdegreeofconcordanceinbothDmelanogasterandDsimulans[51]
Weusedthistranslatedbindingdatatoassessthegenetargetsandgenomicfeaturesassociated
witheachdatasetAssigningeachbindingintervaltoaputativetargetgeneidentifiedapproximately
9000K9500genesassociatedwithbothDichaeteandSoxNbindinginDmelanogasterandD
simulans(Table1CAdditionalFile1)AhighproportionofthesegenesaresharedbetweenDichaete
andSoxNinbothspeciesDichaetebindingintervalsinDyakubawereassociatedwithover12000
geneswhileinDpseudoobscuraweidentified1888associatedgenesWiththeexceptionoftheD
pseudoobscuradataeachsetofputativetargetgenescontainsahighpercentageofpreviously
identifiedDichaeteorSoxNdirecttargetgenes(Table1C)[5167]Inlinewithpreviousstudiesof
DichaeteandSoxNbindingallsetsoftargetgenesarehighlyenrichedforspecificGeneOntology
BiologicalProcess(GOBP)termssuchasgenerationofneurons(plt1eK29)andregulationof
transcription(plt1eK9)[5167]aswellashigherleveltermssuchasgeneralorganandsystem
development(plt1eK47)andbiologicalregulation(plt1eK44)(AdditionalFile2)Notablyalthough
thereweremanyfewerDichaeteboundgenesidentifiedinDpseudoobscuratheseshowstrong
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
8
enrichmentforsimilarGOBPtermsarestronglyupregulatedinthebrainandlarvalCNSandare
highlyassociatedwithpublicationsdescribinggenesinvolvedintheneuralstemcelltranscriptional
network(plt1eK25)allofwhicharehallmarksofknownDichaetefunctions[506467]Wealso
usedthetranslatedbindingdatatodeterminethelocationofbindingintervalswithrespectto
transcriptionstartsitesInbroadagreementwithourpreviouswork[5167]wefoundsimilar
distributionsineachspeciesapproximately65ofbindingintervalsoverlapintronswiththe
remaindermostlymappingintheproximityofpromotersandaminoritywithinintergenicregions
Weperformedsearchesforknownanddenovobindingmotifswiththebindingintervalsineach
originalgenometoidentifyanydifferencesinpreferentialmotifusagebetweenspeciesorTFsThe
knownmotifsearchesuncoveredhighlysignificantmatchestovertebrateSox2Sox3andSox6
motifsDenovosearchesfoundasignificantlyenrichedmotifmatchingtheconsensusSoxmotifin
eachdataset(plt=1eK20)whichcorrespondwelltomotifsdiscoveredinourpreviousDichaeteand
SoxNstudies[5167](AdditionalFile3)OfparticularinterestwefoundthattheSoxmotifs
identifiedintheDichaetebindingintervalsofeachspeciesdifferedfromthosefoundforSoxN
(Figure2D)TheprimarydifferenceisinthefourthpositionofthecoreCAAAGmotifwhichshowsa
strongerpreferenceforTintheDichaeteintervalsandforAintheSoxNKDamintervalsfromeach
speciesAlthoughitisnotknownwhetherthesedifferencesinSoxmotifsfoundinDichaeteand
SoxNbindingintervalsareresponsiblefordifferentfunctionsofthetwoTFsitisstrikingthatthe
samepatternsappearindependentlyinthegenomesofmultiplespeciesInadditionwefoundaset
ofmotifsassociatedwithabroaderarrayofTFsinthecaseofDichaetetheseincludeknown
interactionpartnersVentralveinslacking(Vvl)andNubbin(Nub)aswellastheearlysegmentation
factorsKnirps(Kni)andRunt(Run)
FinallyweexaminedtheoverlapbetweengroupBSoxbindingintervalsandknowncisKregulatory
modules(CRMs)bycomparingourDmelanogasterdatawithannotatedCRMsfromtheREDFlyand
FlyLightdatabasesaswellasarecentSTARRKseqassayidentifyingactiveenhancers[83ndash85]Both
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
9
DichaeteandSoxNshowhighoverlapwithCRMsfromallthreesourceswithahighproportionof
CRMsoverlappingbothDichaeteandSoxNbindingintervals(Table2)Thehighestoverlapwas
foundwiththeREDFlydatabase(~60ofCRMs)whiletheFlyLightandSTARRKseqdatashowed
slightlyloweroverlapinthecaseofFlyLightwefoundhigheroverlapwithCNSspecificCRMs
AlthoughtheintervalsmappedwithDamIDdonotcorresponddirectlytoenhancerelementsin
manycasesvisualinspectionrevealsaverygoodcorrelationbetweenannotatedCRMsandpeaksof
DamIDbinding(egdppFigure2C)Togetherwiththegeneandgenomicfeatureannotationsthese
observationssupporttheemergingviewthatDichaeteandSoxNworktogethertoregulatealarge
setofgenesinvolvedinCNSdevelopmentthroughbindingatknownCRMswithapreferencefor
intronicbinding[5167]andsuggestthattheoverallrolesofthesegroupBSoxproteinshavenot
changedsignificantlyduringtheevolutionofthedrosophilidsanalysedhere
Turnover+of+Dichaete+binding+sites+during+evolution+
InordertoexaminethepatternsofgroupBSoxbindingconservationandturnoverduringevolution
wefocusedouranalysisontheDichaeteKDambindingdatainDmelanogasterDsimulansandD
yakubaWhiletheoverlapintargetgenesbetweenthesethreespeciesishighwefoundwidespread
turnoverintheorthologouslocationsofbindingintervalsIndeedoutofall26117Dichaetebinding
intervalsidentifiedacrossthethreespeciesthemajorfraction(47)arepresentinonlyone
specieswithsmallerproportionspresentatorthologouslocationsintwo(23)orallthree(30)
species(Figure3A)ClusteringallDichaeteKDamreplicatesbybindingaffinityscoresrevealsthatthe
threebiologicalreplicatesfromeachspeciesclustertightly(Pearsonrsquoscorrelationcoefficientsfrom
092K099Figure3B)butreplicatesfromdifferentspeciesalsoshowverygoodcorrelations(PCC=
064K074)ThepatternsofsimilaritiesanddifferencesbetweenDichaetebindingpatternsineach
speciesasvisualizedbyprincipalcomponentanalysis(PCA)onthenormalisedbindingaffinityscores
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
10
inallboundintervalsareconsistentwiththeexpectationofneutralevolutionalongtheDrosophila
phylogeny
WeemployedDiffBind[86]toperformadifferentialenrichmentanalysiscomparingDichaeteKDam
bindingprofilesbetweentwospeciesforeachbindingintervalidentifyingbindingintervalsthat
showedsignificantquantitativedifferencesbetweeneachpairofspecies[86]Forcomparisons
betweenDmelanogasterandDsimulansorDyakubawefoundapproximatelyequalnumbersof
preferentiallyboundintervalsineachspecies(Figure4AKB)InagreementwiththePCAplot8880
differentiallyboundintervalswereidentifiedbetweenDmelanogasterandDyakubaatFDR1while
only5044wereidentifiedbetweenDmelanogasterandDsimulansindicatingagreateramountof
quantitativebindingdivergencebetweenDmelanogasterandDyakubaAgainthisfindingshows
thatdivergenceinbindingfollowstheknownDrosophilaphylogenyandsuggestsamolecularclock
mechanismforDichaetebindingsiteturnoverashasbeenproposedforotherTFsinDrosophila
[77]Interestinglytheproportionofdifferentiallyboundintervalsthatarepresentinbothspecies
butshowquantitativechangesinbindingstrengthasopposedtothosethatarequalitativelyabsent
inonespeciesalsoincreaseswithphylogeneticdistancefrom580betweenDmelanogasterand
Dsimulansto637betweenDmelanogasterandDyakubaThisalsorepresentsanincreasein
thepercentageofthetotalDmelanogasterDichaetebindingintervalsthatquantitativelychangein
anotherspeciesfrom96inDsimulansto204inDyakuba
Underbalancingselectionitisproposedthatasthefrequencyofconservedbindingeventsat
orthologouspositionsdecreasesbetweenmoredistantlyrelatedspeciesnewbindingeventsatthe
samegenelocishouldevolvetomaintainthesamelevelofgeneexpressionThisisoftenreferredto
asbindingsiteturnoverorcompensatoryevolution[19767783]Inordertodetectsuchevents
weconsideredthesetofDichaetebindingintervalsinonespeciesthatdonotoverlapwithany
intervalinapairwisecomparisonwitheachotherspeciesaswellasthegenestowhichtheyare
annotatedBindingintervalsannotatedtothesamegenebutwhichdonotoverlapinorthologous
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
11
positionwereconsideredinstancesofcompensatoryevolutionWefoundinstancesofbindingsite
turnoverbetweenDmelanogasterandDsimulansat2457genesandbetweenDmelanogaster
andDyakubaat2806genesanexampleofturnoverbetweenDmelanogasterandDyakubaat
therdolocusisshowninFigure4CWewerecuriousastowhetherthesecompensatorybinding
sitesarelocatedwithinCRMsthatareactiveinbothspeciesorwhethertheyariseinconjunction
withtheappearanceofnewfunctionalCRMsToaddressthisweutiliseddatafromSTARRKseq
assaysperformedinDyakubaandDmelanogasterwhereitisestimatedthat19ofactiveD
melanogasterCRMsshowcompensatoryconservationrelativetoDyakubaCRMs[83]Inorderto
takeabroadviewofSoxbindingatCRMswereKanalysedthelowKstringencySTARRKseqdataand
identified21105DmelanogasterCRMsactiveinS2cellsthatshowcompensatoryconservation
relativetoDyakubaand22444thatshowcompensatoryconservationrelativetoDmelanogaster
Inovariansomaticcells(OSCs)wefound12843DmelanogasterCRMsand20207DyakubaCRMs
thatshowcompensatoryconservationInDmelanogasteronly53S2celland105OSC
compensatoryCRMscontainDichaetebindingintervalsthatarealsocompensatoryrelativetoD
yakubawhileinDyakubaonly90S2and157OSCcompensatoryCRMscontainDichaetebinding
intervalsthatarealsocompensatoryrelativetoDmelanogasterThisobservationstronglysuggests
thatthemajorityofDichaetebindingsiteturnovereventshappeninactiveCRMsthatare
functionallyandpositionallyconservedinbothspecies
Binding+conservation+and+regulatory+function
FocusingontheDichaetebindingintervalsthatarepositionallyconservedbetweenspecieswe
assessedwhetherthistypeofconservationisenrichedatcertainclassesoffunctionalsitessuchas
knownCRMsorDichaetetargetgenesSuchanenrichmenthaspreviouslybeenfoundforotherTFs
inDrosophilaincludingtheAPfactorsBcdHbKrGtKniandCad[76]andthemesodermregulator
Twi[77]WefirstexaminedDichaetebindingintervalsassociatedwithREDFlyandFlyLightCRMs[84
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
12
85]andfoundthat644ofDichaetebindingintervalsassociatedwithaREDFlyCRMareconserved
inallthreespeciescomparedtoonly448ofbindingintervalsthatarenotatREDFlyCRMs(Table
3)Converselyonly96ofDichaeteintervalsatREDFlyCRMsareuniquetoDmelanogasterwhile
276ofthosethatareoutsideREDFlyCRMsareunique(SupplementaryFigure3)Similarresults
werefoundforbindingintervalswithinFlyLightenhancersOverallbeinglocatedataknownCRM
hasahighlysignificanteffectonDichaetebindingintervalconservation(REDFlyχ2=1619dfndash3p
=706eK35FlyLightχ2=1773df=3pKvalue=338eK38)WealsoanalysedthiseffectontwoKway
conservationofSoxNbindingintervalsbetweenDmelanogasterandDsimulansagainfindingthat
CRMKassociatedbindingissignificantlymorelikelytobeconservedbetweenspecies(REDFlyχ2=
3236df=1p=240eK72FlyLightχ2=4695df=1p=727eK12)Thestrongincreaseinrates
ofconservationobservedforgroupBSoxbindingintervalswithinCRMssuggeststhatthesesitesare
likelytobeunderbalancingselectiontomaintaintheireffectongeneregulationandisconsistent
withtheimportantregulatoryfunctionsofDichaeteandSoxN[5167]
OurpreviousexperimentsidentifiedDmelanogasterDichaeteandSoxNdirecttargetgenesand
highKconfidencecorebindingintervalssupportedbybothChIPandDamIDdata[5167]Giventhat
DichaeteandSoxNbindingintervalsinknownCRMsarepreferentiallyconservedindicatingalink
betweenfunctionalbindingandconservationweaskedwhethertheyarealsohighlyconservedat
coreintervalsanddirecttargetgenesTheDmelanogasterFDR1DichaeteKDamintervalsidentified
inthisstudyshowagreateroverlapwithcoreDichaeteintervalsthantheFDR1SoxNKDamintervals
dowithSoxNcoreintervalsHoweverforbothTFsDamIDbindingintervalsthatoverlapacore
intervalaresignificantlymorelikelytobeconservedthanthosethatdonot(Dichaeteχ2=14086
df=3p=410eK305SoxNχ2=7331df=1pKvalue=190eK161)(SupplementaryFigure4)For
bothTFsthiseffectisevenstrongerthanthatofbeinglocatedwithinaknownCRMAlthoughcore
bindingintervalsdonotnecessarilyrepresentdirecttargetsasmeasuredbygeneexpressionassays
thesedatashowthattheyarenonethelesssubjecttoevolutionaryconstraintandarelikelytobe
functionallyimportantInterestinglywefoundthatforbothDichaeteandSoxNassociationwitha
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
13
directtargetgenehaslessofaneffectonbindingintervalconservationthanbeinglocatedatacore
interval(Dichaeteχ2=573df=3p=23eK12SoxNχ2=172df=1p=34eK5)
(SupplementaryFigure4)Thisisnotassurprisingasitmayfirstseemsincedirecttargetsare
identifiedbyexpressionchangesinmutantembryosanditisclearthatDichaeteandSoxNcan
functionallycompensateforeachotherrsquoslossmaskingsignificantexpressionchangesatsome
targetsInadditioninmanycasesmultiplebindingintervalsareannotatedtothesametargetgene
andtheseareunlikelytobeequallyfunctionalForexamplesomemayrepresentshadowenhancers
andmaythereforebelessconstrainedbynaturalselection[8788]Thepresenceofthese
functionallylessconstrainedbindingintervalsinourdatasetsislikelytoreducetheoverallrateof
conservationobservedforintervalsannotatedtodirecttargetgenescomparedtothoseatcore
intervalswhicharerobustlydetectedbyindependentbindingassays
Sequence+conservation+of+Sox+motifs+and+binding+intervals+
Weusedthereferencegenomesequenceforeachspeciestoassessthecontributiontobinding
conservationofsequenceconservationwithinbindingintervalsandatTFKspecificbindingmotifsIn
ordertoexaminethepatternsofmotifconservationinDichaetebindingintervalswefirstidentified
allmatchestothebestdenovoSoxmotifdiscoveredineachsetofintervals[89]Wedidthesame
withcontrolbindingintervalsthathadbeenrandomlyshuffledtodifferentlocationsineachgenome
[90]InallcasessignificantlymoreSoxmotifswerefoundinDichaetebindingintervalsthanin
controlintervals(plt403eK15Wilcoxonranksumtestwithcontinuitycorrection)Focusingon
DichaetebindingintervalswecomparedintervalsthatareboundinallfourspeciesincludingD
pseudoobscuratothosethatareuniquetoDmelanogasterThehighlyconservedintervalscontain
significantlymoreSoxmotifsonaverage(mean=453)thanthebindingintervalsuniquetoD
melanogaster(mean=129p=303eK193Wilcoxonranksumtestwithcontinuitycorrection)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
14
showingthatthepresenceandnumberofSoxmotifsispositivelycorrelatedwithDichaetebinding
conservation(Figure5A)
PreviouscomparativeChIPKseqstudiesofTFbindinginDrosophilahavefoundthatoverallnucleotide
conservationisnotsignificantlyelevatedinbindingintervalsthatareconservedbetweenspecies
[7677]TodeterminewhetherthisisalsothecaseforourDamIDdataweusedPRANKa
phylogenyKawarealigner[9192]tocreatemultiplealignmentsofhighKconfidenceorthologous
sequencesfromeachspecieswithDichaetebindingintervalsshowingfourKwaybindingconservation
andthoseshowinguniqueDmelanogasterbindingThesesequencesshouldcontaintheregionsof
regulatoryDNAtowhichDichaetebindshowevertheyarealsolikelytocontainflankingregions
thatarenotfunctionallyrelevantNotsurprisinglywedidnotdetectahigherrateofnucleotide
conservationinintervalsshowingfourKwaybindingconservationcomparedtotheuniqueD
melanogasterintervalsinfacttheuniquelyKboundintervalsshowslightlybutsignificantlygreater
sequenceconservationacrosstheirentirelengths(Wilcoxonranksumtestp=234eK20)(Figure5B)
WescannedthemultiplealignmentsformatchestothedenovoSoxmotifsweidentifiedand
calculatedtheratesofnucleotideconservationspecificallywithinthesemotifs[9394]Incontrast
tothefullbindingintervalsequencesSoxmotifswithinintervalsshowingfourKwayconservation
showsignificantlyhigherratesofnucleotideconservationthanthoseinintervalsthatareonlybound
inDmelanogaster(Wilcoxonranksumtestp=956eK36)Asafurthercontrolwerandomly
shuffledtheSoxmotifstoproduceasetofmotifswiththesameGCcontentandlengthand
searchedformatchestoeachoftheminthemultiplyalignedbindingintervalsWhiletheaverage
ratesofnucleotideconservationinmatchestoshuffledcontrolmotifsareslightlyhigherinintervals
thatdisplayfourKwaybindingconservationthaninuniqueDmelanogasterintervalsSoxmotifsin
intervalswithfourKwaybindingconservationaresignificantlymorehighlyconservedthancontrol
motifsineithersetofintervals(Wilcoxonranksumtestp=163eK42andp=167eK9)(Figure5C)
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
15
WealsosearchedforSoxmotifsthatwereindependentlyidentifiedattheexactorthologous
locationinallfourspecies(positionallyconserved)Wefoundthatapproximately20ofSoxmotifs
inbindingintervalsshowingfourKwaybindingconservationarepositionallyconservedasopposedto
16ofSoxmotifsinintervalsthatareonlyboundinDmelanogasterasignificantdifference
(Wilcoxonranksumtestp=255eK24)Asimilarpatternholdsforthesubsetofpositionally
conservedmotifsthatshowcompletesequenceconservationbetweenallfourspecies(Figure5D)
Theseperfectlyconservedmotifsmakeupapproximately15ofSoxmotifsinintervalsthatshow
fourKwaybindingconservationbutonly12ofSoxmotifsinintervalsthatareuniquelyboundinD
melanogasterAgainthisdifferenceinmotifconservationisstatisticallysignificant(Wilcoxonrank
sumtestp=604eK28)TakentogethertheseresultsshowthatwhileDamIDintervalsthatare
boundbyDichaeteinvivoinmultiplespeciesarenotmorehighlyconservedonaveragethanthose
thatareonlyboundinonespeciesindividualSoxmotifswithinthoseintervalsarebothmorelikely
toretainorthologouspositionsandtoaccumulatefewermutationsduringevolutionSinceDamID
bindingintervalsaregenerallywiderthanChIPKseqintervalsandarenotnecessarilycentredaround
thetruebindingsiteitcanbemoredifficulttoidentifyboundmotifsinDamIDintervalsthaninChIP
intervals[95]HoweverourfindingshighlightalinkbetweeninvivoDichaetebindingconservation
andmotifconservationsuggestingbothfunctionalimportanceofmotifdensityandqualityaswell
asamechanismbywhichbalancingselectionatthesequencelevelcouldfeedbacktomaintain
functionalbindingevents
High+conservation+of+common+binding+by+Dichaete+and+SoxN+
HavingobservedthatDichaetebindingshowsahigherrateofconservationatfunctionalsitessuch
asknownenhancersandcorebindingintervalsweaskedwhetheracomparativeapproachcould
yieldinsightsintotheimportanceofcommonbindingbyDichaeteandSoxNPreviousstudieshave
shownthatDichaeteandSoxNshowextensivebindingsimilarityacrosstheDmelanogastergenome
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
16
andthatinsomecasestheycancompensateforeachotherrsquoslossatthelevelofDNAbinding[51]
HoweveritisnotknownifwidespreadcommonbindinghasbeenretainedthroughoutDrosophila
evolutionoritsfunctionalimportancerelativetouniquebindingbyeachTFToaddressthese
questionsweturnedtotheDichaeteKDamandSoxNKDamdatageneratedinDmelanogasterandD
simulansfirsttakingaqualitativeapproachtoassesscommonanduniquebindingExtensive
commonbindingbyDichaeteandSoxNwasobservedinbothspeciesalthoughsomewhat
surprisinglyitrepresentedagreaterproportionofallbindingeventsinDmelanogasterthaninD
simulans(Figure6A)Nonethelessthesinglelargestcategoryofbindingintervalsweidentifiedwere
thosecommonlyboundbyDichaeteandSoxNinbothspecies(7415intervals)
Notonlyweretherealargenumberofbindingintervalscommonlyboundinbothspeciesintervals
thatarecommonlyboundineitherspeciesaresignificantlymorelikelytobeconservedthan
intervalsbounduniquelybyoneSoxprotein(Figure6B)OutofalltheintervalsidentifiedinD
melanogasterthatarecommonlyboundbyDichaeteandSoxN765showbindingconservationin
DsimulansIncontrastonly401ofuniquelyboundDichaeteintervalsareconservedinD
simulansand337ofuniquelyboundSoxNintervalsareconserved(Table4)ahighlysignificant
differencebetweencommonanduniquebinding(χ2=33983df=2plt22eK16[approaches0])
PerformingthesameanalysisonthebindingintervalsidentifiedinDsimulansrevealsanevenmore
strikingeffectwith941ofcommonlyboundintervalsshowingbindingconservationinD
melanogasterAgainthedifferenceinconservationratesbetweencommonlyanduniquelybound
intervalsishighlysignificant(χ2=24889df=2plt22eK16[approaches0])Thiseffectisstronger
thananyofthepreviouslyexaminedfunctionalcategoriesincludingknownenhancersandcore
intervals
Assigningtheseconservedcommonlyboundintervalstoputativetargetgenesresultedinthe
identificationof5966conservedcoregroupBSoxtargetsinDmelanogasterandDsimulans
(AdditionalFile4)ThesegeneshaveaprofilethatisconsistentwiththeknownpictureofgroupB
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
17
SoxfunctionTheyareprimarilyupregulatedinthedevelopingCNSandareenrichedforGOBP
termsrelatedtobiologicalregulation(p=148eK49)systemdevelopment(p=286eK37)generation
ofneurons(p=455eK31)andneurondifferentiation(p=492eK28)(AdditionalFile5)Thissuggests
thatmanyofthecorefunctionsofDichaeteandSoxNintheflyareachievedthroughregulationof
genesatwhichbothproteinscananddobindandthatthiscommonpotentiallyredundantbinding
isevolutionarilyconservedToexaminetheconservationofthesecoregroupBtargetsonan
expandedevolutionaryscalewecomparedthemwiththetargetsofthemousegroupBSoxproteins
Sox2andSox3andthegroupCSoxproteinSox11(Table5)Wemappedthetargetsofeachofthese
proteinsinneuralprecursorcells(NPCs[29]totheirDmelanogasterorthologuesresultinginalist
of1301orthologuesofSox2targets4213orthologuesofSox3targetsand1485orthologuesof
Sox11targetsBetween40K45ofthetargetsofeachmouseSoxproteinhaveconserved
orthologuesthatarecommonlyboundbyDichaeteandSoxNintheDrosophilagenomewhichis
consistentwithsimilarcomparisonspreviouslymadebetweenmouseSox2targetsandtargetsof
DichaeteandSoxNseparately[5167]Interestinglyroughlytwiceasmanyorthologueswerefound
tobesharedbetweenSox11andSoxNaloneasbetweenSox11andthecommontargetsofDichaete
andSoxNThissupportsourprevioussuggestionthatwhileDichaeteandSoxNmaycontribute
equallytohomologousmammaliangroupB1SoxfunctionsSoxNhasevolvedtotakeonalargepart
oftheneuraldifferentiationfunctionsattributedtoSox11independentlyofDichaeteOverallthere
isahighoverlapbetweentargetsofmousegroupBandCSoxproteinsinthemammalianCNSand
commonconservedtargetsofDichaeteandSoxNintheflysupportingthedeepevolutionary
conservationofSoxfunctionsintheCNS
WhileDichaeteandSoxNcommonlybindthemajorityofconservedbindingintervalswealso
identifiedintervalsthatareuniquelyboundbyeachTFinbothspeciesWeusedDiffBind[86]to
analysequantitativedifferencesinDichaeteandSoxNbindinginbothDmelanogasterandD
simulansWedetected1257bindingintervalsthataredifferentiallyboundbetweenDichaeteand
SoxNinbothspecies(FDR1)withagreaternumberofintervalsbeingpreferentiallyboundby
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
18
Dichaete(778)thanSoxN(479)(Figure6C)Thisisaconsiderablysmallerdifferencethanwasseen
whencomparingDichaeteorSoxNbindingbetweenspeciesmeaningthatthebindingprofilesof
thesetwoparalogousTFsarelessdifferentiatedthanthebindingprofilesofeitherDichaeteorSoxN
inthegenomesoftwodifferentspeciesWeassignedboththepreferentiallyboundintervalsand
thequalitativelyuniquelyboundintervalstotargetgenesThisresultedinasetof925preferential
Dichaetetargetsand526preferentialSoxNtargetswith54overlappinggenesandasetof381
uniqueDichaetetargetsand361uniqueSoxNtargetswithonly14overlappinggenes(Additional
Files6and7)
AlthoughthepreferentialanduniquetargetsofeachTFarenotidenticaltheysharesome
propertiesthatcanshedlightontheuniquefunctionsofeachproteinInbothcaseswhilethetwo
setsofgeneshavesimilarenrichmentsintermsofGOBPtermsincludingtermsrelatedto
morphogenesisdevelopmentneurondifferentiationandbiologicalregulation(AdditionalFiles8
and9)theirspatialexpressionpatternsaccordingtoFlyAtlasshowdifferentprofilesBothunique
andpreferentialtargetsofSoxNareprimarilyupregulatedinthelarvalCNSTheSoxNpreferential
targetgenesareenrichedintheReactomepathwayRoleofAblinRobo8Slitsignallinganimportant
pathwayinaxonguidanceAdditionallyadenovomotifsearchintheuniquelyboundSoxNintervals
uncoveredamotifcorrespondingtoUltraspiracle(Usp)aTFinvolvedinseveralaspectsofneuron
morphogenesis[9697]SoxNhasbeenshowntobeinvolvedinthelateraspectsofCNS
developmentincludingaxonpathfinding[5162]anditsdirecttargetsshowhighoverlapwith
orthologoustargetsofmouseSox11whichisactiveinpostKmitoticdifferentiatingneurons[29]Our
resultssuggestthatthesefunctionsmaydefinetheprimaryuniqueroleofSoxNOntheotherhand
Dichaetepreferentialanduniquetargetsareupregulatedinawiderrangeoftissuesincludingthe
brainheadlarvalCNScropeyehindgutandthoracicoabdominalganglionDichaetehaspreviously
beenshowntobeexpressedandplayaregulatoryroleinboththeembryonichindgutandbrain[63
67]OneofthetopdenovomotifsidentifiedintheuniquelyboundDichaeteintervalscorresponds
toBrachyenteron(Byn)aTFthatiscriticalforthedevelopmentofthehindgut[9899]These
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
19
resultssuggestthatwhileDichaeteandSoxNhavemanysimilarfunctionsduringdevelopmentkey
differencesintheirrolesandtargetgenesmaybedeterminedbydifferencesintheirownexpression
patternsasignatureofneofunctionalisation
OuranalysisofthegenomeKwideinvivobindingpatternsoftwogroupBSoxproteinsina
comparativeevolutionarycontextdemonstratesahighdegreeofconservationingroupBSox
functionintheDrosophilaspeciesstudiedwhileidentifyingnumerousinstancesofbindingsite
turnoverInagreementwithstudiesofotherTFsinDrosophilatheamountofDichaetebindingsite
turnoverbetweenspeciesandquantitativedivergenceinbindingstrengthiscorrelatedwith
phylogeneticdistance[767779]Howeverwehavealsoidentifiedfunctionalcategoriesofsites
whichdisplaysignificantlyhigherconservationthanthebackgroundlevelincludingbindingintervals
inknownCRMsbindingintervalsthathavepreviouslybeenidentifiedascoreintervalsandintervals
wherebothDichaeteandSoxNbindtogetherAtwoKwayanalysisofDichaeteandSoxNbindingin
twospeciesofDrosophilarevealsthatthemajorityofconservedintervalsarecommonlyboundby
bothTFsleadingustoidentifyalistofcoregroupBSoxtargetswhileananalysisofuniquelybound
conservedintervalsprovidesindicationsastothefeaturesthatdifferentiateDichaeteandSoxN
functionsduringdevelopmentOurresultsemphasizethefactthatcommonpotentiallyredundant
bindingbyDichaeteandSoxNhasbeenspecificallyconservedduringDrosophilaevolution
Discussion
InthisstudywehaveexpandeduponourpreviousworkidentifyingbindingtargetsofDichaeteand
SoxNinDmelanogastertoexaminetheevolutionarydynamicsofgroupBSoxbindingpatternsin
multiplespeciesofDrosophilaOuranalysisofDichaetebindinginfourspeciesshowedhighoverall
conservationintermsoftargetgenesandgenomicfeaturesassociatedwithbindingbutalso
uncoveredextensiveturnoverinindividualbindingeventsAtthesequencelevelweidentified
highlyenrichedSoxmotifsinallsetsofbindingintervalsthatshowedbothdensityandnucleotide
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
20
conservationratescorrelatedwithbindingconservationToaddressthewidespreadcommon
bindingobservedbetweenDichaeteandSoxNweperformedatwoKwaycomparisonofthebinding
patternsofbothTFsintwospeciesDmelanogasterandDsimulansWeobservedthatcommon
bindingispreferentiallyconservedbetweenspeciesincomparisontobindingbyasinglefactor
alonesuggestingthatDichaeteandSoxNrsquosabilitytobindtothesamelociisanimportantfeatureof
theirdevelopmentalfunctionsInadditionalargenumberofcommonlyboundtargetsare
conservedorthologoustargetsofgroupBSoxproteinsinthemouseparticularlySox2andSox3
whichalsoshowcoKexpressionandfunctionalredundancyinthedevelopingCNSraisingthe
possibilityofadeeplyKconservedroleforcompensationamongSoxgroupcoKmembers
WefoundanumberofpatternsinourthreeKwayevolutionaryanalysisofDichaetebindinginD
melanogasterDsimulansandDyakubaNormalizingthereadcountsfromallsamplestogether
allowedustoreducetheeffectsofcomparingseparatelythresholdeddatasetswhichcanleadtoan
underestimateofsimilarityWhenwecountedthedivergentbindingintervalsbetweenspecieswe
foundthatthenumberofdifferencesincreaseswiththephylogeneticdistanceofthespeciesbeing
comparedAsimilarpatternwasfoundinthecorrelationsbetweenbindingprofilesacrossallbound
regionswithsamplesfrommorecloselyrelatedspecieshavinghighercorrelationsThese
correlationsrangingfrom062ndash072areinlinewiththecorrelationsbetweenAPfactorbinding
profilesinDmelanogasterandDyakubawhichrangefrom057forCadto075forKr[76]Wealso
identifiedinstancesofbindingsiteturnoveratindividualgenelociwhichcouldpotentiallyrepresent
compensatorygainsandlossesmaintainingtargetgeneexpressionlevelsduringevolutionThis
modeofregulatoryevolutionappearstobeimportantforDichaeteaswefoundthatinallpairwise
comparisonsthemajorityofbindingintervalsthatwerenotpositionallyconservedinonespecies
hadatleastonebindingintervalpresentatadifferentpositionwithinthesamelocusintheother
speciesInterestinglyveryfewoftheseturnovereventshappenedinCRMsthatalsodisplayed
turnoverbetweenspecies[83]suggestingfluidityinindividualbindingsitelocationswithinrelatively
stableblocksofregulatoryDNA[100]Nonethelessbindingintervalsassociatedwithannotated
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
21
CRMsfromFlyLightorREDFlydisplayahigherrateofconservationthanthoselocatedoutsideof
CRMsaltogetherwhichhighlightstheirfunctionalrole
IfbalancingselectionhasworkedtomaintainfunctionalbindingeventsduringDrosophilaevolution
thenonewouldexpecttofindselectiveconstraintonthenucleotidesequencewithinbinding
intervalsIndeedgiventhedesignofourDamIDexperimentsinwhichweexpressedfusionproteins
containingtheDmelanogastergroupBSoxsequencesineachspeciesanydifferencesinbinding
shouldbeattributabletodifferencesinthegenomesequenceornuclearenvironmentofeach
speciesratherthantotheSoxproteinsthemselvesPreviouscomparativestudiesusingChIPKseq
havefoundthatoverallnucleotideconservationisnotsignificantlyelevatedinconservedbinding
intervals[7677]andourfindingsagreewiththisobservationHoweverwedidfindsignificant
correlationsbetweenbindingconservationandSoxmotifcontentinintervalsConservedbinding
intervalshavemorematchestoSoxmotifsonaveragethannonKconservedintervalsorrandomly
chosencontrolintervalsindicatingthatanincreasedmotifdensitymaycontributetofunctional
bindingbygroupBSoxproteinsandbeanimportantfacetinbindingconservationConserved
bindingintervalsalsocontainmoreSoxmotifsthatarepositionallyconservedacrossspeciesand
thatshowcompletenucleotideconservationthannonKconservedintervalsAdditionallySoxmotifs
withinconservedintervalshaveahigherpercentageofconservednucleotidesacrossallfourspecies
thanthoseinnonKconservedintervalsorrandomlyshuffledcontrolmotifsWhilewecannot
determinefromthesedatawhetherthepresenceofhighKqualitySoxmotifscausesfunctional
bindingorwhetherfunctionalbindingleadstoselectivepressuretomaintainSoxmotifsourresults
suggestthatafeedbackloopmightoperatebetweenthesetwoconditionsleadingtotheobserved
correlationbetweenhighlyconservedmotifmatchesandinvivobindingconservation
OneofthemostperplexingaspectsofgroupBSoxbiologyinbothinsectsandvertebratesistheir
extensivecoKexpressionapparentfunctionalredundancyandwidespreadcommonbindingpatterns
Whilewehavepreviouslydescribedbroadcommonbindingaswellasspecificexamplesofbinding
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
22
compensationbetweenDichaeteandSoxNinDmelanogasterthefunctionalandevolutionary
significanceofthesepatternshaveremainedunclearHerewehavefoundaclearpreferential
conservationofcommonlyboundintervalsbetweenDmelanogasterandDsimulansThesetwo
speciesofDrosophilaarecloselyrelatedshowingahighamountofsyntenyandalevelof
divergenceequivalenttothatbetweenhumanandtherhesusmacaqueasmeasuredby
substitutionsperneutralsite[101102]Inagreementwithpreviousstudiestheratesofbinding
divergenceforbothDichaeteandSoxNbetweenDrosophilaspeciesarelowerthanthoseforTFsin
vertebratespeciesatcomparableevolutionarydistances[2081]Nonethelessdespitetheoverall
highlevelofbindingconservationtheincreaseinconservationatcommonlyboundsitescompared
touniquelyboundsiteswith94ofcommonlyKboundDsimulanssitesbeingconservedinD
melanogasterisstrikingandhighlysignificantWeexpectthatacomparisonofgroupBSoxbinding
patternsinmoredistantlyrelatedDrosophilaspecieswillrevealevengreaterdifferences
Integratinginvivobindingdatafromtwospeciesallowedustoidentifyasetoforthologousbinding
intervalsthatareconsistentlyboundbybothDichaeteandSoxNacrossgenomesandnuclear
environmentsManyofthetargetgenesassociatedwiththeseintervalsaredeeplyconservedgroup
BSoxtargetsComparingthesecoretargetgeneswiththetargetsofmouseSox2Sox3andSox11in
theCNSrevealedthatthecommonconservedtargetsofDichaeteandSoxNshowgreateroverlap
withbothSox2andSox3targetsthaneithercoreSoxNorDichaetetargetsaloneOntheotherhand
thetargetsofSox11agroupCSoxproteinshowgreateroverlapwithcoreSoxNtargetsthanwith
eitherDichaeteorcommontargetsintheflyTakentogetherthesedatasuggestthatcommon
bindingisanimportantfeatureofgroupBSoxfunctionandthattherolesplayedbyDichaeteand
SoxNthroughcommonbindingarerepresentativeofancientrolesforgroupBSoxproteinsthatare
conservedfrominsectstovertebrates
ThereasonsformaintainingtwoparalogousTFswithsuchhighlyoverlappingbindingpatternsand
manycompensatoryfunctionsduringevolutionremainunclearOnehypothesisisthatconserved
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
23
commonbindingisnecessarytoproviderobustnesstoregulatorynetworksduringcriticalstagesof
earlyneuraldevelopment[5173103104]HowevercommonconservedtargetsofDichaeteand
SoxNincludebothgeneswherethetwoTFshavebeenshowntodemonstratefunctional
compensationsuchasthehomeodomainDVKpatterninggenesintermediateneuroblastsdefective
(ind)andventralnervoussystemdefective(vnd)aswellasgeneswheretheyshowatleastsome
oppositeregulatoryeffectssuchastheproneuralgenesachaete(ac)andlethalofscute(lrsquosc)or
prospero(pros)aTFinvolvedinneuroblastdifferentiation[516667]Inadditiontodirectly
regulatingtheexpressionoftargetgenesSoxproteinscanalsoinduceDNAbendingpotentially
alteringthelocalchromatinenvironmentandcreatingindirecteffectsongeneexpressionby
bringingotherregulatoryfactorstogether[105ndash107]suchanarchitecturalrolemightbemoreeasily
performedbymultiplefamilymembersthantargetKspecificfunctionsTheseobservationssuggesta
complexfunctionalrelationshipbetweengroupBSoxfamilymembersinfliesincludingboth
functionalcompensationandbalancedopposingeffectsatcertainlocibothofwhichare
evolutionarilyconservedThehighoverlapintheregulatorynetworksinfluencedbyDichaeteand
SoxN[51]mightitselfleadtoselectiveconstraintonbindingsitesasmutationsaffectingsitesthat
canbefunctionallyboundbybothTFswouldhaveagreaterdisruptiveeffectThisisconsistentwith
thehighlyconservedDNAKbindingdomainsfoundinparalogousgroupBSoxproteins
AmodelofstrongselectiontomaintaincommonbindingbetweenDichaeteandSoxNcanalso
explainthetypesofneofunctionalisationsthattheyhaveacquiredAtthesequencelevelthe
majorityofthediversificationbetweenDichaeteandSoxNcanbefoundinregionsoutsideofthe
HMGdomainwhichcoulddriveinteractionswithspecificbindingpartnersandineachgenersquos
regulatoryregionswhichdeterminewheretheyareuniquelyorcoKexpressed[45]Althoughthey
showextensiveofoverlappingexpressioninthedevelopingCNSDichaeteandSoxNarealso
expressedinspecificdomainsintheembryoDichaeteshowsuniqueexpressioninthemidlinebrain
andhindgutwhileSoxNisexpresseduniquelyinthelateralcolumnoftheneuroectodermandina
specificpatternintheepidermisatlaterstages[505563108]Thefunctionalsignaturesoftargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
24
genesthatwefoundtobeuniquelyboundandevolutionarilyconservedincludingbothspatial
expressionpatternsandenrichedmotifscorrespondingtopotentialcofactorspointtothese
domainKspecificrolesAlthoughchangesintargetspecificityduetobindingwithcofactorshasnot
beendemonstratedforSoxproteinsinDrosophilaitisawellKcharacterizedphenomenonforthe
HoxfamilyofTFs[11109110]DichaetehaspreviouslybeenshowntobindtogetherwithVentral
veinslacking(Vvl)inthemidline[5056]wehaveidentifiedanenrichedmotifforthehindgutK
specificfactorByninuniquelyboundconservedDichaeteintervalswhichmayrepresentasimilar
interaction
UnlikevertebrategroupBSoxproteinswhichcanbesplitintotwosubgroupswithlargely
specialisedregulatoryfunctionsDichaeteandSoxNappeartoactaspartiallyredundantactivators
atalargesetofcoretargetgenesandtoopposeeachotherrsquosactivityatasmallersetoftargetsIn
additioneachproteinuniquelyregulatessmallersubsetsofgenesbothpositivelyandnegativelyin
specifictissues[51]ThesedifferencesreflecttheindependentevolutionarytrajectoriesofgroupB
Soxgenesinvertebratesandinsectswhicharestillnotfullyresolved[4560]Howevergiventhe
broadpatternsofcoKexpressionandfunctionalredundancyobservedbetweencoKmembersof
variousSoxfamiliesacrosstheSoxphylogeny[27313237ndash4349]itislogicaltoaskwhethersuch
partialredundancyisasharedancestralfeatureortheresultofconvergentevolutionTwo
competingmodelsfortheevolutionofgroupBSoxgenesbothproposeatleastoneduplication
eventattheancestralSoxBlocusbeforetheprotostomedeuterostomespliteitherintandemoras
theresultofawholegenomeduplication[60]WhilethedetailsofthegroupBSoxexpansionare
debateditispossiblethatredundancybetweenthefirstgroupBSoxparaloguesmayhavebeen
retainedthroughoutevolutionwhilelaterparaloguessplitintogroupB1andB2invertebratesor
acquiredpartialneofunctionalisationsininsectsThismodelcombinedwithourdatasuggeststhat
theintegratedactionofmultipleSoxproteinsisanancestralfeatureofCNSdevelopmentthathas
beenconsistentlyselectedforandthatcontinuestobeelaboratedonandrefinedindifferent
lineages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
25
Conclusions
ThestudiespresentedherehaveexaminedtheinvivobindingoftheDrosophilagroupBSox
proteinsDichaeteandSoxNinanevolutionarycontextfocusingonadetailedanalysisofthe
patternsofbindingsiteturnoverforDichaeteinfourspeciesandamultiKfactoranalysisofDichaete
andSoxNbindingintwospeciesWeshowbroadconservationofDichaeteandSoxNregulatory
networkscoupledwithwidespreadbindingdivergenceataratethatisconsistentwithstudiesof
otherTFsinDrosophilaWedemonstratepreferentialbindingconservationatfunctionalsites
includingknownCRMsaswellasevenstrongerbindingconservationatsitesthatarecommonly
boundbyDichaeteandSoxNThecommonconservedbindingintervalsdefineasetoftargetsthat
alsosharedeeporthologywithtargetsofmousegroupBSoxproteinssuggestingthatthey
representancestralfunctionsintheCNSFinallyweproposeamodeofgroupBSoxevolution
wherebycommonbindingandpartialredundancyisspecificallymaintainedwhileindividual
paraloguesacquirenovelfunctionslargelythroughchangestotheirownexpressionpatternsandor
bindingpartners
Methods
Flyhusbandryandembryocollection
ThewildKtypestrainsofthefollowingDrosophilaspecieswereusedinallexperimentsD
melanogasterw1118Dsimulansw[501](referencestrainK
httpwwwncbinlmnihgovgenome200genome_assembly_id=28534)DyakubaCamK115
[111]Dpseudoobscurapseudoobscura(referencestrainK
httpwwwncbinlmnihgovgenome219genome_assembly_id=28567)DmelanogasterD
simulansandDyakubastockswerekeptat25degConstandardcornmealmediumDpseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
26
stockswerekeptat225degCatlowhumidityonbananaKopuntiaKmaltmedium(1000mlwater30g
yeast10gagar20mlNipagin150gmashedbananas50gmolasses30gmalt25gopuntia
powder)Allembryocollectionswereperformedat25degCongrapejuiceagarplatessupplemented
withfreshyeastpasteEmbryosweredechorionatedfor3minutesin50bleachandwashedwith
waterbeforesnapKfreezinginliquidnitrogenandstorageatK80degC
Generationoftransgeniclines
ThreepiggyBacvectorswerecreatedforDamIDcontainingaDichaeteKDamconstructaSoxNKDam
constructoraDamKonlyconstructTheSoxNKDamfusionproteincodingsequencealongwith
upstreamUASsitesanHsp70promoterandtheSV405rsquoUTRwasclonedfromanexistingpUAST
vector[51]TheDichaeteKDamfusionproteincodingsequencealongwithupstreamUASsitesan
Hsp70promoterandthekayak5rsquoUTRwasclonedfromgenomicDNAextractedfromaD
melanogasterlinecarryingthisconstruct[112]TheDamcodingregionaswellasupstreamUAS
sitesanHsp70promoterandtheSV405rsquoUTRwasdirectlyexcisedfromanexistingpUASTvector
(giftfromTSouthall)usingSphIandStuIAllinsertswerefirstclonedintothepSLfa1180fashuttle
vector(giftfromEWimmer)[113](SoxNKDamSpeIStuIDichaeteKDamSpeIAvrIIDamSphIStuI)
TheinsertswerethenexcisedfromtheshuttlevectorsusingtheoctoKcuttersFseIandAscIand
clonedintothefinalpBac3xP3KEGFPvector(alsofromErnstWimmer)PlasmidDNAwas
microinjectedintoembryosfromeachspeciesataconcentrationof06microgmicroltogetherwitha
piggyBachelperplasmidat04microgmicrolAllmicroinjectionswereperformedbySangChaninthe
DepartmentofGeneticsinjectionfacilityUniversityofCambridgeSurvivingadultswere
backcrossedtowScoSM6amalesorvirginfemalesforDmelanogasterortomalesorvirgin
femalesfromtheparentallinefortheotherspeciesF1progenywerescoredforeyeKspecificGFP
expressionandDmelanogasterinsertionswerebalancedovereitherSM6aorTM6c
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
27
IsolationofDamIDDNAandsequencing
EmbryosfromtheDichaete8DamSoxN8DamandDam8onlytransgeniclinesineachspecieswere
collectedafterovernightlaysandDNAwasisolatedusingminormodificationstotheprotocolof
Vogelandcolleagues[95]Threebiologicalreplicateswerecollectedfromeachlineconsistingof
approximately50K150microlsettledvolumeofembryosperreplicateToextracthighKmolecularweight
genomicDNAeachaliquotofembryoswashomogenizedinaDounce15Kmlhomogenizerin10ml
ofhomogenizationbuffer10strokeswereappliedwithpestleBfollowedby10strokeswithpestle
AThelysatewasthenspunfor10minutesat6000gThesupernatantwasdiscardedandthepellet
wasresuspendedin10mlhomogenizationbufferthenspunagainfor10minutesat6000gThe
supernatantwasagaindiscardedandthepelletwasresuspendedin3mlhomogenizationbuffer
300microlof20nKlauroylsarcosinewereaddedandthesampleswereinvertedseveraltimestolyse
thenucleiThesamplesweretreatedwithRNaseAfollowedbyproteinaseKat37degCTheywerethen
purifiedbytwophenolKchloroformextractionsandonechloroformextractionGenomicDNAwas
precipitatedbyadding2volumesofEtOHand01volumeof3MNaOAcdriedandresuspendedin
50K150microlTEbufferdependingonthestartingamountofembryos
30microlofeachDNAsamplewasdigestedforatleast2hourswithDpnIat37degCToeliminateobserved
contaminationfromnonKdigestedgenomicDNA07volumesofAgencourtAMPureXPbeads
(BeckmanCoulter)wereaddedtoeachsampletoremovehighKmolecularweightDNA11volumes
ofAgencourtAMPureXPbeadswerethenaddedtorecoverallremainingDNAfragmentswhich
wereelutedin30microlTEbufferandthenligatedtoadoubleKstrandedoligonucleotideadapterIf
bandswereobservedinthePCRproductstheligationwasrepeatedwithadapterstitratedto12or
14LigationproductsweredigestedwithDpnIItoremovefragmentswithunmethylatedGATC
sequencesandamplifiedbyPCRfollowedbypurificationviaaphenolKchloroformextractionThe
DNAwasprecipitatedandresuspendedin50microlTEbufferthensonicatedinordertoreducethe
averagefragmentsizeusingaCovarisS2sonicatorwiththefollowingsettingsintensity5dutycycle
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
28
10200cyclesburst300secondsThesampleswerepurifiedusingaQIAquickPCRPurificationkit
toremovesmallfragmentsSampleconcentrationsweremeasuredusingaQubitwiththeDNAHigh
SensitivityAssay(LifeTechnologies)andthesizedistributionsofDNAfragmentsweremeasured
usinga2100BioanalyzerwiththeHighSensitivityDNAkitandchips(Agilent)Samplesweresentto
BGITechSolutions(HongKong)CoLtdforlibraryconstructionusingthestandardChIPKseqlibrary
protocolandweresequencedonanIlluminaMiSeqorHiSeq2000Librariesweremultiplexedwith2
samplesperrunfortheMiSeqand9K12samplesperlanefortheHiSeqMiSeqlibrarieswererunas
150KbpsingleKendreadswhileHiSeqlibrarieswererunas50KbpsingleKendreads
Sequencingdataanalysis
Cutadapt[114]wasusedtotrimDamIDadaptersequencesfrombothendsofreadsTrimmedreads
weremappedagainstthefollowingreferencegenomesusingbowtie2[115]withthedefault
settingsDmelanogasterApril2006(UCSCdm3BDGP)[116117]DsimulansApril2005(UCSC
droSim1TheGenomeInstituteatWashingtonUniversity(WUSTL))[101]DyakubaNovember2005
(UCSCdroYak2TheGenomeInstituteatWUSTL)[101]andDpseudoobscuraNovember2004(UCSC
dp3BaylorCollegeofMedicineHumanGenomeSequencingCenter(BCMKHGSC))[101118]All
referencegenomesweredownloadedfromtheUCSCGenomeBrowser
(httphgdownloadsoeucscedudownloadshtml)Mappedreadsweresortedandindexedusing
SAMtools[119]
ForDamIDdataanalysisthepositionofeveryGATCsiteineachgenomewasdeterminedusingthe
HOMERutilityscanMotifGenomeWidepl(httphomersalkeduhomerindexhtml)[120]Foreach
samplereadswereextendedtotheaveragefragmentlength(200bp)usingtheBEDToolsslop
utilityThenumberofextendedreadsoverlappingeachGATCfragmentwasthencalculatedusing
theBEDToolscoverageutility[90]Theresultingcountsforeachsamplewerecollatedtoforma
counttableconsistingofonecolumnforeachfusionproteinorDamKonlysampleandonerowfor
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
29
eachGATCfragmentThecounttablesservedasinputstorunDESeq2(runinRversion310using
RStudioversion098)whichwasusedtotestfordifferentialenrichmentinthefusionprotein
samplesversustheDamKonlysamplesateachGATCfragment[121]Fragmentsflaggedas
differentiallyenriched(log2foldchangegt0andadjustedpKvaluelt005orlt001)wereextracted
andneighbouringenrichedfragmentswithin100bpweremergedtoformedbindingintervals
BindingintervalswerescannedfordenovoandknownmotifsusingHOMERfindMotifsGenomepl
[120]AnoverviewofthispipelinecanbefoundathttpsgithubcomsarahhcarlflychipwikiBasicK
DamIDKanalysisKpipeline
BothbindingintervalsandreadsfromnonKDmelanogasterspeciesweretranslatedtotheD
melanogasterUCSCdm3referencegenomeusingtheLiftOverutilityfromtheUCSCGenome
Browser(httpgenomeucsceducgiKbinhgLiftOver)[122]ForDsimulansandDyakubathe
minMatchparameterwassetto07whileforDpseudoobscuraitwassetto05multipleoutputs
werenotpermittedDiffBindwasusedtoenableaquantitativecomparisonbetweenDamID
datasetsfromallspeciesusingtranslateddata[86]TranslatedreadsinBEDformatwerebackK
convertedtoSAMformatusingacustomperlscript(bed2sampl
httpsgithubcomsarahhcarlFlychiptreemasterDamID_analysis)andthenintoBAMformat
usingSAMtools[119]Foreachanalysisthetranslatedreadsfromeachsamplewerenormalized
togetherinDiffBindusingtheDESeq2normalizationmethod
BindingintervalswereassignedtotheclosestgeneintheDmelanogastergenomeusingaperl
scriptwrittenbyBettinaFischer(UniversityofCambridge)Genomicfeatureannotationswere
performedusingChIPSeeker[123]ThedistancesfrombindingintervalstoTSSswerecalculatedand
plottedusingChIPpeakAnno[124]Allcalculationsofoverlapsbetweenintervaldatasetswere
performedusingtheBEDToolsintersectutility[90]Allvisualizationofsequencedatawasdone
usingtheIntegratedGenomeBrowser(IGB)[125]Geneontologyanalysiswasperformedusing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
30
FlyMine[126]withaBenjaminiandHochbergcorrectionformultipletestingandacorrectionfor
genelengtheffect
Evolutionarysequenceanalysisofbindingmotifs
DiffBindwasusedtoidentifythesetofDichaeteKDambindingintervalsuniquetoDmelanogaster
andthesetconservedinallfourspecies[86]ThegenomiccoordinatesoftheseintervalsinD
melanogasterwereextractedandtranslatedtoeachotherspeciesrsquoreferencegenomeassembly
usingLiftOverwiththeminMatchparametersetto07Thesequencesofeachorthologousbinding
intervalineachspecieswereobtainedusingthefetchKUCSCsequencestoolfromRSATpreserving
strandinformation[93]Foreachintervalforwhichoneunambiguousorthologoussequencecould
beidentifiedinallfourspeciesthesequencesweremultiplyalignedusingPRANKestimatingthe
guidetreedirectlyfromthedata[9192]TwostrategieswereusedtopredictSoxbindingsites
withinbindingintervalsFirstthepositionalweightmatrices(PWMs)representingthetopKscoring
denovoSoxmotifidentifiedviaHOMERineachbindingintervaldatasetweredownloaded[120]
FIMOwasusedtosearchformatchestoeachPWMinallbindingintervalsandtheresultinghits
wereusedtocalculatetheaveragenumberofmotifsperbindinginterval[89]ThePWMswerealso
usedtoscanmultiplealignmentsofbothfourKwayconservedanduniqueDmelanogasterDichaeteK
DambindingintervalsusingmatrixKscanfromRSATwiththepreKcompiledDrosophilabackground
fileandwiththecutoffforreportingmatchessettoaPWMweightKscoreofgt=4andapKvalueoflt
00001Ifabindingintervalwasidentifiedatthesamealignedpositioninanorthologousbinding
intervalinmorethanonespeciesitwasconsideredtobepositionallyconservedbetweenthose
speciesRandomlyshuffledcontrolmotifsweregeneratedusingtheRSATtoolpermuteKmatrix[93]
andusedinthesamewayAllstatisticaltestswereperformedinRv310usingRStudiov098
Dataaccess
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
31
AllsequencingandbindingintervaldatadescribedinthispaperhavebeendepositedinNCBIrsquosGene
ExpressionOmnibus(GEO)[127]andareaccessiblethroughGEOSeriesaccessionnumberGSE63333
(httpwwwncbinlmnihgovgeoqueryacccgiacc=GSE63333)
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests
Authorscontributions
SCandSRconceivedanddesignedtheexperimentsSCperformedtheexperimentsandanalysedthe
dataSCandSRdraftedthemanuscriptAllauthorsreadandapprovedthefinalmanuscript
Descriptionofadditionalfiles
SupplementaryFigure1MultiplealignmentofDichaeteandSoxNeuroaminoacidsequences(A)
MultiplealignmentoftheentireDichaetesequencefromDmelanogasterDsimulansDyakuba
andDpseudoobscura(B)MultiplealignmentoftheentireSoxNsequencefromDmelanogasterD
simulansDyakubaandDpseudoobscuraTheHMGdomainsofeachorthologousproteinare
highlightedinred
SupplementaryFigure2GenomicfeaturesannotatedtogroupBSoxDamIDbindingintervals
Featureclassesincludeexonsintrons5rsquoUTRs3rsquoUTRspromotersimmediatedownstreamand
intergenicEachintervalmaybeannotatedwithmorethanoneclassifitoverlapsmultiplefeatures
(A)PercentagesofDmelanogasterDichaeteKDamintervalsannotatedtoeachfeatureclass(B)
PercentagesofDmelanogasterSoxNKDamintervalsannotatedtoeachfeatureclass(C)
PercentagesofDsimulansDichaeteKDamintervalsannotatedtoeachfeatureclass(D)Percentages
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
32
ofDsimulansSoxNKDamintervalsannotatedtoeachfeatureclass(E)PercentagesofDyakuba
DichaeteKDamintervalsannotatedtoeachfeatureclass(F)PercentagesofDpseudoobscura
DichaeteKDamintervalsannotatedtoeachfeatureclass
SupplementaryFigure3DamIDintervalsoverlappingaknownCRMarepreferentiallyconserved
(A)DichaeteKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowthreeKway
bindingconservationbetweenDmelanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andareless
likelytobeuniquetoDmelanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=706eK35)(B)
SoxNKDambindingintervalsthatoverlapaREDFlyCRMaremorelikelytoshowtwoKwaybinding
conservationbetweenDmelanogasterandDsimulans(ldquoDmelKDsimrdquo)andarelesslikelytobe
uniquetoDmelanogasterthanthosethatdonot(p=240eK72)(C)DichaeteKDambindingintervals
thatoverlapaFlyLightCRMaremorelikelytoshowthreeKwaybindingconservationandareless
likelytobeuniquetoDmelanogasterthanthosethatdonot(p=338eK38)(D)SoxNKDambinding
intervalsthatoverlapaFlyLightCRMaremorelikelytoshowtwoKwaybindingconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatdonot(p=727eK12)
SupplementaryFigure4DamIDintervalsoverlappingacorebindingintervalorannotatedtoa
directtargetgenearepreferentiallyconserved(A)DichateKDambindingintervalsthatoverlapa
coreDichaetebindingsitearemorelikelytoshowthreeKwayconservationbetweenD
melanogasterDsimulansandDyakuba(ldquo3Kwayrdquo)andarelesslikelytobeuniquetoD
melanogaster(ldquoDmeluniquerdquo)thanthosethatdonot(p=410eK305)(B)SoxNKDambinding
intervalsthatoverlapacoreSoxNbindingsitearemorelikelytoshowtwoKwayconservation
betweenDmelanogasterandDsimulans(ldquo2Kwayrdquo)andarelesslikelytobeuniquetoD
melanogasterthanthosethatdonot(p=190eK161)(C)DichaeteKDambindingintervalsthatare
annotatedtoaDichaetedirecttargetgenearemorelikelytoshowthreeKwayconservationandare
lesslikelytobeuniquetoDmelanogasterthanthosethatarenot(p=23eK12)Howeverthiseffect
isweakerthanforcoreintervals(D)SoxNKDambindingintervalsthatareannotatedtoaSoxNdirect
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
33
targetgenearemorelikelytoshowtwoKwayconservationandarelesslikelytobeuniquetoD
melanogasterthanthosethatarenot(p=34eK5)Againhoweverthiseffectisweakerthanfor
coreintervals
Additionalfile1
Fileformatxls
TitleGenesannotatedtoSoxDamIDbindingintervals
DescriptionListoftargetgenesannotatedtoDichaeteandSoxNDamIDbindingintervalsinall
speciesFornonKmelanogasterspeciestheannotationwasperformedonbindingintervalscalled
fromsequencedatathathadbeentranslatedtotheDmelanogastergenomeTheCGnumbersfrom
NCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile2
Fileformatxls
TitleGeneontologytermsenrichedinSoxtargetgenelists
DescriptionListofallsignificantlyenrichedGOtermsforDichaeteandSoxNannotatedtargetgenes
inallspeciesThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile3
Fileformatxls
TitleEnricheddenovomotifsdiscoveredinSoxbindingintervals
DescriptionForeachDichaeteandSoxNDamIDbindingintervaldatasetthetop20enrichedde
novomotifsdiscoveredbyHOMERarelistedThefirstcolumnliststherankofthemotifthesecond
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
34
columnliststhebestguessTFmatchingthemotifaccordingtoHOMERthethirdcolumnliststhe
consensussequenceofthemotifandthefourthcolumnlistsitspKvalue
Additionalfile4
Fileformatxls
TitleTargetgenesannotatedtoconservedcommonlyboundintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatarecommonlyboundby
DichaeteandSoxNaswellasconservedbetweenDmelanogasterandDsimulansTheCGnumbers
fromNCBIgenesymbolsandFlybaseidentifiersforeachgenearelisted
Additionalfile5
Fileformatxls
TitleGeneontologytermsenrichedincommonlyboundconservedtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
arecommonlyboundbyDichaeteandSoxNandareconservedbetweenDmelanogasterandD
simulansThefirstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhe
correctedpKvalue
Additionalfile6
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundDichaeteintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbyDichaete
andconservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbols
andFlybaseidentifiersforeachgenearelisted
Additionalfile7
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
35
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundDichaetetargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbyDichaeteandareconservedbetweenDmelanogasterandDsimulansThe
firstcolumnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Additionalfile8
Fileformatxls
TitleTargetgenesannotatedtoconserveduniquelyboundSoxNintervals
DescriptionListoftargetgenesannotatedtobindingintervalsthatareuniquelyboundbySoxNand
conservedbetweenDmelanogasterandDsimulansTheCGnumbersfromNCBIgenesymbolsand
Flybaseidentifiersforeachgenearelisted
Additionalfile9
Fileformatxls
TitleGeneontologytermsenrichedinconserveduniquelyboundSoxNtargetgenes
DescriptionListofallsignificantlyenrichedGOtermsfortargetgenesannotatedtointervalsthat
areuniquelyboundbySoxNandareconservedbetweenDmelanogasterandDsimulansThefirst
columnliststheGObiologicalprocesstermandthesecondcolumnliststhepKvalue
Acknowledgements
ThisworkwassupportedbyaWellcomeTrust4KyearPhDstudentshipandaCambridgeOverseas
TruststudentshiptoSCThefundershadnoroleinstudydesigndatacollectionandanalysis
decisiontopublishorpreparationofthemanuscriptWethankTonySouthallforprovidingthe
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
36
pUASTKDamvectorandEnricoFerreroforprovidingthepUASTKSoxNDamvectorWealsothank
ErnstWimmerforprovidingthepSLfa1180favectorthepBac3xP3KEGFPvectorandthepiggyBac
transposasehelperplasmidWeareindebtedtoSangChanforperformingmicroinjectionsandto
BettinaFischerforsupportinthelabandinsightfuldiscussionsaboutdataanalysis
References
1DunhamIKundajeAAldredSFCollinsPJDavisCADoyleFEpsteinCBFrietzeSHarrowJKaulRKhatunJLajoieBRLandtSGLeeBKKPauliFRosenbloomKRSaboPSafiASanyalAShoreshNSimonJMSongLTrinkleinNDAltshulerRCBirneyEBrownJBChengCDjebaliSDongXDunhamIetalAnintegratedencyclopediaofDNAelementsinthehumangenomeNature201248957ndash74
2GordonKLRuvinskyITempoandmodeinevolutionoftranscriptionalregulationPLoSGenet20128e1002432
3NephSVierstraJStergachisABReynoldsAPHaugenEVernotBThurmanREJohnSSandstromRJohnsonAKMauranoMTHumbertRRynesEWangHVongSLeeKBatesDDiegelMRoachVDunnDNeriJSchaferAHansenRSKutyavinTGisteEWeaverMCanfieldTSaboPZhangMBalasundaramGetalAnexpansivehumanregulatorylexiconencodedintranscriptionfactorfootprintsNature201248983ndash90
4ThemodENCODEConsortiumRoySErnstJKharchenkoPVKheradpourPNegreNEatonMLLandolinJMBristowCAMaLLinMFWashietlSArshinoffBIAyFMeyerPERobineNWashingtonNLDiStefanoLBerezikovEBrownCDCandeiasRCarlsonJWCarrAJungreisIMarbachDSealfonRTolstorukovMYWillSAlekseyenkoAAArtieriCetalIdentificationofFunctionalElementsandRegulatoryCircuitsbyDrosophilamodENCODEScience20103301787ndash1797
5WrayGATheevolutionarysignificanceofcisVregulatorymutationsNatRevGenet20078206ndash216
6BigginMDAnimalTranscriptionNetworksasHighlyConnectedQuantitativeContinuaDevCell201121611ndash626
7HueberSDLohmannIShapingsegmentsHoxgenefunctioninthegenomicageBioEssays200830965ndash979
8HueberSDBezdanDHenzSRBlankMWuHLohmannIComparativeanalysisofHoxdownstreamgenesinDrosophilaDevelopment2007134381ndash392
9KaplanTLiXKYSaboPJThomasSStamatoyannopoulosJABigginMDEisenMBQuantitativeModelsoftheMechanismsThatControlGenomeVWidePatternsofTranscriptionFactorBindingduringEarlyDrosophilaDevelopmentPLoSGenet20117e1001290
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
37
10LiXKYThomasSSaboPJEisenMBStamatoyannopoulosJABigginMDTheroleofchromatinaccessibilityindirectingthewidespreadoverlappingpatternsofDrosophilatranscriptionfactorbindingGenomeBiol201112R34
11MannRSLelliKMJoshiRChapter3HoxSpecificityInCurrTopDevBiolVolume88Elsevier200963ndash101
12ZinzenRPGirardotCGagneurJBraunMFurlongEEMCombinatorialbindingpredictsspatioVtemporalcisVregulatoryactivityNature200946265ndash70
13AleksicJRussellSChIPingawayatthegenomethenewfrontiertravelguideMolBiosyst200951421
14ChenYNegreNLiQMieczkowskaJOSlatteryMLiuTZhangYKimTKKHeHHZiebaJRuanYBickelPJMyersRMWoldBJWhiteKPLiebJDLiuXSSystematicevaluationoffactorsinfluencingChIPVseqfidelityNatMethods20129609ndash614
15FisherWWLiJJHammondsASBrownJBPfeifferBDWeiszmannRMacArthurSThomasSStamatoyannopoulosJAEisenMBBickelPJBigginMDCelnikerSEDNAregionsboundatlowoccupancybytranscriptionfactorsdonotdrivepatternedreportergeneexpressioninDrosophilaProcNatlAcadSci201210921330ndash21335
16MarinovGKKundajeAParkPJWoldBJLargeVScaleQualityAnalysisofPublishedChIPVseqDataG358GenesGenomesGenetics20134209ndash223
17ParkDLeeYBhupindersinghGIyerVRWidespreadMisinterpretableChIPVseqBiasinYeastPloSOne20138e83506
18KimJHeXSinhaSEvolutionofRegulatorySequencesin12DrosophilaSpeciesPLoSGenet20095e1000330
19LudwigMZBergmanCPatelNHKreitmanMEvidenceforstabilizingselectioninaeukaryoticenhancerelementNature2000403564ndash567
20VillarDFlicekPOdomDTEvolutionoftranscriptionfactorbindinginmetazoansmdashmechanismsandfunctionalimplicationsNatRevGenet2014
21JagerMQueacuteinnecEHoulistonEManuelMExpansionoftheSOXgenefamilypredatedtheemergenceoftheBilateriaMolPhylogenetEvol200639468ndash477
22JagerMQueacuteinnecEChioriRLeGuyaderHManuelMInsightsintotheearlyevolutionofSOXgenesfromexpressionanalysesinactenophoreJExpZoologBMolDevEvol2008310B650ndash667
23LarrouxCFaheyBLiubicichDHinmanVFGauthierMGongoraMGreenKWoumlrheideGLeysSPDegnanBMDevelopmentalexpressionoftranscriptionfactorgenesinademospongeinsightsintotheoriginofmetazoanmulticellularityEvolDev20068150ndash173
24PhochanukulNRussellSNobackbonebutlotsofSoxInvertebrateSoxgenesIntJBiochemCellBiol201042453ndash464
25SrivastavaMBegovicEChapmanJPutnamNHHellstenUKawashimaTKuoAMitrosTSalamovACarpenterMLSignorovitchAYMorenoMAKammKGrimwoodJSchmutzJShapiroH
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
38
GrigorievIVBussLWSchierwaterBDellaportaSLRokhsarDSTheTrichoplaxgenomeandthenatureofplacozoansNature2008454955ndash960
26BowlesJSchepersGKoopmanPPhylogenyoftheSOXFamilyofDevelopmentalTranscriptionFactorsBasedonSequenceandStructuralIndicatorsDevBiol2000227239ndash255
27GuthSIEWegnerMHavingitbothwaysSoxproteinfunctionbetweenconservationandinnovationCellMolLifeSci2008653000ndash3018
28SchepersGETeasdaleRDKoopmanPTwentypairsofsoxextenthomologyandnomenclatureofthemouseandhumansoxtranscriptionfactorgenefamiliesDevCell20023167ndash170
29BergslandMRamskoldDZaouterCKlumSSandbergRMuhrJSequentiallyactingSoxtranscriptionfactorsinneurallineagedevelopmentGenesDev2011252453ndash2464
30KamachiYUchikawaMCollignonJLovellKBadgeRKondohHInvolvementofSox12and3intheearlyandsubsequentmoleculareventsoflensinductionDevelopment19981252521ndash2532
31UwanoghoDRexMCartwrightEJPearlGHealyCScottingPJSharpePTEmbryonicexpressionofthechickenSox2Sox3andSox11genessuggestsaninteractiveroleinneuronaldevelopmentMechDev19954923ndash36
32WoodHBEpiskopouVComparativeexpressionofthemouseSox1Sox2andSox3genesfrompreVgastrulationtoearlysomitestagesMechDev199986197ndash201
33HuangJArsenaultMKannMLopezKMendezCSalehMWadowskaDTaglientiMHoJMiaoYSimsDSpearsJLopezAWrightGHartwigSThetranscriptionfactorsryVrelatedHMGboxV4(SOX4)isrequiredfornormalrenaldevelopmentin)vivoSOXCGenesDuringRenalDevelopmentDevDyn2013242790ndash799
34SockERettigSDEnderichJBoslMRTammERWegnerMGeneTargetingRevealsaWidespreadRolefortheHighVMobilityVGroupTranscriptionFactorSox11inTissueRemodelingMolCellBiol2004246635ndash6644
35WilsonMEYangKYKalousovaALauJKosakaYLynnFCWangJMrejenCEpiskopouVCleversHCGermanMSTheHMGBoxTranscriptionFactorSox4ContributestotheDevelopmentoftheEndocrinePancreasDiabetes2005543402ndash3409
36DownesMKoopmanPSOX18andtheTranscriptionalRegulationofBloodVesselDevelopmentTrendsCardiovascMed11318ndash324
37MatsuiTRedundantrolesofSox17andSox18inpostnatalangiogenesisinmiceJCellSci20061193513ndash3526
38BhattaramPPenzoKMeacutendezASockEColmenaresCKanekoKJVassilevADePamphilisMLWegnerMLefebvreVOrganogenesisreliesonSoxCtranscriptionfactorsforthesurvivalofneuralandmesenchymalprogenitorsNatCommun201011ndash12
39FerriALMSox2deficiencycausesneurodegenerationandimpairedneurogenesisintheadultmousebrainDevelopment20041313805ndash3819
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
39
40NishiguchiSWoodHKondohHLovellKBadgeREpiskopouVSox1directlyregulatestheγVcrystallingenesandisessentialforlensdevelopmentinmiceGenesDev199812776ndash781
41OkudaYOguraEKondohHKamachiYB1SOXCoordinateCellSpecificationwithPatterningandMorphogenesisintheEarlyZebrafishEmbryoPLoSGenet20106e1000936
42RizzotiKBrunelliSCarmignacDThomasPQRobinsonICLovellKBadgeRSOX3isrequiredduringtheformationofthehypothalamoVpituitaryaxisNatGenet200436247ndash255
43WegnerMStoltCCFromstemcellstoneuronsandgliaaSoxistrsquosviewofneuraldevelopmentTrendsNeurosci200528583ndash588
44CollignonJSockanathanSHackerACohenKTannoudjiMNorrisDRastanSStevanovicMGoodfellowPNLovellKBadgeRAcomparisonofthepropertiesofSoxV3withSryandtworelatedgenesSoxV1andSoxV2Development1996122509ndash520
45McKimmieCWoerfelGRussellSConservedgenomicorganisationofGroupBSoxgenesininsectsBMCGenet2005626
46MalasSDuthieSDeloukasPEpiskopouVTheisolationandhighVresolutionchromosomalmappingofhumanSOX14andSOX21twomembersoftheSOXgenefamilyrelatedtoSOX1SOX2andSOX3MammGenomeOffJIntMammGenomeSoc199910934ndash937
47WegnerMSOXafterSOXSOXessionregulatesneurogenesisGenesDev2011252423ndash2428
48UchikawaMKamachiYKondohHTwodistinctsubgroupsofGroupBSoxgenesfortranscriptionalactivatorsandrepressorstheirexpressionduringembryonicorganogenesisofthechickenMechDev199984103ndash120
49UchikawaMYoshidaMIwafuchiKDoiMMatsudaKIshidaYTakemotoTKondohHB1andB2SoxgeneexpressionduringneuralplatedevelopmentinchickenandmouseembryosUniversalversusspeciesVdependentfeaturesDevGrowthDiffer201153761ndash771
50SaacutenchezKSorianoNRussellSTheDrosophilaSOXVdomainproteinDichaeteisrequiredforthedevelopmentofthecentralnervoussystemmidlineDevelopment19981253989ndash3996
51FerreroEFischerBRussellSSoxNeuroorchestratescentralnervoussystemspecificationanddifferentiationinDrosophilaandisonlypartiallyredundantwithDichaeteGenomeBiol201415R74
52AmbrosettiDKCBasilicoCDaileyLSynergisticactivationofthefibroblastgrowthfactor4enhancerbySox2andOctV3dependsonproteinVproteininteractionsfacilitatedbyaspecificspatialarrangementoffactorbindingsitesMolCellBiol1997176321ndash6329
53ArcherTCJinJCaseyESInteractionofSox1Sox2Sox3andOct4duringprimaryneurogenesisDevBiol2011350429ndash440
54BeryAMartynogaBGuillemotFJolyJKSRetauxSCharacterizationofEnhancersActiveintheMouseEmbryonicCerebralCortexSuggestsSoxPoucisVRegulatoryLogicsandHeterogeneityofCorticalProgenitorsCerebCortex2013
55CreacutemazyFBertaPGirardFSoxNeuroanewDrosophilaSoxgeneexpressedinthedevelopingcentralnervoussystemMechDev200093215ndash219
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
40
56MaYCertelKGaoYNiemitzEMosherJMukherjeeAMutsuddiMHuseinovicNCrewsSTJohnsonWAothersFunctionalInteractionsbetweenDrosophilabHLHPASSoxandPOUTranscriptionFactorsRegulateCNSMidlineExpressionoftheslitGeneJNeurosci2000204596ndash4605
57MasuiSNakatakeYToyookaYShimosatoDYagiRTakahashiKOkochiHOkudaAMatobaRSharovAAKoMSHNiwaHPluripotencygovernedbySox2viaregulationofOct34expressioninmouseembryonicstemcellsNatCellBiol20079625ndash635
58TanakaSKamachiYTanouchiAHamadaHJingNKondohHInterplayofSOXandPOUFactorsinRegulationoftheNestinGeneinNeuralPrimordialCellsMolCellBiol2004248834ndash8846
59WilsonMJDeardenPKEvolutionoftheinsectSoxgenesBMCEvolBiol20088120
60ZhongLWangDGanXYangTHeSParallelExpansionsofSoxTranscriptionFactorGroupBPredatingtheDiversificationsoftheArthropodsandJawedVertebratesPLoSONE20116e16570
61BuescherMHingFSChiaWFormationofneuroblastsintheembryoniccentralnervoussystemofDrosophilamelanogasteriscontrolledbySoxNeuroDevelopment20021294193ndash4203
62GirardFJolyWSavareJBonneaudNFerrazCMaschatFChromatinimmunoprecipitationrevealsanovelrolefortheDrosophilaSoxNeurotranscriptionfactorinaxonalpatterningDevBiol2006299530ndash542
63SaacutenchezKSorianoNRussellSRegulatoryMutationsoftheDrosophilaSoxGeneDichaeteRevealNewFunctionsinEmbryonicBrainandHindgutDevelopmentDevBiol2000220307ndash321
64ShenSPAleksicJRussellSIdentifyingtargetsoftheSoxdomainproteinDichaeteintheDrosophilaCNSviatargetedexpressionofdominantnegativeproteinsBMCDevBiol2013131
65OvertonPTheroleofSoxgenesinthedevelopmentofDrosophilamelanogasterUniversityofCambridge2003
66OvertonPMMeadowsLAUrbanJRussellSEvidencefordifferentialandredundantfunctionoftheSoxgenesDichaeteandSoxNduringCNSdevelopmentinDrosophilaDevelopment20021294219ndash4228
67AleksicJFerreroEFischerBShenSPRussellSTheroleofDichaeteintranscriptionalregulationduringDrosophilaembryonicdevelopmentBMCGenomics201314861
68ZhaoGWheelerSRSkeathJBGeneticcontrolofdorsoventralpatterningandneuroblastspecificationintheDrosophilaCentralNervousSystemIntJDevBiol200751107ndash115
69IsshikiTPearsonBHolbrookSDoeCQDrosophilaneuroblastssequentiallyexpresstranscriptionfactorswhichspecifythetemporalidentityoftheirneuronalprogenyCell2001106511
70MaurangeCGouldAPBrainybutnottoobrainystartingandstoppingneuroblastdivisionsinDrosophilaTrendsNeurosci20052830ndash36
71NowakMABoerlijstMCCookeJSmithJMEvolutionofgeneticredundancyNature1997388167ndash171
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
41
72TautzDRedundanciesdevelopmentandtheflowofinformationBioEssaysNewsRevMolCellDevBiol199214263ndash266
73WagnerAGeneduplicationsrobustnessandevolutionaryinnovationsBioEssays200830367ndash373
74GreerJMPuetzJThomasKRCapecchiMRMaintenanceoffunctionalequivalenceduringparalogousHoxgeneevolutionNature2000403661ndash665
75MaconochieMNonchevSMorrisonAKrumlaufRParalogousHoxgenesfunctionandregulationAnnuRevGenet199630529ndash556
76BradleyRKLiXKYTrapnellCDavidsonSPachterLChuHCTonkinLABigginMDEisenMBBindingSiteTurnoverProducesPervasiveQuantitativeChangesinTranscriptionFactorBindingbetweenCloselyRelatedDrosophilaSpeciesPLoSBiol20108e1000343
77HeQBardetAFPattonBPurvisJJohnstonJPaulsonAGogolMStarkAZeitlingerJHighconservationoftranscriptionfactorbindingandevidenceforcombinatorialregulationacrosssixDrosophilaspeciesNatGenet201143414ndash420
78MacArthurSLiXKYLiJBrownJBChuHCZengLGrondonaBPHechmerASimirenkoLKeranenSVDevelopmentalrolesof21DrosophilatranscriptionfactorsaredeterminedbyquantitativedifferencesinbindingtoanoverlappingsetofthousandsofgenomicregionsGenomeBiol200910R80
79ParisMKaplanTLiXYVillaltaJELottSEEisenMBExtensiveDivergenceofTranscriptionFactorBindinginDrosophilaEmbryoswithHighlyConservedGeneExpressionPLoSGenet20139e1003748
80OdomDTDowellRDJacobsenESGordonWDanfordTWMacIsaacKDRolfePAConboyCMGiffordDKFraenkelETissueVspecifictranscriptionalregulationhasdivergedsignificantlybetweenhumanandmouseNatGenet200739730ndash732
81SchmidtDWilsonMDBallesterBSchwaliePCBrownGDMarshallAKutterCWattSMartinezKJimenezCPMackaySTalianidisIFlicekPOdomDTFiveVVertebrateChIPVseqRevealstheEvolutionaryDynamicsofTranscriptionFactorBindingScience20103281036ndash1040
82StefflovaKThybertDWilsonMDStreeterIAleksicJKaragianniPBrazmaAAdamsDJTalianidisIMarioniJCFlicekPOdomDTCooperativityandRapidEvolutionofCoboundTranscriptionFactorsinCloselyRelatedMammalsCell2013154530ndash540
83ArnoldCDGerlachDSpiesDMattsJASytnikovaYAPaganiMLauNCStarkAQuantitativegenomeVwideenhanceractivitymapsforfiveDrosophilaspeciesshowfunctionalenhancerconservationandturnoverduringcisVregulatoryevolutionNatGenet2014
84GalloSMGerrardDTMinerDSimichMDesSoyeBBergmanCMHalfonMSREDflyv30towardacomprehensivedatabaseoftranscriptionalregulatoryelementsinDrosophilaNucleicAcidsRes201039(Database)D118ndashD123
85ManningLHeckscherESPuriceMDRobertsJBennettALKrollJRPollardJLStraderMELuptonJRDyukarevaAVDoanPNBauerDMWilburANTannerSKellyJJLaiSKLTranKDKohwiMLavertyTRPearsonJCCrewsSTRubinGMDoeCQAResourceforManipulatingGene
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
42
ExpressionandAnalyzingcisVRegulatoryModulesintheDrosophilaCNSCellRep201221002ndash1013
86RossKInnesCSStarkRTeschendorffAEHolmesKAAliHRDunningMJBrownGDGojisOEllisIOGreenARAliSChinSKFPalmieriCCaldasCCarrollJSDifferentialoestrogenreceptorbindingisassociatedwithclinicaloutcomeinbreastcancerNature2012
87LudwigMZManuKittlerRWhiteKPKreitmanMConsequencesofEukaryoticEnhancerArchitectureforGeneExpressionDynamicsDevelopmentandFitnessPLoSGenet20117e1002364
88PerryMWBoettigerANBothmaJPLevineMShadowEnhancersFosterRobustnessofDrosophilaGastrulationCurrBiol2010201562ndash1567
89GrantCEBaileyTLNobleWSFIMOscanningforoccurrencesofagivenmotifBioinformatics2011271017ndash1018
90QuinlanARHallIMBEDToolsaflexiblesuiteofutilitiesforcomparinggenomicfeaturesBioinformatics201026841ndash842
91LoumlytynojaAGoldmanNPhylogenyVAwareGapPlacementPreventsErrorsinSequenceAlignmentandEvolutionaryAnalysisScience20083201632ndash1635
92LoumlytynojaAGoldmanNAnalgorithmforprogressivemultiplealignmentofsequenceswithinsertionsProcNatlAcadSciUSA200510210557ndash10562
93ThomasKChollierMDefranceMMedinaKRiveraASandOHerrmannCThieffryDvanHeldenJRSAT2011regulatorysequenceanalysistoolsNucleicAcidsRes201139(suppl)W86ndashW91
94TuratsinzeJKVThomasKChollierMDefranceMvanHeldenJUsingRSATtoscangenomesequencesfortranscriptionfactorbindingsitesandcisVregulatorymodulesNatProtoc200831578ndash1588
95VogelMJPericKHupkesDvanSteenselBDetectionofinvivoproteinndashDNAinteractionsusingDamIDinmammaliancellsNatProtoc200721467ndash1478
96LeeTMartickeSSungCRobinowSLuoLCellVAutonomousRequirementoftheUSPEcRVBEcdysoneReceptorforMushroomBodyNeuronalRemodelinginDrosophilaNeuron28807ndash818
97ParrishJZGenomeVwideanalysesidentifytranscriptionfactorsrequiredforpropermorphogenesisofDrosophilasensoryneurondendritesGenesDev200620820ndash835
98KispertAHerrmannBGLeptinMReuterRHomologsofthemouseBrachyurygeneareinvolvedinthespecificationofposteriorterminalstructuresinDrosophilaTriboliumandLocustaGenesDev199482137ndash2150
99MurakamiRTakashimaSHamaguchiTDevelopmentalgeneticsoftheDrosophilagutspecificationofprimordiasubdivisionandovertVdifferentiationCellMolBiolNoisy88GdFr199945661ndash676
100HareEEPetersonBKIyerVNMeierREisenMBSepsidevenVskippedEnhancersAreFunctionallyConservedinDrosophilaDespiteLackofSequenceConservationPLoSGenet20084e1000106
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
43
101ClarkAGEisenMBSmithDRBergmanCMOliverBMarkowTAKaufmanTCKellisMGelbartWIyerVNPollardDASacktonTBLarracuenteAMSinghNDAbadJPAbtDNAdryanBAguadeMAkashiHAndersonWWAquadroCFArdellDHArguelloRArtieriCGBarbashDABarkerDBarsantiPBatterhamPBatzoglouSBegunDetalEvolutionofgenesandgenomesontheDrosophilaphylogenyNature2007450203ndash218
102StarkALinMFKheradpourPPedersenJSPartsLCarlsonJWCrosbyMARasmussenMDRoySDeorasANRubyJGBrenneckeJcuratorsHFProjectBDGHodgesEHinrichsASCaspiAPatenBParkSKWHanMVMaederMLPolanskyBJRobsonBEAertsSvanHeldenJHassanBGilbertDGEastmanDARiceMWeirMetalDiscoveryoffunctionalelementsin12DrosophilagenomesusingevolutionarysignaturesNature2007450219ndash232
103VavouriTSempleJILehnerBWidespreadconservationofgeneticredundancyduringabillionyearsofeukaryoticevolutionTrendsGenet24485ndash488
104WagnerADistributedrobustnessversusredundancyascausesofmutationalrobustnessBioEssays200527176ndash188
105FerrariSHarleyVRPontiggiaAGoodfellowPNLovellKBadgeRBianchiMESRYlikeHMG1recognizessharpanglesinDNAEMBOJ1992114497
106GhaviKHelmYKleinFAPakozdiTCiglarLNoordermeerDHuberWFurlongEEMEnhancerloopsappearstableduringdevelopmentandareassociatedwithpausedpolymeraseNature2014
107RussellSRSanchezKSorianoNWrightCRAshburnerMTheDichaetegeneofDrosophilamelanogasterencodesaSOXVdomainproteinrequiredforembryonicsegmentationDevelopment19961223669ndash3676
108OvertonPMChiaWBuescherMTheDrosophilaHMGVdomainproteinsSoxNeuroandDichaetedirecttrichomeformationviatheactivationofshavenbabyandtherestrictionofWinglesspathwayactivityDevelopment20071342807ndash2813
109ChanSKKRyooHKDGouldAKrumlaufRMannRSSwitchingtheinvivospecificityofaminimalHoxVresponsiveelementDevelopment19971242007ndash2014
110SlatteryMMaLNeacutegreNWhiteKPMannRSGenomeVWideTissueVSpecificOccupancyoftheHoxProteinUltrabithoraxandHoxCofactorHomothoraxinDrosophilaPLoSONE20116e14686
111CoyneJAElwynSKimSYLlopartAGeneticstudiesoftwosisterspeciesintheDrosophilamelanogastersubgroupDyakubaandDsantomeaGenetRes20048411ndash26
112RiazFTheapplicationofmutantDNAadeninemethyltransferaseenzymesinDamIdentificationUniversityofCambridge2009
113HornCWimmerEAAversatilevectorsetforanimaltransgenesisDevGenesEvol2000210630ndash637
114MartinMCutadaptremovesadaptersequencesfromhighVthroughputsequencingreadsEMBnetJ201117ppndash10
115LangmeadBSalzbergSLFastgappedVreadalignmentwithBowtie2NatMethods20129357ndash359
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
44
116AdamsMDTheGenomeSequenceofDrosophilamelanogasterScience20002872185ndash2195
117CelnikerSEWheelerDAKronmillerBCarlsonJWHalpernAPatelSAdamsMChampeMDuganSPFriseEFinishingawholeVgenomeshotgunrelease3oftheDrosophilamelanogastereuchromaticgenomesequenceGenomeBiol20023research0079
118RichardsSComparativegenomesequencingofDrosophilapseudoobscuraChromosomalgeneandcisVelementevolutionGenomeRes2005151ndash18
119LiHHandsakerBWysokerAFennellTRuanJHomerNMarthGAbecasisGDurbinR1000GenomeProjectDataProcessingSubgroupTheSequenceAlignmentMapformatandSAMtoolsBioinformatics2009252078ndash2079
120HeinzSBennerCSpannNBertolinoELinYCLasloPChengJXMurreCSinghHGlassCKSimpleCombinationsofLineageVDeterminingTranscriptionFactorsPrimecisVRegulatoryElementsRequiredforMacrophageandBCellIdentitiesMolCell201038576ndash589
121LoveMIHuberWAndersSModeratedestimationoffoldchangeanddispersionforRNAVseqdatawithDESeq2bioRxiv2014
122BardetAFHeQZeitlingerJStarkAAcomputationalpipelineforcomparativeChIPVseqanalysesNatProtoc2011745ndash61
123YuGChIPseekerChIPseekerforChIPPeakAnnotationComparisonandVisualizationR2014
124ZhuLJGazinCLawsonNDPagegravesHLinSMLapointeDSGreenMRChIPpeakAnnoaBioconductorpackagetoannotateChIPVseqandChIPVchipdataBMCBioinformatics201011237
125NicolJWHeltGABlanchardSGRajaALoraineAETheIntegratedGenomeBrowserfreesoftwarefordistributionandexplorationofgenomeVscaledatasetsBioinformatics2009252730ndash2731
126LyneRSmithRRutherfordKWakelingMVarleyAGuillierFJanssensHJiWMclarenPNorthPFlyMineanintegrateddatabaseforDrosophilaandAnophelesgenomicsGenomeBiol20078R129
127EdgarRDomrachevMLashAEGeneExpressionOmnibusNCBIgeneexpressionandhybridizationarraydatarepositoryNucleicAcidsRes200230207ndash210
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
45
Figurelegends
Figure1Overviewofexperimentalstrategies(A)Schematicoftransgeniclinesgeneratedineach
speciesofDrosophilaThreeplasmidscontainingeitheraDichaeteKDamfusionproteinaSoxNKDam
fusionproteinorDamonlywereinjectedDichaeteKDamandDamKonlylineswerecreatedforeach
specieswhileSoxNKDamlineswereonlycreatedinDmelanogasterandDsimulansThe
dendrogramrepresentstheevolutionaryrelationshipsbetweenthespeciesusedPhotographsof
fliesarefromNGompel(httpwwwibdmlunivK
mrsfrequipesBP_NGIllustrationsmelanogaster20subgrouphtml)Abbreviationsmel
DrosophilamelanogastersimDrosophilasimulansyakDrosophilayakubapseDrosophila
pseudoobscura(B)OutlineoftheDamIDKseqexperimentalprotocolATFKDamfusionproteinora
DamKonlycontrolisexpressedinembryosleadingtomethylationofGATCsitesinthevicinityof
bindingeventsGenomicDNAisextractedandmethylatedfragmentsareisolatedviadigestionwith
DpnIwhichrecognisesonlymethylatedGATCThesefragmentsarepurifiedthroughligationof
adaptersfurtherdigestionofnonKmethylatedDNAwithDpnIIandPCRamplification[95]DNAfrom
bothTFKDamfusionsamplesandcontrolDamKonlysamplesissequencedandmappedtothe
genomeandtherelativeenrichmentofreadsiscomparedbetweenconditionstogeneratebinding
profilesOrangelinesrepresentGATCsitesgreyovalsrepresentmethylationbluediamonds
representDamproteinandpinkovalsrepresentTFsFigureadaptedfromCarlandRussell(inpress)
Figure2ResultsofcomparativeDamIDforDichaeteandSoxN(A)DichaeteKDambindingprofiles
plottedontheDmelanogastergenomeshowingtranslatedreadsmappingtoanorthologousregion
infourspeciesThreebiologicalreplicatesfromeachspeciesareshownAllreadlibrarieswere
scaledtoatotalsizeof1millionreadsforvisualizationpurposesTheyKaxesofalltracksrangefrom
0K50reads(B)SoxNKDambindingprofilesplottedontheDmelanogastergenomeshowing
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
46
translatedreadsmappingtoanorthologousregionintwospeciesThreebiologicalreplicatesfrom
eachspeciesareshownAllreadlibrarieswerescaledtoatotalsizeof1millionreadsfor
visualizationpurposesTheyKaxesofalltracksrangefrom0K50readsThesame~120Kkbregionof
chromosome2Lisshownin(A)and(B)(C)OverlapsbetweenDmelanogasterSoxDamIDbinding
intervalsandknownCRMsfromREDFlyandFlyLightatthedecapentaplegic(dpp)locusBinding
profilesrepresentthenormalisedlog2foldchangesbetweenSoxfusionbindingandDamKonly
controlbindingineachGATCfragment(D)DenovoSoxmotifsdiscoveredinDamIDbinding
intervalsMotifsdiscoveredinDichaeteintervalsineachspeciesshowapreferenceforTinthe
fourthpositionofthecoreCAAAGmotifwhilethosediscoveredinSoxNintervalsineachspecies
showapreferenceforAAbbreviationsmelDrosophilamelanogastersimDrosophilasimulans
yakDrosophilayakubapseDrosophilapseudoobscura
Figure3ThreeVwaycomparisonofDichaetebinding(A)Piechartshowingthepercentageofall
Dichaetebindingintervalsidentifiedthatarepresentinonespecies(47)twospecies(23)orall
threespecies(30)(B)HeatmapshowingallbiologicalreplicatesofDichaeteKDamsamplesinD
melanogasterDsimulansandDyakubaclusteredbybindingaffinityscore(logofnormalisedread
counts)withinboundintervalsThecolorkeyandhistogramshowsthedistributionofcorrelation
coefficientsforaffinityscoresbetweeneachpairofsamplesDarkergreencorrespondstoahigher
correlationbetweensampleswhilelightergreencorrespondstoalowercorrelationBiological
replicatesfromeachspeciesclusterstronglytogetherthoughgoodcorrelationsareseeneven
betweenspecies(C)Plotofprincipalcomponentanalysis(PCA)ofDichaeteKDambindingaffinity
scoresinboundintervalsinDmelanogasterDsimulansandDyakubaThefirstprincipal
component(xKaxis)separatesDyakubafromtheothertwospecieswhilethesecondprincipal
component(yKaxis)separatesDmelanogasterfromDsimulansandDyakuba
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
47
Figure4PairwiseanalysisofDichaetebindingsiteturnover(A)Quantitativedifferencesin
DichaetebindingbetweenDmelanogasterandDsimulansMAplotshowsbindingintervalsthat
arepreferentiallyboundinDmelanogaster(pinkfoldchangegt0)orDsimulans(pinkfoldchange
lt0)atFDR1HeatmapshowsdifferentiallyboundDichaeteKDamintervalsbetweenDmelanogaster
andDsimulansclusteredbybindingaffinityscoresThecolorkeyandhistogramshowsthe
distributionofbindingaffinityscores(logofnormalisedreadcounts)inallboundintervalsineach
sampleDarkergreencorrespondstohigheraffinityscoresorstrongerbindingwhilelightergreen
correspondstoloweraffinityscoresorweakerbindingRoughlyequalnumbersofintervalsare
preferentiallyboundineachspecies(B)QuantitativedifferencesinDichaetebindingbetweenD
melanogasterandDyakubaMAplotshowsbindingintervalsthatarepreferentiallyboundinD
melanogaster(pinkfoldchangegt0)orDyakuba(pinkfoldchangelt0)atFDR1Heatmapshows
differentiallyboundDichaeteKDamintervalsbetweenDmelanogasterandDyakubaclusteredby
bindingaffinityscoresThecolorkeyandhistogramareasin(A)Againroughequalnumbersof
intervalsarepreferentiallyboundineachspeciesalthoughmoredifferentiallyboundintervalsare
identifiedoverall(C)ExampleofDichaetebindingsiteturnoverbetweenDmelanogaster(blue)and
Dyakuba(orange)atthereducedocelli(rdo)locusBindingprofilesrepresentthenormalisedlog2
foldchangesbetweenDichaeteKDambindingandDamKonlycontrolbindingineachGATCfragment
BarsrepresentboundintervalsidentifiedatFDR5Boundintervalsthatarepositionallyconserved
arenotshownStrongbindingisobservedinthethirdfourthandeleventhintronsinDyakuba
thesebindingeventsarelostinDmelanogasterbutseveralbindingintervalsaregainedinthefirst
andfourthintrons
Figure5DensityandconservationofSoxmotifsarehigherinDichaeteintervalswithbinding
conservation(A)OnaverageDichaetebindingintervalsthatareconservedbetweenallfourspecies
(ldquo4Kwayconservedrdquo)havemoreSoxmotifsthanintervalsthatareuniquetoDmelanogaster(ldquoD
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
48
meluniquerdquo)(p=303eK193)(B)Dichaetebindingintervalsthatareconservedbetweenallfour
speciesdonotshowanincreasedrateoftotalnucleotideconservationonaveragethanintervals
thatareuniquetoDmelanogaster(C)SoxmotifsinDichaeteintervalsthatareboundinallfour
species(ldquo4KwaySoxrdquo)haveagreaterpercentageofperfectlyconservednucleotidesacrossallspecies
thaneitherSoxmotifsinintervalsthatareuniquetoDmelanogaster(ldquoDmelSoxrdquop=956eK36)
randomlyshuffledcontrolmotifsinintervalsthatareboundinallfourspecies(ldquo4Kwaycontrolrdquop=
162eK43)orrandomlyshuffledcontrolmotifsinintervalsthatareuniquetoDmelanogaster(ldquoD
melcontrolrdquop=167eK9)(D)OnaverageDichaeteintervalsthatareboundinallfourspecieshave
moreSoxmotifsthatarebothpositionallyconservedandshow100nucleotideconservationacross
allspeciesthanintervalsthatareonlyboundinDmelanogaster(p=604eK28)Themultiple
alignmentillustratesapositionallyconservedSoxmotifwith100nucleotideconservation
(highlightedinpurple)
Figure6PreferentialconservationofcommonbindingbyDichaeteandSoxN(A)Venndiagram
showingallDichaeteandSoxNbindingintervalsidentifiedinDmelanogasterandDsimulansThe
singlelargestcategory(7415intervals)consistsofcommonlyboundconservedbindingintervals
(B)BarplotsshowingthatcommonlyboundintervalsaremorelikelytobeconservedbetweenD
melanogasterandDsimulansthanintervalsuniquelyboundbyeitherDichaeteorSoxNThe
differenceinconservationratesishighlysignificantforbothDmelanogasterintervals(topplt22eK
16[approaches0])andDsimulansintervals(bottomplt22eK16[approaches0])(C)MAplot
showingbindingintervalsthataredifferentiallyboundbetweenDichaete(pinkfoldchangegt0)and
SoxN(pinkfoldchangelt0)inbothDmelanogasterandDsimulansFewerquantitativedifferences
inbindingaredetectedbetweenthetwoTFsthanbetweenthebindingprofilesofoneTFintwo
species(D)EnrichedmotifsidentifiedinintervalsthatareuniquelyboundbySoxNorDichaetein
bothDmelanogasterandDsimulansAmotifmatchingUltraspiracle(Usp)aTFinvolvedinaxon
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
49
pathfindingwasfoundinuniqueSoxNintervalswhileamotifmatchingBrachyenteron(Byn)aTF
criticalforhindgutdevelopmentwasfoundinuniqueDichaeteintervals
Tables
Table1BindingintervalsfoundforeachDamIDdatasetandgenesannotatedtointervals
A Bindingintervals(plt001)
Bindingintervals(plt005)
DmelDKDam 17530 20848 DmelSoxNK
Dam 17833 22952 DsimDKDam NA 17833 DsimSoxNK
Dam NA 17209 DyakDKDam 21988 26563 DpseSoxNK
Dam NA 2951
B Translatedbindingintervals
OverlapswithD)melbindingintervals
ofD)melintervalsoverlapping
ofnonVD)melintervalsoverlapping
DsimDKDam 16119 11647 559 723DsimSoxNKDam 15142 11891 518 785DyakDKDam 20964 14573 699 695DpseDKDam 2020 1301 624 644
C Genesannotatedtobindingintervals
GenescommontoDichaeteandSoxN
Directtargetsannotatedtobindingintervals
DmelDKDam 9400 844511731373(854)
DmelSoxNKDam 9528 8445 434544(800)
DsimDKDam 9383 7524
11111373(809)
DsimSoxNKDam 8948 7524 412544(757)
DyakDKDam 12192 NA
12491373(910)
DpseDKDam 1888 NA 4071373(296)Table2OverlapsbetweenSoxbindingintervalsandknownCRMs
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
50
Dataset CRMsindataset
CRMsoverlappingDichaetebinding
CRMsoverlappingSoxNbinding
CRMsoverlappingDichaeteandSoxNbinding
REDFly 1864 1152(618) 1130(606) 1108(594)FlyLight 7113 2999(422) 2784(391) 2681(378)FlyLightCNS 4727 2077(440) 1935(410) 1857(393)STARRKseqS2 2325 1092(470) 951(409) 912(392)STARRKseqOSC 3341 1144(342) 1061(318) 973(291)
Table3ConservationofSoxbindingintervalsatfunctionalsites
Factor Condition Percentofbindingintervalsconserved
PercentofbindingintervalsuniquetoD)mel
Dichaete REDFly 644 96
NotREDFly 448 276
SoxN REDFly 790 210
NotREDFly 465 535
Dichaete FlyLight 547 167
NotFlyLight 442 286
SoxN FlyLight 653 347
NotFlyLight 451 549
Dichaete Coreinterval 704 77
Notcoreinterval 385 324
SoxN Coreinterval 757 243
Notcoreinterval 447 553
Dichaete Directtarget 537 204
Notdirecttarget 431 290
SoxN Directtarget 533 476
Notdirecttarget 473 527
Table4ConservationofcommonanduniquebindingbyDichaeteandSoxN
Species BindingpatternPercentofbindingintervalsconserved
Percentofbindingintervalsnotconserved
Dmelanogaster Common 765 235
Dichaeteunique 401 599
SoxNunique 337 663
Dsimulans Common 941 59
Dichaeteunique 628 372
SoxNunique 701 299
Table5OverlapbetweentargetsofmouseSoxproteinsandcommonorcoreDichaeteandSoxNtargets
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
51
Mouseprotein
OrthologuesofmousetargetsinD)melanogaster
OverlapwithcommonDichaeteSoxNtargets
OverlapwithcoreSoxNtargets
OverlapwithcoreDichaetetargets
Sox2 1301 589 443 522Sox3 4213 1730 1134 1590Sox11 1485 595 1092 610
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
Figure 1
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
dpp
Figure 2
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
Figure 3
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
Figure 4
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
Figure 5
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
ʹ
ʹ
Figure 6
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
Additional files provided with this submission
Additional file 1 Additional_file_1xls 3529Khttpgenomebiologycomimedia1028053673153906supp1xlsAdditional file 2 Additional_file_2xls 148Khttpgenomebiologycomimedia2126632491153906supp2xlsAdditional file 3 Additional_file_3xls 29Khttpgenomebiologycomimedia4207094221539065supp3xlsAdditional file 4 Additional_file_4xls 533Khttpgenomebiologycomimedia1416949246153906supp4xlsAdditional file 5 Additional_file_5xls 31Khttpgenomebiologycomimedia4839829711539065supp5xlsAdditional file 6 Additional_file_6xls 39Khttpgenomebiologycomimedia1430909301153906supp6xlsAdditional file 7 Additional_file_7xls 11Khttpgenomebiologycomimedia1685101080153906supp7xlsAdditional file 8 Additional_file_8xls 37Khttpgenomebiologycomimedia7888521871539065supp8xlsAdditional file 9 Additional_file_9xls 8Khttpgenomebiologycomimedia7866582071539065supp9xlsAdditional file 10 supp_fig1pdf 442Khttpgenomebiologycomimedia1798206613153906supp10pdfAdditional file 11 supp_fig2pdf 159Khttpgenomebiologycomimedia5093164001539065supp11pdfAdditional file 12 supp_fig3pdf 134Khttpgenomebiologycomimedia5520835271539065supp12pdfAdditional file 13 supp_fig4pdf 135Khttpgenomebiologycomimedia2364524071539065supp13pdf
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
A
B
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
D melanogasterD simulans
D yakubaD pseudoobscura
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
A
B
C
D
E
F
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-
A
B
C
D
CC-BY-ND 40 International licensewas not certified by peer review) is the authorfunder It is made available under aThe copyright holder for this preprint (whichthis version posted December 17 2014 httpsdoiorg101101012872doi bioRxiv preprint
- Start of article
- Figure 1
- Figure 2
- Figure 3
- Figure 4
- Figure 5
- Figure 6
- Additional files
-