Natural Selection in the Great Apes

Alexander Cagan,†,1 Christoph Theunert,†,1,2 Hafid Laayouni,†,3,4 Gabriel Santpere,†,3,5 Marc Pybus,3 Ferran Casals,6 Kay Prüfer,1 Arcadi Navarro,3,7 Tomas Marques-Bonet,3,7 Jaume Bertranpetit,‡,3,8 and Aida M. Andrés*,‡,1

1 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
2 Department of Integrative Biology, University of California, Berkeley, Berkeley, CA
3 Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
4 Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Catalonia, Spain
5 Department of Neuroscience, Yale University School of Medicine, New Haven, CT
6 Genomics Core Facility, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
7 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
8 Department of Archaeology and Anthropology, Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, Cambridge, United Kingdom

† These authors contributed equally to this work.
‡ These authors equally co-supervised this work.
* Corresponding author: E-mail: aida_andres@eva.mpg.de.
Associate editor: Ryan Hernandez

© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Mol. Biol. Evol. 33(12):3268–3283 doi:10.1093/molbev/msw215

Abstract

Natural selection is crucial for the adaptation of populations to their environments. Here, we present the first global study of natural selection in the Hominidae (humans and great apes) based on genome-wide information from population samples representing all extant species (including most subspecies). Combining several neutrality tests we create a multi-species map of signatures of natural selection covering all major types of natural selection. We find that the estimated efficiency of both purifying and positive selection varies between species and is significantly correlated with their long-term effective population size. Thus, even the modest differences in population size among the closely related Hominidae lineages have resulted in differences in their ability to remove deleterious alleles and to adapt to changing environments. Most signatures of balancing and positive selection are species-specific, with signatures of balancing selection more often being shared among species. We also identify loci with evidence of positive selection across several lineages. Notably, we detect signatures of positive selection in several genes related to brain function, anatomy, diet and immune processes. Our results contribute to a better understanding of human evolution by putting the evidence of natural selection in humans within its larger evolutionary context. The global map of natural selection in our closest living relatives is available as an interactive browser at http://tinyurl.com/nf8qmzh.

Key words: evolution, adaptation, comparative genomics, primates.

Introduction

Understanding the adaptive genetic changes that led to the emergence of modern humans continues to be a major focus of modern genomics (Pritchard et al. 2010; Enard et al. 2014). However, despite much work in this field, many central questions remain unanswered. For example, it is still unclear what percentage of the human genome has been shaped by natural selection, which genetic variants are responsible for the phenotypes that make humans unique, and to what extent demographic factors have influenced the rate of adaptive evolution through human history. These questions can only be answered through a deeper understanding of the evolution both of the human genome and also of other closely related species. While laboratory studies on adaptation in organisms such as Drosophila have furthered our understanding of adaptive evolution (Lee et al. 2014), the usefulness of these model organisms for understanding adaptation in humans is limited by the wide disparities that exist between them and humans, in both physiology and demography. Investigation of the molecular basis of adaptation is also hindered by differences in the structure and content of the genomes of more distantly related organisms. Studying our closest living relatives, the great apes, is therefore crucial for furthering our understanding of human evolution.

The Hominidae (humans and great apes) share several traits that make them particularly interesting. Relative to their ancestors they have evolved larger brains, more complex social systems and, arguably, the ability to create and maintain cultural traditions (McGrew 2004). Furthermore, the Hominidae species differ from one another in important ways (including their morphology, physiology, behavior and



life history traits) that may result from their independent adaptation to particular environments. Evolutionary genomic information can help us to understand the origin and molecular bases of both shared and species-specific traits in the Hominidae.

The Hominidae also provide an excellent system for comparative studies. This is because, although the species are very closely related (with all lineages diverging over the last 12 My), they differ substantially in relevant features such as the effective size of their populations (Ne) (Prado-Martinez et al. 2013). This makes them well-suited for addressing longstanding theoretical questions in evolutionary biology. A central principle of population genetics is that the effective size of a population influences the efficacy of selection (Charlesworth 2009). Populations with a large Ne are expected to be more efficient at both fixing beneficial alleles and removing deleterious ones when compared with populations with small long-term Ne or that have experienced severe bottlenecks. Empirical attempts to quantify this effect have been limited, with exceptions that include work in yeast (Elyashiv et al. 2010), Drosophila (Jensen and Bachtrog 2011) and eukaryotes (Grossman et al. 2013), as well as comparisons of very divergent lineages (Corbett-Detig et al. 2015). It remains unclear to what extent differences in Ne between closely related mammalian species impact the process of natural selection (Ellegren and Galtier 2016). Full genome sequences of humans and great apes provide a unique opportunity to investigate this question over a relatively short evolutionary timescale.

The signatures of natural selection have been extensively studied in humans (Bustamante et al. 2005; Sabeti et al. 2006; Nielsen et al. 2009; Andrés et al. 2010) and some of the apes (Mikkelsen et al. 2005; Locke et al. 2011; Prüfer et al. 2012; Scally et al. 2012; Bataillon et al. 2015; McManus et al. 2015). However, no study has comprehensively investigated the evidence for natural selection across the Hominidae lineages. We analyzed whole-genome sequence data from multiple individuals from lineages covering all major Hominidae species and subspecies (except Gorilla beringei beringei) (Prado-Martinez et al. 2013) and present the first investigation of the impact of natural selection using this dataset. We focus on attributes of the data that allow us to detect the different types of selection across evolutionary timescales. We then integrate these results to investigate the influence of Ne on the efficacy of natural selection, the targeted functional elements, the genes and biological processes targeted by each type of selection, and the conservation of selective pressures across Hominidae lineages.

Results

Sample Processing

In order to assess the influence of natural selection, we use a dataset of 54 non-human great ape and nine human genomes sequenced to an average of 25-fold coverage (supplementary table S1, Supplementary Material online). Because of differences in demography and selective pressures on autosomes and sex chromosomes, we focus exclusively on the autosomes.

We take particular care to minimize the influence of errors and biases in genomic data and ensure that our data is of the highest possible quality, something particularly important when comparing species. All reads were mapped to the same reference genome (human hg18). We built on the extensive data filtering strategy of Prado-Martinez et al. (2013) (see "Dataset" in "Methods" section). This conservative filtering strategy resulted in the exclusion of 726 Mb (23%) of the autosomal genome. This includes tandem repeats (38 Mb), segmental duplications (154 Mb) and structural variants annotated in at least one species (334 Mb) (see supplementary fig. S1, Supplementary Material online), all identified by unusual read-depth, so alternative methods (Gokcumen et al. 2013; Sudmant et al. 2015) may identify nonidentical regions. While certain genomic regions and gene families may be enriched in structural variation and be disproportionately affected by this filtering step, their removal is essential to minimize artifacts. We also excluded genomic gaps (226 Mb) and base pairs that were not covered by a minimum of five reads in all individuals per species. The resulting dataset includes, on average, 2099 Mb of analyzable genome sequence per species (see supplementary fig. S1, Supplementary Material online). Although every filtering strategy has limitations and putative biases, we aim for a conservative approach that minimizes the presence of artifacts. The result is a high-quality comparative genomic dataset that allows us to investigate the signatures of natural selection and compare them across species (see supplementary materials, Sample Processing, Supplementary Material online).
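As a minimal sketch of the coverage criterion described above (at least five reads in every individual of a species), the filter can be expressed as a per-site mask. The function name `callable_mask` and the input layout (one list of per-site read depths per individual) are illustrative assumptions, not the pipeline used in the study.

```python
from typing import List, Sequence


def callable_mask(depths: Sequence[Sequence[int]], min_depth: int = 5) -> List[bool]:
    """Return True for sites covered by >= min_depth reads in every individual.

    depths[i][j] is the read depth of individual i at site j
    (a hypothetical layout chosen for illustration).
    """
    if not depths:
        return []
    n_sites = len(depths[0])
    if any(len(d) != n_sites for d in depths):
        raise ValueError("all individuals must cover the same sites")
    # zip(*depths) walks the sites one by one across all individuals.
    return [all(d >= min_depth for d in site) for site in zip(*depths)]
```

A site is kept only if every individual passes, so a single low-coverage genome removes the site for the whole species; this matches the conservative intent of the filter at the cost of callable sequence.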

Neutrality Tests

We selected a set of neutrality tests that explore different aspects of the patterns of polymorphism and, in combination, allow us to detect the signatures of different types of natural selection across different time depths, from the emergence of the Hominidae ~12 Ma to recent and ongoing species-specific selective sweeps (fig. 1). Many neutrality tests exist; among them, we chose those that utilize the type of information that we have (i.e., that do not require phased genomes), that have been shown to have high power to detect selection (Zhai et al. 2009), that explore relatively independent signatures, and that provide information on different timescales. To keep the analyses manageable, we focus on four tests (see fig. 2):

• To detect signals of purifying and positive selection on the coding sequences of proteins, we applied the McDonald–Kreitman test (MK test; McDonald and Kreitman 1991). The MK test is run on a protein-coding gene-by-gene basis (supplementary materials MK 1, Supplementary Material online). By using information on sequence divergence, it has power to detect signatures of positive and purifying selection along the entire branch lengths of the Hominidae.

• To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary time, we applied a statistic based on the Hudson–Kreitman–Aguadé test (HKA; Hudson et al. 1987), which has been found to be a highly powerful method to detect positive selection (Zhai et al. 2009). The HKA statistic was calculated across the genome in 30-kb windows with a 15-kb overlap between windows (Methods and supplementary materials HKA 1, Supplementary Material online). As it uses both divergence and diversity data, the HKA statistic has power to detect positive selection over broad timescales, as well as long-term balancing selection, including persistent balancing selection that predates the emergence of the Hominidae, such as on the MHC region (Hedrick 1999).

• To detect lineage-specific positive selection that occurred after the divergence of an ancestral population into two species, we applied the Extended Lineage Sorting test (ELS; SOM 13 in Green et al. 2010; Supplementary Information 7 in Prüfer et al. 2012; Supplementary Information 19a in Prüfer et al. 2014). The test is run across the genome and identifies regions without a predefined size (Methods and supplementary materials ELS 1, Supplementary Material online).

• To detect recent selective sweeps, we applied Fay and Wu's H statistic (FWH; Fay and Wu 2000). The FWH statistic was calculated across the genome in 30-kb windows with 15-kb overlap between windows (Methods and supplementary materials FWH 1, Supplementary Material online).

Together, these tests detect the signatures of purifying, balancing and positive selection, old and recent (we refer to events in the order of millions of years for old, and of hundreds of thousands of years up to present day for recent selection), in each lineage (fig. 1). Integrating all results provides an unprecedentedly broad picture of the targets of natural selection in the genomes of humans and great apes.
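The window scheme used for the HKA and FWH scans (30-kb windows with 15-kb overlap) and the standard definition of Fay and Wu's H can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names, the zero-based half-open coordinates, and the handling of the final (truncated) windows are all hypothetical choices. H is computed as θπ − θH from the derived-allele count at each segregating site, where n is the number of sampled chromosomes and derived alleles are assumed to have been polarized with an outgroup.

```python
from typing import Iterator, Sequence, Tuple


def sliding_windows(
    chrom_length: int, size: int = 30_000, step: int = 15_000
) -> Iterator[Tuple[int, int]]:
    """Yield zero-based, half-open (start, end) windows along a chromosome.

    A 15-kb step with 30-kb windows gives a 15-kb overlap between neighbours.
    Windows at the chromosome end are truncated (an illustrative choice).
    """
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step


def fay_wu_h(derived_counts: Sequence[int], n: int) -> float:
    """Fay and Wu's H = theta_pi - theta_H for one window.

    derived_counts holds, for each segregating site, the number of sampled
    chromosomes (out of n) that carry the derived allele.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    if any(i < 1 or i > n - 1 for i in derived_counts):
        raise ValueError("derived counts must lie in 1..n-1")
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h
```

Because θH weights sites by the square of the derived-allele count, an excess of high-frequency derived alleles (as expected near a recent sweep) drives H negative, which is why the lower tail of the FWH distribution is the one of interest.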

Ne and the Strength of Natural Selection in the Great Apes

As discussed earlier, empirical data is limited regarding the effect of long-term Ne on the efficacy of natural selection over short evolutionary timescales in vertebrates. The relationship between population size, selection and levels of neutral diversity in populations continues to be a matter of considerable debate (Ellegren and Galtier 2016). A recent study (Corbett-Detig et al. 2015) proposed that the effects of linked selection can explain Lewontin's paradox (1974), namely that neutral diversity does not scale as expected with population size. However, a recent reanalysis of this data suggests that, while linked selection influences diversity along genomes, fluctuations in Ne are the major driver of levels of diversity between species (Coop 2016). The debate has so far been hampered by the limited availability of population-level genome sequence data across species (Ellegren and Galtier 2016). Our dataset therefore provides an ideal opportunity to investigate the relationship between Ne and selection in closely related species.

FIG. 1. Timescale of neutrality tests. Hominidae phylogeny with the approximate time ranges where each neutrality test has power to detect signatures of natural selection. (a) The lineages, with the number of genomes used in this study, are shown on the right. The X-axis shows the timescale in units of millions of years. Split times of lineages from Prado-Martinez et al. (2013). For FWH, MK and HKA, the approximate time range where the tests are inferred to have most power to detect selection is represented by color intensity. For ELS, we label in green the branches where the test has power to detect selection. n, number of individuals in each lineage; Ne, estimates of effective population size in units of thousands of individuals according to Watterson's estimator, taken from Prado-Martinez et al. (2013).

We find that the ratio of nonsynonymous to synonymous substitutions negatively correlates with long-term Ne in this dataset (Prado-Martinez et al. 2013), as expected with more efficient purifying selection in populations with higher Ne. Here, we aim to (1) infer the distribution of fitness effects (DFE) in each species, (2) quantify the magnitude of the influence of Ne on the DFE, (3) compare the influence of long-term versus short-term Ne and (4) investigate its influence not only on purifying but also on positive selection.

Ne and the Strength of Purifying Selection

We first inferred the DFE of deleterious mutations for 3859 one-to-one orthologous protein-coding genes with DFE-alpha (Eyre-Walker and Keightley 2009), which is based on the MK test (see Methods and MK supplementary materials, Supplementary Material online). The method fits a demographic model to the site frequency spectrum (SFS) of neutral sites and simultaneously estimates the gamma-distributed DFE of new nonneutral mutations and the fraction of adaptive substitutions (α) (supplementary fig. S16, Supplementary Material online). For all lineages, the shape parameter of the gamma distribution is <1, indicative of highly leptokurtic (L-shaped) DFEs and most nonsynonymous mutations being strongly deleterious. Indeed, in all lineages the proportion of nonsynonymous mutations with a NeS > 10 (S being the mean homozygous effect of a deleterious variant) is >65% (supplementary table S104 and fig. S18, Supplementary Material online), similar to estimates for humans (Eyre-Walker and Keightley 2009) and gorillas (McManus et al. 2015). We observe that the proportion of predicted neutral or nearly neutral mutations correlates negatively with long-term Ne (correlation of −0.64, P value = 0.04, after accounting for phylogenetic nonindependence using the BayesTraits V2 random walk/maximum likelihood method; Pagel and Meade 2013). This correlation reflects stronger purifying selection in great ape species with a larger long-term Ne.
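For orientation, the classic point estimators derived from an MK table can be sketched in a few lines. This is not DFE-alpha, which additionally models demography and the DFE; it is the simpler textbook calculation, in which the neutrality index is (Pn/Ps)/(Dn/Ds) and the fraction of adaptive substitutions is α = 1 − (Ds·Pn)/(Dn·Ps). The function name `mk_summary` and its argument names are illustrative assumptions.

```python
from typing import Tuple


def mk_summary(dn: int, ds: int, pn: int, ps: int) -> Tuple[float, float]:
    """Return (neutrality_index, alpha) from a McDonald-Kreitman 2x2 table.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms (diversity).
    """
    if min(dn, ds, pn, ps) < 0:
        raise ValueError("counts must be non-negative")
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be > 0 for the ratios to be defined")
    # Neutrality index: (Pn / Ps) / (Dn / Ds), written without nested division.
    neutrality_index = (pn * ds) / (ps * dn)
    # Simple estimator of the fraction of adaptive substitutions.
    alpha = 1.0 - neutrality_index
    return neutrality_index, alpha
```

With this estimator α becomes negative whenever nonsynonymous polymorphism is in excess relative to divergence, for example when slightly deleterious variants are segregating, which is one reason model-based approaches such as DFE-alpha are preferred for the analyses in the text.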

Efficient purifying selection also reduces the accumulation of linked genetic variation due to background selection. Within the bins in the middle range of the HKA distribution (see "Methods" section), which are particularly sensitive to purifying selection, lower HKA scores associate with stronger background selection (lower B scores; supplementary fig. S6, Supplementary Material online) and a higher proportion of protein-coding exons (fig. 3). This is expected if background selection reduces diversity around protein-coding and other functional regions. Across lineages, and in agreement with the DFE-alpha results, the effects of purifying selection increase with larger Ne, both when considering the proportion of protein-coding exons and the B scores (supplementary materials HKA 3 and supplementary table S5, Supplementary Material online) (McVicker et al. 2009). Incidentally, the effect is much weaker for nonprotein coding exons (supplementary table S5 and supplementary fig. 1E, Supplementary Material online). These results are virtually unchanged if we use only lineages with fewer than ten individuals or only lineages with more than five individuals, suggesting that sample size differences between lineages do not affect our observations (supplementary materials subsampling analysis 14 and supplementary table S106 and supplementary fig. S30, Supplementary Material online).

FIG. 2. Summary of the neutrality tests used. Each box presents the input (the information used), the analysis strategy (how each test was applied on the genome-wide data), the pattern (the signatures of selection explored), the criteria to select selection candidates (the top candidates for each test) and the criteria to select candidates for GO analyses (the candidates used for gene ontology analyses).

The correlations with Ne above are almost always stronger with long-term Ne than with recent Ne (for 21 of the 23 HKA bins; supplementary table S3, Supplementary Material online), indicating that we detect the effects of long-term evolutionary history rather than only differences in power due to overall levels of diversity (although differences in the accuracy of the Ne estimates may affect this comparison). Therefore, despite the recent and ongoing population declines experienced by many of these species, their long-term Ne appears to be a better predictor than recent Ne of the past efficacy of purifying selection.

Ne and Adaptive Evolution

With the MK-based DFE-alpha, it is possible to estimate the proportion (α) of nonsynonymous substitutions driven by positive selection, as well as the ratio of adaptive to neutral divergence (ω(a)) (supplementary table S103, Supplementary Material online). With the exceptions of Pongo pygmaeus (with poor bootstrap support) and Pan t. schweinfurthii (where two inbred individuals (Prado-Martinez et al. 2013) dramatically increase the estimates) (supplementary table S103, Supplementary Material online), both the estimated proportion (α) and the estimated rate of adaptive evolution are low (0–12% and 0–2%, respectively), in agreement with previous estimates (Eyre-Walker 2006; McManus et al. 2015). We observe that, in nonhuman great apes, both the proportion and the rate of adaptive substitutions are positively correlated with long-term Ne after phylogenetic nonindependence is accounted for using a generalized least squares approach (supplementary fig. S17 and supplementary materials MK test 23, Supplementary Material online). The correlation is high and significant when all nonhuman species except the problematic Pongo pygmaeus and Pan troglodytes schweinfurthii are considered (Pearson's R = 0.9, P value = 0.004) (see fig. 4).

The effect of positive selection on linked variation also increases with long-term Ne. In the bottom bins of the HKA empirical distribution, which are enriched in targets of positive selection, the percentage of protein-coding windows correlates positively with Ne (0.05–1% bins, P values = 0.0001–0.02). Only the lowest 0.01% HKA bin is not significant, potentially due to the spatial clustering of windows as a result of selective sweeps (supplementary materials HKA 4 and supplementary table S12, Supplementary Material online).

Thus, our results indicate that the long-term Ne of populations significantly affects the efficacy of both purifying and positive selection. These correlations are remarkable because these species are very closely related and their long-term Ne varies by a maximum difference of 3-fold.

FIG. 3. Percentage of windows overlapping protein coding exons. Percentage of windows overlapping protein coding exons for noncumulative bins of the HKA empirical distribution (X-axis). Each lineage is plotted as a shaded line. The Pearson's correlation (R) between the percentage of windows overlapping protein coding exons and Ne, within each HKA bin and across all lineages, is shown on the right Y-axis. Pearson's correlation coefficient was computed both with an estimate of short- and long-term Ne (from Prado-Martinez et al. 2013). Only R values with significant P values (P < 0.05) are shown.

The Candidate Targets of Natural Selection

As most genomic sites evolve neutrally or nearly neutrally (Kimura 1979; Kelley et al. 2006), we expect an enrichment of targets of natural selection in the extreme tails of the genome-wide distributions of neutrality test statistics. Therefore, we can identify candidate targets of natural selection without relying on a simulated neutral expectation, which is vulnerable to parameter misspecification, an important problem given the complex evolutionary history of the Hominidae lineages. Given the little we know about the strongest targets of purifying, positive and balancing selection in nonhuman apes, this catalog is highly relevant. In addition, these loci allow us to investigate the tempo, conservation and biological function of natural selection in the great apes. The nature of each of the tests considered means that their implementation in the genome varies, and the criteria to define candidate targets of selection necessarily vary too (see fig. 2).
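The outlier approach described above can be sketched as a rank-based empirical P value followed by a tail cutoff. The sketch below is illustrative only: the function names, the tie handling (ties share the larger rank), and the choice of which tail to use are assumptions, and the actual candidate criteria differ per test (see fig. 2).

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def empirical_p_values(scores: Sequence[float], lower_tail: bool = True) -> List[float]:
    """Rank-based empirical P value of each window within the genome-wide distribution.

    lower_tail=True:  fraction of windows with a score <= this window's score.
    lower_tail=False: fraction of windows with a score >= this window's score.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if lower_tail:
        return [bisect_right(ordered, s) / n for s in scores]
    return [(n - bisect_left(ordered, s)) / n for s in scores]


def tail_candidates(
    scores: Sequence[float], fraction: float, lower_tail: bool = True
) -> List[int]:
    """Indices of windows whose empirical P value is <= fraction (e.g. 0.001 or 0.01)."""
    pvals = empirical_p_values(scores, lower_tail)
    return [i for i, p in enumerate(pvals) if p <= fraction]
```

Because the cutoff is defined on the observed distribution, a fixed fraction of windows is always returned regardless of how much selection has actually occurred, so the tail should be read as a set of candidates enriched for selection targets rather than as a list of significant tests.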

Sample Size and the Candidate Targets of Natural Selection

Sample size, which varies among lineages, may influence the power to detect signatures of natural selection. We assess how differences in sample size might influence our results with down-sampling analyses. We randomly down-sample, 100 times, four or eight individuals from the two lineages with the largest sample size (Pan paniscus and Gorilla gorilla gorilla) and run all neutrality tests for chromosome 1 (except the ELS test for Pan paniscus, because this test is not appropriate for this lineage). We then measure the overlap between the candidates from the down-samples (0.1% or 1% tail of the empirical distribution) with the equivalent candidates from the original results (supplementary materials subsampling analysis 1, Supplementary Material online).
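The down-sampling procedure can be sketched generically. In the sketch below, `candidate_fn` is a hypothetical callable that runs one neutrality test on a set of individuals and returns the identifiers of its candidate windows; the overlap is measured as the fraction of full-sample candidates recovered in each down-sample, which is an assumed definition since the exact overlap measure is given only in the supplementary materials.

```python
import random
from typing import Callable, Hashable, Sequence, Set


def mean_candidate_overlap(
    individuals: Sequence[str],
    k: int,
    candidate_fn: Callable[[Sequence[str]], Set[Hashable]],
    replicates: int = 100,
    seed: int = 0,
) -> float:
    """Mean fraction of full-sample candidates recovered with k random individuals."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    full = set(candidate_fn(individuals))
    if not full:
        raise ValueError("the full sample produced no candidates")
    fractions = []
    for _ in range(replicates):
        subset = rng.sample(list(individuals), k)  # sampling without replacement
        recovered = full & set(candidate_fn(subset))
        fractions.append(len(recovered) / len(full))
    return sum(fractions) / len(fractions)
```

A value near 1 would indicate that the candidate set is robust to the reduced sample size, while lower values would indicate that the test depends on sample size.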

The impact of sample size differs between selection tests. HKA appears very robust to sample size variation for signatures of positive selection, showing a mean overlap between the original and down-sampled results of at least 86% (supplementary materials subsampling analysis 11 and supplementary figs. S21–24, Supplementary Material online). This is likely to be because the HKA is not strongly affected by the allele frequency of polymorphisms.

In contrast, FWH and ELS appear more sensitive to sample size (supplementary materials subsampling analysis 12–13 and supplementary figs. S25–28, Supplementary Material online). This may be due to the influence of sample size on the estimates of allele frequency. Therefore, we find that the HKA test is better suited for comparative analyses where sample sizes are low or unequal between populations.

An Available Genome-Wide Map of Natural Selection in Hominidae

The genome-wide map of signatures of natural selection includes information about different types of selection over varying time frames. As such, it provides a broad picture of the influence of natural selection in each genomic region and Hominidae species. All the information is available as an interactive browser at http://tinyurl.com/nf8qmzh, following the criteria and configuration of a recently published human dataset (Pybus et al. 2014, 2015). The UCSC-style format facilitates the integration with the rich UCSC browser tracks, a search mask allows easy access to results for specific genes or genomic regions, and the raw scores (test statistic value and rank score/empirical P value) can be conveniently downloaded using the UCSC Table function. We expect this to be a valuable resource for a wide range of analyses.

FIG. 4. Correlation between rate of adaptive substitutions (α) and effective population size (Ne). The X-axis shows the effective population size (Ne). On the Y-axis, the rate of adaptive substitutions is plotted as α. Plotted lineages: PanPan, PanTroEl, PanTroVer, Gorilla, PonAb (R = 0.90, P = 0.004). Correlations were calculated while controlling for the phylogenetic nonindependence using a generalized least squares approach and a random walk/maximum likelihood method (see Supplementary Materials MK 23, Supplementary Material online).

The Functional Targets of Natural Selection

The relative contributions of variants in regulatory versus protein coding regions of the genome to adaptive evolution remain a matter of controversy (Halligan et al. 2013). Since King and Wilson (1975), the relative importance of coding and regulatory variation to adaptive evolution has been contentious. Protein-coding DNA constitutes ~1.5% of the genome, but 10–15% appears to be functionally constrained (Ponting and Hardison 2011). The role of nonprotein-coding genes and other nongenic elements in genome function and evolution remains debated (Encode Project Consortium 2011; Doolittle 2013), with several lines of work suggesting that nongenic regions (including some gene deserts) can play an important role in phenotype and adaptation (Bejerano et al. 2006; Libioulle et al. 2007; McPherson et al. 2007; Hubisz and Pollard 2014). Although the stringent filtering of the data and the imperfect annotation of nonprotein-coding functional elements hamper the comparison of protein-coding versus nonprotein regions, we investigated the functional annotations of our candidates.

Except for MK, the neutrality tests we used are agnostic about functional annotation. Still, most of our candidate targets of positive selection contain functional annotations: mean values across species are 72% for HKA, 71% for ELS, and 80% for FWH. This is significantly greater than genome-wide expectations based on random sampling of the callable genome (P values < 0.05 in all lineages except Pan paniscus, P value = 0.2, and Pongo pygmaeus, P value = 0.11) (supplementary materials HKA 7 and supplementary figs. S7–15, Supplementary Material online). Among these annotations, the overlap with protein-coding exons (HKA = 62%, ELS = 59%, FWH = 45%) is significantly enriched in all lineages except Pan paniscus (P value = 0.15) and Pan troglodytes verus (P value = 0.06). In contrast, the mean overlap with exons from nonprotein coding genes (HKA = 18, ELS = 2, FWH = 20) is not significantly elevated relative to genome-wide levels (supplementary figs. S7–15, Supplementary Material online; lowest P value = 0.16 in Gorilla gorilla gorilla).
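The comparison against "genome-wide expectations based on random sampling of the callable genome" can be illustrated with a small resampling sketch. This is only a minimal illustration under stated assumptions, not the procedure used in the study: the function name `enrichment_p`, the boolean per-window representation, the number of draws, and the +1 correction are all hypothetical choices made for the example.

```python
import random

def enrichment_p(candidate_flags, background_flags, n_draws=10_000, seed=0):
    """One-sided empirical P value for annotation enrichment.

    candidate_flags:  list of bools, True if a candidate window overlaps
                      the annotation of interest (assumed input format).
    background_flags: list of bools for all callable windows.
    Returns the fraction of random window sets (same size as the candidate
    set, drawn without replacement) whose annotated count is at least as
    large as the observed count, with a +1 correction (an assumption).
    """
    rng = random.Random(seed)
    k = len(candidate_flags)
    observed = sum(candidate_flags)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(background_flags, k)
        if sum(draw) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)
```

Under this sketch, a candidate set in which most windows overlap an annotation that covers a minority of callable windows would receive a small P value, which is the pattern reported above for protein-coding exons.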

Candidate targets of balancing selection are also highly enriched in protein-coding exons (e.g., 64% in the top 0.01% bin), and in the top HKA bins this proportion sharply increases with the HKA score (fig. 3). The increased levels of diversity in these windows cannot be explained by technical artifacts, as these regions are not unusual in terms of coverage or mapping quality (supplementary materials HKA 13–14, Supplementary Material online), or by current models of neutral evolution or purifying selection, and are instead best explained by long-term balancing selection acting on or near these protein-coding exons.

The Biological Pathways Targeted by Natural Selection
According to our results above, protein-coding genes appear to be a key target of natural selection in the Hominidae. We thus investigated the biological functions that these genes are involved in. For each neutrality test and lineage, we identified the genes in candidate regions of positive or balancing selection (see fig. 2 and “Methods” section for details) and performed gene enrichment analyses using WebGestalt (Zhang et al. 2005). Our necessarily strict data filtering may disproportionally affect certain Gene Ontology (GO) categories (e.g., olfactory receptors), but we discuss below the categories that retain the strongest signatures for each type of natural selection.

Pathways Targeted by Balancing Selection
The top genes for balancing selection include a number of well-established targets, such as the major histocompatibility complex (MHC) genes (Hughes and Yeager 1998; Hedrick 1999). Indeed, windows containing MHC genes appear among those with the strongest signals of balancing selection in all lineages (supplementary tables S6 and S13, Supplementary Material online). In addition, in all lineages there is a significant enrichment of immunity-related categories, such as the GO “Antigen processing and presentation” category (closely related to the MHC) (supplementary tables S16–40, Supplementary Material online). This provides evidence that balancing selection has a strong influence on immunological pathways in all lineages.

To test whether there were strong signatures of balancing selection beyond the MHC complex, we re-ran the GO enrichment analysis excluding all genes in the MHC region on chromosome 6 (supplementary tables S57–65, Supplementary Material online). Doing so removes the enrichment for the GO category “Antigen processing and presentation” in all lineages. Interestingly, in three of the four Pan troglodytes lineages (excluding Pan troglodytes schweinfurthii) there is significant enrichment for the GO category “Cornified envelope” (supplementary tables S360, S62, and S63, Supplementary Material online), driven by the three genes SCEL, SPRR2B, and SPRR2G. The cornified envelope is the most exterior layer of the skin and consists of dead cells. Related to this, we note that in Pan troglodytes verus the GO category “keratinocyte differentiation,” involved in the development of the most common cell type in the epidermis, is also significantly enriched (P value = 0.0026). This is interesting because keratins and proteins similarly involved in epithelial barrier formation have been proposed as targets of balancing selection (see “Discussion” section).

Pathways Targeted by Strong Purifying Selection
Since Hominidae are closely related, we would expect that similar regions are evolving under purifying selection. We therefore tested whether the pathways showing the strongest signatures of constraint are consistent among lineages. From the MK test, 53 of the 152 evaluated pathways showed signatures of strong purifying selection in more than one lineage. In particular, the “Integrin signaling” pathway and “Wnt signaling” pathway, which regulate basic cellular and developmental processes, and the “Alzheimer’s disease-presenilin” pathway are significantly constrained across all lineages (supplementary table S102, Supplementary Material online).

Pathways Targeted by Positive Selection
For the HKA, several lineages show evidence of positive selection targets being enriched for GO categories related to immune function. For example, the GO category

Cagan et al. doi:10.1093/molbev/msw215. MBE


“Complement activation” (genes that activate the innate immune system) is significantly enriched in Pan paniscus (P value = 0.042; all P values adjusted for multiple testing), whereas the related pathway “Complement receptor activity” is enriched in Pongo abelii (P value = 0.011) and the GO category “Viral receptor activity” in Gorilla gorilla gorilla (P value = 0.0004) (supplementary tables S41, S46, and S54, Supplementary Material online).

We find that the FWH candidate targets of positive selection show enrichment in several GO categories related to brain development and function exclusively within the African Hominidae lineages. This includes, for example, the GO categories “Dendrite” (Homo sapiens, P value = 0.040; Pan troglodytes troglodytes, P value = 0.010; Gorilla gorilla, P value = 0.0024) and “Neuron spine” (Pan troglodytes troglodytes, P value = 0.001; Gorilla gorilla gorilla, P value = 0.006). Several additional neurological categories are enriched in single lineages (supplementary tables S75–86, Supplementary Material online). For example, Homo sapiens is the only lineage with significant enrichment of the GO category “Glutamate receptor activity” (P value = 0.002); glutamate is the main excitatory neurotransmitter in the brain.

For the MK, the set of genes with an excess of divergence is small (supplementary table S96, Supplementary Material online), but we found a significant enrichment in genes involved in, for instance, “Ion channel activity” in Pan paniscus (P value = 0.034) and in “Glycosaminoglycan biosynthesis” in Gorilla gorilla gorilla (P value = 0.020), among other pathways (supplementary tables S98–101, Supplementary Material online).

Overlap between Targets of Positive and Balancing Selection across Lineages
The Hominidae lineages have shared, along their evolutionary history, similar physiologies and environments. As such, they have likely been subject to common selective pressures, even after their lineages split. To investigate this possibility, we identified genes that show similar signals of natural selection in multiple lineages. Since we use an empirical approach to identify candidate targets of natural selection (as the demographic models for these species are not well established), we use the same 0.1% cut-off to identify outliers from both tails of the HKA empirical distribution and one tail of the FWH distribution. Therefore, we cannot make general claims about the relative frequency of positive and balancing selection in primate species. We can, though, explore the level of sharing across lineages of these candidate targets. To be conservative, we only consider genes to be shared targets of selection if they appear as candidates in at least three lineages.
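The conservative sharing criterion described above (a gene counts as shared only if it is a candidate in at least three lineages) can be sketched as follows. This is a minimal illustration; the function name `shared_candidates` and the dictionary input format are assumptions made for the example, not part of the published pipeline.

```python
from collections import Counter

def shared_candidates(candidates_by_lineage, min_lineages=3):
    """Return {gene: number of lineages} for genes that are candidates
    in at least `min_lineages` lineages.

    candidates_by_lineage: dict mapping a lineage name to an iterable of
    candidate gene names (hypothetical input format). A gene listed more
    than once within one lineage is counted once for that lineage.
    """
    counts = Counter(
        gene
        for genes in candidates_by_lineage.values()
        for gene in set(genes)
    )
    return {gene: n for gene, n in counts.items() if n >= min_lineages}
```

Applied separately to the candidate lists from each test, a function of this kind yields the per-test counts of shared genes discussed in the following section.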

Overlap between Selection Targets
We find no signals of positive selection that are shared across all lineages. In fact, there is modest sharing across lineages, a possible indication of the lineage-specific nature of the adaptive process (although we note that we are highly conservative in our selection of candidate genes, and the power of FWH is reduced with lower sample sizes) (supplementary table S15, Supplementary Material online). We observe that the HKA candidates show lower sharing across lineages than those from the FWH (supplementary tables S14 and S73, Supplementary Material online). For the HKA, only 27 genes (of the 200 candidates per lineage) are shared in at least three lineages, compared with 67 for the FWH, which detects more recent selective events. We note that shared signals among the Pan troglodytes sub-species may not reflect truly independent signatures of selection, as signals may predate their divergence into separate lineages, and there is the possibility of admixture between sub-species. However, of these 67 genes only a minority (8) are shared exclusively among the Pan troglodytes subspecies, potentially reflecting their recently shared ancestry. The majority (59) show evidence of recent positive selection across a range of lineages (supplementary table S73, Supplementary Material online), suggesting putative parallel adaptive events.

Turning to the biological function of these genes, we find limited enrichment among HKA targets (the only significant GO category is “Structural molecule activity,” P value = 0.009; supplementary table S3AAB, Supplementary Material online). However, the 67 shared FWH candidate genes are significantly enriched in multiple functional categories (supplementary tables S87–89, Supplementary Material online), including several neuronal pathways, suggesting that these are a common target of recent positive selection across the Hominidae.

Genes targeted by balancing selection show much greater sharing across lineages, with 156 genes showing signatures of balancing selection across at least three lineages (supplementary table S13, Supplementary Material online; fig. 5 for an example across Pan troglodytes lineages). Nine genes, primarily from the MHC region, are shared across all lineages (supplementary table S13, Supplementary Material online). This likely reflects the long-term and persistent nature of balancing selection on immunity-related genes, in particular in the MHC.

Discussion
We present a global investigation of the signatures of purifying, positive, and balancing selection at different time scales and across the Hominidae lineages. We observe strong evidence for the signatures of each of these types of natural selection on patterns of genomic variation. By carefully avoiding technical differences, we can compare, for the first time, the patterns of different types of natural selection across the great ape species. All genomic analyses of signatures of selection are complicated to some extent by demographic processes, which can result in patterns of genomic variation that obscure signatures of selection or produce false-positives. We tried to mitigate this by utilizing tests that identified putatively selected regions as outliers based on the entire distribution of patterns of genomic variation, under the assumption that the majority of the genome is evolving neutrally.

We find that even with the relatively similar Ne of the great apes (with a maximum difference of 3-fold), Ne has a significant effect on the efficacy of natural selection. This appears to be true for both purifying and positive selection. The



evidence for adaptive evolution is stronger in protein-coding than in nonprotein coding genes, and it is overrepresented not only on loci with relevance to immune function but also on loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae
We estimate that at least 65% of mutations are deleterious (NeS > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8% compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites, and its effects on linked variation, correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes sub-species seem to have resulted in differences in the efficacy of natural selection. This may be extremely important because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes
Our data indicate that adaptive evolution (positive and balancing selection) often targets protein-coding regions (be that the protein-coding sequence or the surrounding regulatory elements). This suggests that, in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, these most likely reflect shared events. When the signature is shared across distant lineages, this may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example, if it affects different neighboring functional elements in each

FIG. 5. Venn diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.



species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here, we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials section 7, Supplementary Material online.

Immunity
Balancing Selection on the MHC
Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier
The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process “cornified envelope development” in Pan t. ellioti, Pan troglodytes troglodytes, and Pan troglodytes verus (the genes involved are SCEL, SPRR2B, and SPRR2G) and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E; LCE3D and LCE3E). In LCE3D and LCE3E, the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and has previously been shown to have signatures of balancing selection in humans (Andres et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006), which is essential for the formation of the cornified envelope, yet has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier, through which low levels of pathogens can penetrate, promoting innate immunity through a process of “natural vaccination” (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes “natural vaccination” may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes
We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA, and FWH candidate targets of positive selection all are significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86, and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of



HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions
All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Südhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus; an HKA signal of positive selection in Homo sapiens; and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. This includes AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank), and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1, the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online), has an unknown function but is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Håvik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactive disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion
We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes, including size, locomotion, morphology, or diet, help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset
The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major



lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we also required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed “callable sites,” were used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot discard the presence of some artifact in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.
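The per-site coverage criteria above can be sketched as a simple predicate. This is a hedged illustration only: the function name `callable_site` is hypothetical, DP is assumed here to be the depth summed across the individuals of a species, and the default divisor (8.0) and multiplier (3) reflect one reading of the depth thresholds stated in the text. The region-based exclusions (repeats, segmental duplications, gaps, structural variants) are not modeled.

```python
def callable_site(per_individual_depth, mean_site_depth,
                  low_divisor=8.0, high_factor=3.0,
                  min_individual_depth=5):
    """Return True if a site passes the coverage filters.

    per_individual_depth: read depth at this site for every individual
                          of one species (assumed input format).
    mean_site_depth:      mean total read depth for that species.
    The threshold defaults are assumptions taken from our reading of
    the Methods, not confirmed parameter values.
    """
    total = sum(per_individual_depth)
    # Remove sites with unusually low or unusually high total depth.
    if total < mean_site_depth / low_divisor:
        return False
    if total > mean_site_depth * high_factor:
        return False
    # Require minimum coverage in every individual of the species.
    return all(d >= min_individual_depth for d in per_individual_depth)
```

Keeping the thresholds as parameters makes it easy to check how sensitive the number of callable sites is to each cut-off.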

Tests
Hudson–Kreitman–Aguadé Test (HKA)
To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window's empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
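The windowed statistic described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the analyses: positions are given as sorted lists of coordinates on one chromosome, windows without substitutions receive an infinite ratio, and ties are ranked in input order. All function names are hypothetical.

```python
from bisect import bisect_left


def count_in(sorted_pos, start, end):
    """Number of positions p with start <= p < end in a sorted list."""
    return bisect_left(sorted_pos, end) - bisect_left(sorted_pos, start)


def hka_windows(chrom_len, snvs, subs, callable_sites,
                size=30_000, step=15_000, min_callable=300, min_informative=6):
    """Ratio of polymorphic to divergent sites in overlapping windows."""
    results = []
    for start in range(0, chrom_len, step):
        end = min(start + size, chrom_len)
        n_snv = count_in(snvs, start, end)
        n_sub = count_in(subs, start, end)
        if count_in(callable_sites, start, end) < min_callable:
            continue  # too little analyzable sequence
        if n_snv + n_sub < min_informative:
            continue  # too few SNVs and substitutions
        # Assumption of this sketch: no substitutions gives an infinite ratio.
        ratio = n_snv / n_sub if n_sub else float("inf")
        results.append((start, end, ratio))
    return results


def empirical_p(results):
    """Rank windows from the lowest ratio upward; rank / N is the empirical P value."""
    ordered = sorted(results, key=lambda w: w[2])
    n = len(ordered)
    return [(s, e, r, (i + 1) / n) for i, (s, e, r) in enumerate(ordered)]
```

Ranking from the lowest ratio upward places candidate targets of positive selection (low diversity relative to divergence) in the lower tail and candidate targets of balancing selection in the upper tail.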

Fay and Wu's H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window's empirical P value.
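For orientation, the statistic of Fay and Wu (2000) for a single window can be written in a few lines. The sketch below uses the standard definition H = π − θH computed from derived allele counts; the function name and input layout are illustrative assumptions and do not reproduce the implementation referred to in the text.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H for one window.

    derived_counts  derived allele count (1..n-1) at each segregating site
    n               number of sampled chromosomes
    """
    denom = n * (n - 1)
    # pi: average number of pairwise differences.
    pi = sum(2 * k * (n - k) for k in derived_counts) / denom
    # theta_H: weights each site by the square of its derived allele count.
    theta_h = sum(2 * k * k for k in derived_counts) / denom
    return pi - theta_h
```

An excess of high-frequency derived alleles, as expected after a sweep, inflates θH and drives H strongly negative, whereas a window dominated by rare derived alleles yields a positive value.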

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein-coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet this criterion. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
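The logic of the MK test for a single transcript can be illustrated with a minimal sketch. It assumes that the 2 × 2 table of nonsynonymous and synonymous substitutions (Dn, Ds) and polymorphisms (Pn, Ps) is evaluated with a two-sided Fisher's exact test, and it additionally reports the neutrality index, which is not mentioned in the text above and is included here only to indicate the direction of the departure. Function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Return (P value, neutrality index) for one transcript."""
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    # NI < 1: excess nonsynonymous divergence (positive selection);
    # NI > 1: excess nonsynonymous polymorphism (purifying selection).
    ni = (pn * ds) / (ps * dn) if ps and dn else None
    return p_value, ni
```

With the Adh counts reported by McDonald and Kreitman (1991), `mk_test(7, 17, 2, 42)` gives a neutrality index well below 1 and a small P value, the pattern interpreted as adaptive protein evolution.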

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prüfer et al. 2012; see Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse, so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyzes SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of >0.8 for being external that is not interrupted by SNPs with a probability of >0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
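The region definition above can be sketched as a single pass over per-SNP posterior probabilities. This is one possible reading of the rule, with several simplifying assumptions: a region spans from the first to the last confidently external SNP, SNPs of intermediate probability do not break it, and a single constant recombination rate (the hypothetical parameter `cm_per_mb`) replaces the local rates of the recombination map.

```python
def external_regions(snps, threshold=0.8, cm_per_mb=1.0):
    """Find external regions from HMM posteriors.

    snps  list of (position_bp, p_external, p_internal), sorted by position
    Returns a list of (start_bp, end_bp, genetic_length_cM).
    """
    regions, start, last = [], None, None
    for pos, p_ext, p_int in snps:
        if p_int > threshold:
            # A confidently internal SNP ends the current run.
            if start is not None:
                regions.append((start, last))
            start = last = None
        elif p_ext > threshold:
            if start is None:
                start = pos
            last = pos
    if start is not None:
        regions.append((start, last))
    # Assumption: physical length times one constant rate approximates genetic length.
    return [(s, e, (e - s) / 1e6 * cm_per_mb) for s, e in regions]
```

Scoring by genetic rather than physical length keeps regions of low recombination, where long external stretches arise easily under neutrality, from dominating the ranking.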

For each population, the HMM was run repeatedly with each "outgroup" individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and non-protein-coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5 longest external regions. For the MK test, we selected, by species, all genes with at least one transcript significant at a nominal P value threshold of 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: the GO categories (Harris et al. 2004) "biological processes," "molecular functions," and "cellular components"; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004); annotations based on the mammalian and human phenotype ontologies; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
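The final filtering of enrichment results can be illustrated with a minimal sketch. It assumes that each category comes with an uncorrected P value and the number of candidate genes driving it, and that the Bonferroni correction divides the threshold by the number of categories tested; the function and its input format are hypothetical and are not part of WebGestalt.

```python
def significant_categories(categories, alpha=0.05):
    """Apply a Bonferroni threshold and drop single-gene categories.

    categories  dict mapping category name -> (p_value, n_candidate_genes)
    Returns the sorted names of the categories that are retained.
    """
    m = len(categories)
    if m == 0:
        return []
    cutoff = alpha / m  # Bonferroni: one test per category
    return sorted(
        name
        for name, (p_value, n_genes) in categories.items()
        if p_value <= cutoff and n_genes > 1
    )
```

For example, with three tested categories the per-category cutoff becomes 0.05/3, so a category with P = 0.02 is not retained, and a category supported by a single gene is dropped regardless of its P value.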

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).

Cagan et al. doi:10.1093/molbev/msw215 MBE

Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.

Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet 6:e1001157.

Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol 26(12):2755–2764.

Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol 7(4):1122–1132.

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.

Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol 126:235–242.

Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol 182:4313–4320.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.

Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet 19:4705–4714.

Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol 11(1):171.

Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet 10:195–205.

Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet 84:21–34.

Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.

Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol 13:e1002112.

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.

Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci 110(14):5294–5300.

Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res 24:885–895.

ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet 5:e1003995.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res 22:1760–1774.

Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.

Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.

Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol 22:194–200.

Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev 29:15–21.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Hudson RR, Kreitman M, Aguadé M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet 32:415–435.

Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol 126:1200–1202.

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 16:980–989.

Key FM, Fu Q, Romagne F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.

Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol 20(4):R208–R215.

Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.

Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2(40):D130–D135.

Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet 38:337–342.

Südhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2,504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 1(33):W741–W748.



life history traits) that may result from their independent adaptation to particular environments. Evolutionary genomic information can help us to understand the origin and molecular bases of both shared and species-specific traits in the Hominidae.

The Hominidae also provide an excellent system for comparative studies. This is because, although the species are very closely related (with all lineages diverging over the last 12 My), they differ substantially in relevant features such as the effective size of their populations (Ne) (Prado-Martinez et al. 2013). This makes them well-suited for addressing longstanding theoretical questions in evolutionary biology. A central principle of population genetics is that the effective size of a population influences the efficacy of selection (Charlesworth 2009). Populations with a large Ne are expected to be more efficient at both fixing beneficial alleles and removing deleterious ones when compared with populations that have a small long-term Ne or that have experienced severe bottlenecks. Empirical attempts to quantify this effect have been limited, with exceptions that include work in yeast (Elyashiv et al. 2010), Drosophila (Jensen and Bachtrog 2011), and eukaryotes (Grossman et al. 2013), as well as comparisons of very divergent lineages (Corbett-Detig et al. 2015). It remains unclear to what extent differences in Ne between closely related mammalian species impact the process of natural selection (Ellegren and Galtier 2016). Full genome sequences of humans and great apes provide a unique opportunity to investigate this question over a relatively short evolutionary timescale.

The signatures of natural selection have been extensively studied in humans (Bustamante et al. 2005; Sabeti et al. 2006; Nielsen et al. 2009; Andrés et al. 2010) and some of the apes (Mikkelsen et al. 2005; Locke et al. 2011; Prüfer et al. 2012; Scally et al. 2012; Bataillon et al. 2015; McManus et al. 2015). However, no study has comprehensively investigated the evidence for natural selection across the Hominidae lineages. We analyzed whole-genome sequence data from multiple individuals from lineages covering all major Hominidae species and subspecies (except Gorilla beringei beringei) (Prado-Martinez et al. 2013) and present the first investigation of the impact of natural selection using this dataset. We focus on attributes of the data that allow us to detect the different types of selection across evolutionary timescales. We then integrate these results to investigate the influence of Ne on the efficacy of natural selection, the targeted functional elements, the genes and biological processes targeted by each type of selection, and the conservation of selective pressures across Hominidae lineages.

Results

Sample Processing

In order to assess the influence of natural selection, we use a dataset of 54 non-human great ape and nine human genomes sequenced to an average of 25-fold coverage (supplementary table S1, Supplementary Material online). Because of differences in demography and selective pressures on autosomes and sex chromosomes, we focus exclusively on the autosomes.

We take particular care to minimize the influence of errors and biases in genomic data and ensure that our data are of the highest possible quality, something particularly important when comparing species. All reads were mapped to the same reference genome (human hg18). We built on the extensive data filtering strategy of Prado-Martinez et al. (2013) (see "Dataset" in the "Methods" section). This conservative filtering strategy resulted in the exclusion of 726 Mb (23%) of the autosomal genome. This includes tandem repeats (38 Mb), segmental duplications (154 Mb), and structural variants annotated in at least one species (334 Mb) (see supplementary fig. S1, Supplementary Material online), all identified by unusual read-depth, so alternative methods (Gokcumen et al. 2013; Sudmant et al. 2015) may identify nonidentical regions. While certain genomic regions and gene families may be enriched in structural variation and be disproportionately affected by this filtering step, their removal is essential to minimize artifacts. We also excluded genomic gaps (226 Mb) and base pairs that were not covered by a minimum of five reads in all individuals per species. The resulting dataset includes, on average, 2,099 Mb of analyzable genome sequence per species (see supplementary fig. S1, Supplementary Material online). Although every filtering strategy has limitations and putative biases, we aim for a conservative approach that minimizes the presence of artifacts. The result is a high-quality comparative genomic dataset that allows us to investigate the signatures of natural selection and compare them across species (see supplementary materials Sample Processing, Supplementary Material online).

Neutrality Tests

We selected a set of neutrality tests that explore different aspects of the patterns of polymorphism and, in combination, allow us to detect the signatures of different types of natural selection across different time depths, from the emergence of the Hominidae approximately 12 Ma to recent and ongoing species-specific selective sweeps (fig. 1). Many neutrality tests exist; among them, we chose those that utilize the type of information that we have (i.e., that do not require phased genomes), that have been shown to have high power to detect selection (Zhai et al. 2009), that explore relatively independent signatures, and that provide information on different timescales. To keep the analyses manageable, we focus on four tests (see fig. 2):

• To detect signals of purifying and positive selection on the coding sequences of proteins, we applied the McDonald–Kreitman test (MK test; McDonald and Kreitman 1991). The MK test is run on a protein-coding gene-by-gene basis (supplementary materials MK 1, Supplementary Material online). By using information on sequence divergence, it has power to detect signatures of positive and purifying selection along the entire branch lengths of the Hominidae.

• To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary time, we applied a statistic based on the Hudson–Kreitman–Aguadé test (HKA; Hudson et al. 1987), which has been found to be a highly powerful method to detect positive selection (Zhai et al. 2009). The HKA statistic was calculated across the genome in 30-kb windows with a 15-kb overlap between windows (Methods and supplementary materials HKA 1, Supplementary Material online). As it uses both divergence and diversity data, the HKA statistic has power to detect positive selection over broad timescales as well as long-term balancing selection, including persistent balancing selection that predates the emergence of the Hominidae, such as on the MHC region (Hedrick 1999).

Natural Selection in the Great Apes. doi:10.1093/molbev/msw215 MBE

• To detect lineage-specific positive selection that occurred after the divergence of an ancestral population into two species, we applied the Extended Lineage Sorting test (ELS; SOM 13 in Green et al. 2010; Supplementary Information 7 in Prüfer et al. 2012; Supplementary Information 19a in Prüfer et al. 2014). The test is run across the genome and identifies regions without a predefined size (Methods and supplementary materials ELS 1, Supplementary Material online).

• To detect recent selective sweeps, we applied Fay and Wu's H statistic (FWH; Fay and Wu 2000). The FWH statistic was calculated across the genome in 30-kb windows with 15-kb overlap between windows (Methods and supplementary materials FWH 1, Supplementary Material online).
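As a rough illustration of the windowed statistics in the list above, the following minimal sketch tiles a chromosome into 30-kb windows that start every 15 kb and computes Fay and Wu's H from derived-allele counts, using the standard definition H = π − θH. The function names, the treatment of the final partial window, and the input format are assumptions made for illustration; this is not the pipeline used in the study.

```python
from typing import Iterator, List, Tuple


def sliding_windows(chrom_length: int, size: int = 30_000,
                    step: int = 15_000) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows; neighbours overlap by size - step.

    The last windows are truncated at the chromosome end (an assumption of
    this sketch; a real analysis might drop partial windows instead).
    """
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step


def fay_wu_h(derived_counts: List[int], n: int) -> float:
    """Fay and Wu's H = pi - theta_H for one window.

    derived_counts: for each segregating site in the window, the number of
    sampled chromosomes (out of n) carrying the derived allele, 1 <= i <= n-1.
    """
    denom = n * (n - 1)
    # Mean pairwise diversity from the unfolded site frequency spectrum.
    pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    # theta_H weights each site by the squared derived-allele count.
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return pi - theta_h


if __name__ == "__main__":
    print(list(sliding_windows(60_000))[:3])
    # Toy counts (hypothetical): high-frequency derived alleles give H < 0.
    print(fay_wu_h([3, 3, 3], n=4))
```

An excess of high-frequency derived alleles, as expected after a recent sweep, pushes H toward negative values, which is why only one tail of the FWH distribution is informative for positive selection.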

Together, these tests detect the signatures of purifying, balancing, and positive selection, old and recent (we refer to events in the order of millions of years for old, and of hundreds of thousands of years up to present day for recent selection), in each lineage (fig. 1). Integrating all results provides an unprecedentedly broad picture of the targets of natural selection in the genomes of humans and great apes.

FIG. 1. Timescale of neutrality tests. Hominidae phylogeny with the approximate time ranges where each neutrality test has power to detect signatures of natural selection (a). The lineages, with the number of genomes used in this study, are shown on the right. The X-axis shows the timescale in units of millions of years. Split times of lineages are from Prado-Martinez et al. (2013). For FWH, MK, and HKA, the approximate time range where the tests are inferred to have most power to detect selection is represented by color intensity. For ELS, we label in green the branches where the test has power to detect selection. n, number of individuals in each lineage; Ne, estimates of effective population size in units of thousands of individuals according to Watterson's estimator, taken from Prado-Martinez et al. (2013).

Ne and the Strength of Natural Selection in the Great Apes

As discussed earlier, empirical data is limited regarding the effect of long-term Ne on the efficacy of natural selection over short evolutionary timescales in vertebrates. The relationship between population size, selection, and levels of neutral diversity in populations continues to be a matter of considerable debate (Ellegren and Galtier 2016). A recent study (Corbett-Detig et al. 2015) proposed that the effects of linked selection can explain Lewontin's paradox (1974), namely that neutral diversity does not scale as expected with population size. However, a recent reanalysis of this data suggests that, while linked selection influences diversity along genomes, fluctuations in Ne are the major driver of levels of diversity between species (Coop 2016). The debate has so far been hampered by the limited availability of population-level genome sequence data across species (Ellegren and Galtier 2016). Our dataset therefore provides an ideal opportunity to investigate the relationship between Ne and selection in closely related species.

We find that the ratio of nonsynonymous to synonymous substitutions negatively correlates with long-term Ne in this dataset (Prado-Martinez et al. 2013), as expected with more efficient purifying selection in populations with higher Ne. Here we aim to (1) infer the distribution of fitness effects (DFE) in each species; (2) quantify the magnitude of the influence of Ne on the DFE; (3) compare the influence of long-term versus short-term Ne; and (4) investigate its influence not only on purifying but also on positive selection.

Ne and the Strength of Purifying Selection

We first inferred the DFE of deleterious mutations for 3859 one-to-one orthologous protein-coding genes with DFE-alpha (Eyre-Walker and Keightley 2009), which is based on the MK test (see Methods and MK supplementary materials, Supplementary Material online). The method fits a demographic model to the SFS of neutral sites and simultaneously estimates the gamma-distributed DFE of new nonneutral mutations and the fraction of adaptive substitutions (α) (supplementary fig. S16, Supplementary Material online). For all lineages, the shape parameter of the gamma distribution is <1, indicative of highly leptokurtic (L-shaped) DFEs and of most nonsynonymous mutations being strongly deleterious. Indeed, in all lineages the proportion of nonsynonymous mutations with NeS > 10 (S being the mean homozygous effect of a deleterious variant) is >65% (supplementary table S104 and fig. S18, Supplementary Material online), similar to estimates for humans (Eyre-Walker and Keightley 2009) and gorillas (McManus et al. 2015). We observe that the proportion of predicted neutral or nearly neutral mutations correlates negatively with long-term Ne (correlation of −0.64, P value = 0.04, after accounting for phylogenetic nonindependence using the BayesTraitsV2 random walk/maximum likelihood method; Pagel and Meade 2013). This correlation reflects stronger purifying selection in great ape species with a larger long-term Ne.

Efficient purifying selection also reduces the accumulation of linked genetic variation due to background selection. Within the bins in the middle range of the HKA distribution (see "Methods" section), which are particularly sensitive to purifying selection, lower HKA scores associate with stronger background selection (lower B scores; supplementary fig. S6, Supplementary Material online) and a higher proportion of protein-coding exons (fig. 3). This is expected if background selection reduces diversity around protein-coding and other functional regions. Across lineages, and in agreement with the DFE-alpha results, the effects of purifying selection increase with larger Ne, both when considering the proportion of protein-coding exons and the B scores (supplementary materials HKA 3 and supplementary table S5, Supplementary Material online) (McVicker et al. 2009). Incidentally, the effect is much weaker for nonprotein coding exons (supplementary table S5 and supplementary fig. 1E, Supplementary Material online). These results are virtually unchanged if we use only lineages with fewer than ten individuals or only lineages with more than five individuals, suggesting that sample size differences between lineages do not affect our observations (supplementary materials subsampling analysis 14, supplementary table S106, and supplementary fig. S30, Supplementary Material online).

FIG. 2. Summary of the neutrality tests used. Each box presents the input (the information used), the analysis strategy (how each test was applied on the genome-wide data), the pattern (the signatures of selection explored), the criteria to select selection candidates (the top candidates for each test), and the criteria to select candidates for GO analyses (the candidates used for gene ontology analyses).

The correlations with Ne above are almost always stronger with long-term Ne than with recent Ne (for 21 of the 23 HKA bins; supplementary table S3, Supplementary Material online), indicating that we detect the effects of long-term evolutionary history rather than only differences in power due to overall levels of diversity (although differences in the accuracy of the Ne estimates may affect this comparison). Therefore, despite the recent and ongoing population declines experienced by many of these species, their long-term Ne appears to be a better predictor than recent Ne of the past efficacy of purifying selection.

Ne and Adaptive Evolution

With the MK-based DFE-alpha it is possible to estimate the proportion (α) of nonsynonymous substitutions driven by positive selection, as well as the ratio of adaptive to neutral divergence (ω(α)) (supplementary table S103, Supplementary Material online). With the exceptions of Pongo pygmaeus (with poor bootstrap support) and Pan t. schweinfurthii (where two inbred individuals (Prado-Martinez et al. 2013) dramatically increase the estimates) (supplementary table S103, Supplementary Material online), both the estimated proportion (α) and the estimated rate of adaptive evolution are low (0–12% and 0–2%, respectively), in agreement with previous estimates (Eyre-Walker 2006; McManus et al. 2015). We observe that, in nonhuman great apes, both the proportion and the rate of adaptive substitutions are positively correlated with long-term Ne after phylogenetic nonindependence is accounted for using a generalized least square approach (supplementary fig. S17 and supplementary materials MK test 23, Supplementary Material online). The correlation is high and significant when all nonhuman species except the problematic Pongo pygmaeus and Pan troglodytes schweinfurthii are considered (Pearson's R = 0.9, P value = 0.004) (see fig. 4).
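For intuition about what α measures, the sketch below computes the classical point estimate from a single McDonald–Kreitman table, α = 1 − (Ds·Pn)/(Dn·Ps), together with the neutrality index. This is a deliberately simpler estimator than DFE-alpha, which additionally models demography and the DFE; the function names and the counts in the usage lines are hypothetical and serve only to illustrate the arithmetic.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Simple MK-table estimate of the fraction of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    alpha = 1 - (ds * pn) / (dn * ps); negative values suggest an excess of
    segregating, slightly deleterious nonsynonymous variants.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


def neutrality_index(dn: int, ds: int, pn: int, ps: int) -> float:
    """Neutrality index NI = (pn / ps) / (dn / ds); note NI = 1 - alpha."""
    if dn == 0 or ds == 0 or ps == 0:
        raise ValueError("NI is undefined when dn, ds, or ps is zero")
    return (pn / ps) / (dn / ds)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(mk_alpha(dn=20, ds=40, pn=10, ps=40))
    print(neutrality_index(dn=20, ds=40, pn=10, ps=40))
```

Because this point estimate ignores slightly deleterious polymorphism and changes in population size, it tends to be biased, which is one motivation for model-based approaches such as DFE-alpha.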

The effect of positive selection on linked variation also increases with long-term Ne. In the bottom bins of the HKA empirical distribution, which are enriched in targets of positive selection, the percentage of protein-coding windows correlates positively with Ne (0.05–1% bins, P values = 0.0001–0.02). Only the lowest 0.01% HKA bin is not significant, potentially due to the spatial clustering of windows as a result of selective sweeps (supplementary materials HKA 4 and supplementary table S12, Supplementary Material online).

Thus, our results indicate that the long-term Ne of populations significantly affects the efficacy of both purifying and positive selection. These correlations are remarkable because these species are very closely related and their long-term Ne varies by a maximum difference of 3-fold.

FIG. 3. Percentage of windows overlapping protein coding exons. Percentage of windows overlapping protein coding exons for noncumulative bins of the HKA empirical distribution (X-axis). Each lineage is plotted as a shaded line. The Pearson's correlation (R) between the percentage of windows overlapping protein coding exons and Ne, within each HKA bin and across all lineages, is shown on the right Y-axis. Pearson's correlation coefficient was computed both with an estimate of short- and long-term Ne (from Prado-Martinez et al. 2013). Only R values with significant P values (P < 0.05) are shown.

The Candidate Targets of Natural Selection

As most genomic sites evolve neutrally or nearly neutrally (Kimura 1979; Kelley et al. 2006), we expect an enrichment of targets of natural selection in the extreme tails of the genome-wide distributions of neutrality test statistics. Therefore, we can identify candidate targets of natural selection without relying on a simulated neutral expectation, which is vulnerable to parameter misspecification, an important problem given the complex evolutionary history of the Hominidae lineages. Given the little we know about the strongest targets of purifying, positive, and balancing selection in nonhuman apes, this catalog is highly relevant. In addition, these loci allow us to investigate the tempo, conservation, and biological function of natural selection in the great apes. The nature of each of the tests considered means that their implementation in the genome varies, and the criteria to define candidate targets of selection necessarily vary too (see fig. 2).
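The empirical-outlier logic described above can be sketched as follows: each window receives a rank-based empirical P value from the genome-wide distribution of a statistic, and windows below a chosen tail fraction are kept as candidates. The function names, the dictionary input, and the arbitrary ordering of tied scores are assumptions of this sketch, not details taken from the study.

```python
from typing import Dict, List


def empirical_rank_p(scores: Dict[str, float],
                     low_is_extreme: bool = True) -> Dict[str, float]:
    """Map each window to rank / total, with rank 1 the most extreme score.

    low_is_extreme=True ranks the smallest scores first (lower tail);
    False ranks the largest scores first (upper tail).
    """
    ordered = sorted(scores, key=scores.get, reverse=not low_is_extreme)
    total = len(ordered)
    return {window: (rank + 1) / total for rank, window in enumerate(ordered)}


def tail_candidates(scores: Dict[str, float], fraction: float = 0.001,
                    low_is_extreme: bool = True) -> List[str]:
    """Return windows whose empirical P value is within the given tail."""
    p_values = empirical_rank_p(scores, low_is_extreme)
    return [window for window, p in p_values.items() if p <= fraction]


if __name__ == "__main__":
    # Hypothetical scores for 1000 windows named w0..w999.
    toy = {f"w{i}": float(i) for i in range(1000)}
    print(tail_candidates(toy, fraction=0.001))                  # lower tail
    print(tail_candidates(toy, fraction=0.01, low_is_extreme=False))
```

The same helper can serve both tails of a statistic such as HKA by switching `low_is_extreme`, whereas a one-tailed statistic such as FWH would use only one setting.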

Sample Size and the Candidate Targets of Natural Selection

Sample size, which varies among lineages, may influence the power to detect signatures of natural selection. We assess how differences in sample size might influence our results with down-sampling analyses. We randomly down-sample, 100 times, four or eight individuals from the two lineages with the largest sample size (Pan paniscus and Gorilla gorilla gorilla) and run all neutrality tests for chromosome 1 (except the ELS test for Pan paniscus, because this test is not appropriate for this lineage). We then measure the overlap between the candidates from the down-samples (0.1% or 1% tail of the empirical distribution) with the equivalent candidates from the original results (supplementary materials subsampling analysis 1, Supplementary Material online).
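A minimal sketch of the bookkeeping behind such a down-sampling analysis is given below: draw a random subset of individuals, recompute candidates (not shown here), and average the overlap with the original candidate set across replicates. Defining overlap as the fraction of original candidates recovered is an assumption of this sketch, as are the function names and individual labels; the study's exact definition is given in its supplementary materials.

```python
import random
from typing import Iterable, List, Set


def draw_subsample(individuals: List[str], k: int,
                   rng: random.Random) -> List[str]:
    """Pick k distinct individuals at random, without replacement."""
    return rng.sample(individuals, k)


def overlap_fraction(original: Set[str], downsampled: Set[str]) -> float:
    """Fraction of the original candidates also found after down-sampling."""
    if not original:
        raise ValueError("original candidate set is empty")
    return len(original & downsampled) / len(original)


def mean_overlap(original: Set[str], replicates: Iterable[Set[str]]) -> float:
    """Average overlap with the original candidates across replicates."""
    values = [overlap_fraction(original, rep) for rep in replicates]
    if not values:
        raise ValueError("no replicates supplied")
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the draw is reproducible
    individuals = [f"ind{i}" for i in range(13)]  # hypothetical labels
    print(draw_subsample(individuals, 4, rng))
    original = {"w1", "w2", "w3", "w4"}           # hypothetical windows
    print(mean_overlap(original, [{"w1", "w2", "w3", "w4"}, {"w1", "w2"}]))
```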

The impact of sample size differs between selection tests. HKA appears very robust to sample size variation for signatures of positive selection, showing a mean overlap between the original and down-sampled results of at least 86% (supplementary materials subsampling analysis 11 and supplementary figs. S21–24, Supplementary Material online). This is likely to be because the HKA is not strongly affected by the allele frequency of polymorphisms.

In contrast, FWH and ELS appear more sensitive to sample size (supplementary materials subsampling analysis 12–13 and supplementary figs. S25–28, Supplementary Material online). This may be due to the influence of sample size on the estimates of allele frequency. Therefore, we find that the HKA test is better suited for comparative analyses where sample sizes are low or unequal between populations.

An Available Genome-Wide Map of Natural Selection in Hominidae

The genome-wide map of signatures of natural selection includes information about different types of selection over varying time frames. As such, it provides a broad picture of the influence of natural selection in each genomic region and Hominidae species. All the information is available as an interactive browser at http://tinyurl.com/nf8qmzh, following the criteria and configuration of a recently published human dataset (Pybus et al. 2014, 2015). The UCSC-style format facilitates the integration with the rich UCSC browser tracks, a search mask allows easy access to results for specific genes or genomic regions, and the raw scores (test statistic value and rank score/empirical P value) can be conveniently downloaded using the UCSC Table function. We expect this to be a valuable resource for a wide range of analyses.

[Figure 4 plot: Y-axis, Alpha (−0.6 to 0.2); X-axis, Ne (10,000 to 25,000); points labeled PanPan, PanTroEl, PanTroVer, Gorilla, PonAb; R = 0.90, P = 0.004.]

FIG. 4. Correlation between rate of adaptive substitutions (α) and effective population size (Ne). The X-axis shows the effective population size. On the Y-axis the rate of adaptive substitutions is plotted as α. Correlations were calculated while controlling for the phylogenetic nonindependence using a generalized least square approach and a random walk/maximum likelihood method (see Supplementary Materials MK 23, Supplementary Material online).

The Functional Targets of Natural Selection

The relative contributions of variants in regulatory versus protein coding regions of the genome to adaptive evolution remain a matter of controversy (Halligan et al. 2013). Since King and Wilson (1975), the relative importance of coding and regulatory variation to adaptive evolution has been contentious. Protein-coding DNA constitutes 1.5% of the genome, but 10–15% appears to be functionally constrained (Ponting and Hardison 2011). The role of nonprotein-coding genes and other nongenic elements in genome function and evolution remains debated (Encode Project Consortium 2011; Doolittle 2013), with several lines of work suggesting that nongenic regions (including some gene deserts) can play an important role in phenotype and adaptation (Bejerano et al. 2006; Libioulle et al. 2007; McPherson et al. 2007; Hubisz and Pollard 2014). Although the stringent filtering of the data and the imperfect annotation of nonprotein-coding functional elements hamper the comparison of protein-coding versus nonprotein regions, we investigated the functional annotations of our candidates.

Except for MK, the neutrality tests we used are agnostic about functional annotation. Still, most of our candidate targets of positive selection contain functional annotations: mean values across species are 72% for HKA, 71% for ELS, and 80% for FWH. This is significantly greater than genome-wide expectations based on random sampling of the callable genome (P values < 0.05 in all lineages except Pan paniscus, P value = 0.2, and Pongo pygmaeus, P value = 0.11) (supplementary materials HKA 7 and supplementary figs. S7–15, Supplementary Material online). Among these annotations, the overlap with protein-coding exons (HKA = 62%, ELS = 59%, FWH = 45%) is significantly enriched in all lineages except Pan paniscus (P value = 0.15) and Pan troglodytes verus (P value = 0.06). In contrast, the mean overlap with exons from nonprotein coding genes (HKA = 18%, ELS = 2%, FWH = 20%) is not significantly elevated relative to genome-wide levels (supplementary figs. S7–15, Supplementary Material online; lowest P value = 0.16 in Gorilla gorilla gorilla).

Candidate targets of balancing selection are also highly enriched in protein-coding exons (e.g., 64% in the top 0.01% bin), and in the top HKA bins this proportion sharply increases with the HKA score (fig. 3). The increased levels of diversity in these windows cannot be explained by technical artifacts, as these regions are not unusual in terms of coverage or mapping quality (supplementary materials HKA 13–14, Supplementary Material online), or by current models of neutral evolution or purifying selection, and are instead best explained by long-term balancing selection acting on or near these protein-coding exons.

The Biological Pathways Targeted by Natural Selection

According to our results above, protein-coding genes appear to be a key target of natural selection in the Hominidae. We thus investigated the biological functions that these genes are involved in. For each neutrality test and lineage, we identified the genes in candidate regions of positive or balancing selection (see fig. 2 and "Methods" section for details) and performed gene enrichment analyses using WebGestalt (Zhang et al. 2005). Our necessarily strict data filtering may disproportionally affect certain Gene Ontology (GO) categories (e.g., olfactory receptors), but we discuss below the categories that retain the strongest signatures for each type of natural selection.

Pathways Targeted by Balancing Selection

The top genes for balancing selection include a number of well-established targets such as the major histocompatibility complex (MHC) genes (Hughes and Yeager 1998; Hedrick 1999). Indeed, windows containing MHC genes appear among those with the strongest signals of balancing selection in all lineages (supplementary tables S6 and S13, Supplementary Material online). In addition, in all lineages there is a significant enrichment of immunity-related categories such as the GO "Antigen processing and presentation" category (closely related to the MHC) (supplementary tables S16–40, Supplementary Material online). This provides evidence that balancing selection has a strong influence on immunological pathways in all lineages.

To test whether there were strong signatures of balancing selection beyond the MHC complex, we re-ran the GO enrichment analysis excluding all genes in the MHC region on chromosome 6 (supplementary tables S57–65, Supplementary Material online). Doing so removes the enrichment for the GO category "Antigen processing and presentation" in all lineages. Interestingly, in three of the four Pan troglodytes lineages (excluding Pan troglodytes schweinfurthii) there is significant enrichment for the GO category "Cornified envelope" (supplementary tables S360, S62, and S63, Supplementary Material online), driven by the three genes SCEL, SPRR2B, and SPRR2G. The cornified envelope is the most exterior layer of the skin and consists of dead cells. Related to this, we note that in Pan troglodytes verus the GO category "keratinocyte differentiation", involved in the development of the most common cell type in the epidermis, is also significantly enriched (P value = 0.0026). This is interesting because keratins and proteins similarly involved in epithelial barrier formation have been proposed as targets of balancing selection (see "Discussion" section).

Pathways Targeted by Strong Purifying Selection

Since Hominidae are closely related, we would expect that similar regions are evolving under purifying selection. We therefore tested whether the pathways showing the strongest signatures of constraint are consistent among lineages. From the MK test, 53 of the 152 evaluated pathways showed signatures of strong purifying selection in more than one lineage. In particular, the "Integrin signaling" pathway and the "Wnt signaling" pathway, which regulate basic cellular and developmental processes, and the "Alzheimer's disease-presenilin" pathway are significantly constrained across all lineages (supplementary table S102, Supplementary Material online).

Pathways Targeted by Positive Selection

For the HKA, several lineages show evidence of positive selection targets being enriched for GO categories related to immune function. For example, the GO category "Complement activation" (genes that activate the innate immune system) is significantly enriched in Pan paniscus (P value = 0.042; all P values adjusted for multiple testing), whereas the related pathway "Complement receptor activity" is enriched in Pongo abelii (P value = 0.011) and the GO category "Viral receptor activity" in Gorilla gorilla gorilla (P value = 0.0004) (supplementary tables S41, S46, and S54, Supplementary Material online).

We find that the FWH candidate targets of positive selection show enrichment in several GO categories related to brain development and function exclusively within the African Hominidae lineages. This includes, for example, the GO categories "Dendrite" (Homo sapiens, P value = 0.040; Pan troglodytes troglodytes, P value = 0.010; Gorilla gorilla, P value = 0.0024) and "Neuron spine" (Pan troglodytes troglodytes, P value = 0.001; Gorilla gorilla gorilla, P value = 0.006). Several additional neurological categories are enriched in single lineages (supplementary tables S75–86, Supplementary Material online). For example, Homo sapiens is the only lineage with significant enrichment of the GO category "Glutamate receptor activity" (P value = 0.002); glutamate is the main excitatory neurotransmitter in the brain.

For the MK, the set of genes with an excess of divergence is small (supplementary table S96, Supplementary Material online), but we found a significant enrichment in genes involved in, for instance, "Ion channel activity" in Pan paniscus (P value = 0.034) and in "Glycosaminoglycan biosynthesis" in Gorilla gorilla gorilla (P value = 0.020), among other pathways (supplementary tables S98–101, Supplementary Material online).

Overlap between Targets of Positive and Balancing Selection across Lineages

The Hominidae lineages have shared, along their evolutionary history, similar physiologies and environments. As such, they have likely been subject to common selective pressures even after their lineages split. To investigate this possibility, we identified genes that show similar signals of natural selection in multiple lineages. Since we use an empirical approach to identify candidate targets of natural selection (as the demographic models for these species are not well established), we use the same 0.1% cut-off to identify outliers from both tails of the HKA empirical distribution and one tail of the FWH distribution. Therefore, we cannot make general claims about the relative frequency of positive and balancing selection in primate species. We can, though, explore the level of sharing across lineages of these candidate targets. To be conservative, we only consider genes to be shared targets of selection if they appear as candidates in at least three lineages.
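The sharing criterion above (a gene counts as shared only if it is a candidate in at least three lineages) amounts to counting, for each gene, the number of lineage candidate sets that contain it. The sketch below shows one way to do this; the function name, the input layout, and the gene labels are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Set


def shared_candidates(candidates_by_lineage: Dict[str, Set[str]],
                      min_lineages: int = 3) -> Set[str]:
    """Return genes that are candidates in at least min_lineages lineages."""
    # Each lineage contributes a set, so a gene is counted once per lineage.
    counts = Counter(
        gene for genes in candidates_by_lineage.values() for gene in genes
    )
    return {gene for gene, n in counts.items() if n >= min_lineages}


if __name__ == "__main__":
    # Hypothetical candidate sets; gene names are placeholders.
    toy = {
        "Homo sapiens": {"geneA", "geneB"},
        "Pan paniscus": {"geneA", "geneC"},
        "Gorilla gorilla gorilla": {"geneA", "geneB"},
        "Pongo abelii": {"geneB", "geneD"},
    }
    print(sorted(shared_candidates(toy)))
```

Note that this count treats all lineages as independent, so, for closely related subspecies, a shared gene may reflect a single ancestral event rather than repeated selection.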

Overlap between Selection Targets

We find no signals of positive selection that are shared across all lineages. In fact, there is modest sharing across lineages, a possible indication of the lineage-specific nature of the adaptive process (although we note that we are highly conservative in our selection of candidate genes, and the power of FWH is reduced with lower sample sizes) (supplementary table S15, Supplementary Material online). We observe that the HKA candidates show lower sharing across lineages than those from the FWH (supplementary tables S14 and S73, Supplementary Material online). For the HKA, only 27 genes (of the 200 candidates per lineage) are shared in at least three lineages, compared with 67 for the FWH, which detects more recent selective events. We note that shared signals among the Pan troglodytes sub-species may not reflect truly independent signatures of selection, as signals may predate their divergence into separate lineages and admixture between sub-species is possible. However, of these 67 genes, only a minority (8) are shared exclusively among the Pan troglodytes subspecies, potentially reflecting their recently shared ancestry. The majority (59) show evidence of recent positive selection across a range of lineages (supplementary table S73, Supplementary Material online), suggesting putative parallel adaptive events.

Turning to the biological function of these genes, we find limited enrichment among HKA targets (the only significant GO category is "Structural molecule activity", P value = 0.009; supplementary table S3AAB, Supplementary Material online). However, the 67 shared FWH candidate genes are significantly enriched in multiple functional categories (supplementary tables S87–89, Supplementary Material online), including several neuronal pathways, suggesting that these are a common target of recent positive selection across the Hominidae.

Genes targeted by balancing selection show much greater sharing across lineages, with 156 genes showing signatures of balancing selection across at least three lineages (supplementary table S13, Supplementary Material online; fig. 5 for an example across Pan troglodytes lineages). Nine genes, primarily from the MHC region, are shared across all lineages (supplementary table S13, Supplementary Material online). This likely reflects the long-term and persistent nature of balancing selection on immunity-related genes, in particular in the MHC.

Discussion

We present a global investigation of the signatures of purifying, positive, and balancing selection at different time scales and across the Hominidae lineages. We observe strong evidence for the signatures of each of these types of natural selection on patterns of genomic variation. By carefully avoiding technical differences, we can compare for the first time the patterns of different types of natural selection across the great ape species. All genomic analyses of signatures of selection are complicated to some extent by demographic processes, which can result in patterns of genomic variation that obscure signatures of selection or produce false-positives. We tried to mitigate this by utilizing tests that identified putatively selected regions as outliers based on the entire distribution of patterns of genomic variation, under the assumption that the majority of the genome is evolving neutrally.

We find that even with the relatively similar Ne of the great apes (with a maximum difference of 3-fold), Ne has a significant effect on the efficacy of natural selection. This appears to be true for both purifying and positive selection. The evidence for adaptive evolution is stronger in protein-coding than in nonprotein coding genes, and it is overrepresented not only on loci with relevance to immune function but also on loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae

We estimate that at least 65% of mutations are deleterious (NeS > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8%, compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites and its effects on linked variation correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes sub-species seem to have resulted in differences in the efficacy of natural selection. This may be extremely important because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes

Our data indicates that adaptive evolution (positive and balancing selection) often targets protein-coding regions (be that the protein-coding sequence or the surrounding regulatory elements). This suggests that in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, these most likely reflect shared events. When the signature is shared across distant lineages, this may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example if it affects different neighboring functional elements in each species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

FIG. 5. Venn Diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here, we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials, section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC

Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier

The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process “cornified envelope development” in Pan t. ellioti, Pan troglodytes troglodytes and Pan troglodytes verus (the genes involved are SCEL, SPRR2B and SPRR2G), and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E; LCE3D and LCE3E). In LCE3D and LCE3E, the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and has previously been shown to have signatures of balancing selection in humans (Andrés et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006), which is essential for the formation of the cornified envelope, yet has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of “natural vaccination” (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting that variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes “natural vaccination” may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes

We also find evidence for pervasive positive selection on immune-related processes, as seen before in H. sapiens and other Hominidae (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA and FWH candidate targets of positive selection are all significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86 and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find that some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Sudhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes and Pongo pygmaeus, an HKA signal of positive selection in Homo sapiens, and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. These include AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank) and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1 is the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla and Pongo pygmaeus) (supplementary table S73, Supplementary Material online); its function is unknown, but it is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Havik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes, including size, locomotion, morphology or diet, help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mis-mapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb) and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and sites with DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed “callable sites,” was used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that, despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot discard the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.
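The depth-based part of the callable-site filter can be illustrated with a short sketch. It is not the pipeline used for this dataset: the function name `callable_sites`, the input layout (one list of per-site depths per individual), the use of a per-individual mean depth, and the default bounds (mean/8, mean × 3, and a minimum of 5 reads, as we read the thresholds in the text) are all assumptions made for illustration.

```python
from statistics import mean

def callable_sites(depths, low_divisor=8.0, high_multiplier=3.0, min_cov=5):
    """Return indices of sites passing the depth filters in every individual.

    depths: dict mapping individual name -> list of read depths, one per site
            (hypothetical layout; all lists must have the same length).
    """
    # Per-individual lower and upper depth bounds derived from the mean depth.
    bounds = {
        ind: (mean(d) / low_divisor, mean(d) * high_multiplier)
        for ind, d in depths.items()
    }
    n_sites = len(next(iter(depths.values())))
    keep = []
    for i in range(n_sites):
        # A site is kept only if every individual has enough reads and
        # its depth lies within that individual's bounds.
        if all(
            d[i] >= min_cov and bounds[ind][0] <= d[i] <= bounds[ind][1]
            for ind, d in depths.items()
        ):
            keep.append(i)
    return keep

example = {
    "ind_a": [20, 20, 20, 4, 20, 96],   # mean depth 30
    "ind_b": [10, 10, 10, 10, 10, 10],  # mean depth 10
}
print(callable_sites(example))
```

In this toy input, the fourth site fails the minimum-coverage requirement and the last site exceeds three times the mean depth of `ind_a`, so both are dropped. The region-based exclusions (repeats, segmental duplications, gaps, structural variants) would be applied separately as interval masks.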

Tests

Hudson–Kreitman–Aguadé Test (HKA)

To detect long-term balancing selection, and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials, HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window’s empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials, HKA, and supplementary figs. S2 and S3, Supplementary Material online).
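The windowed HKA scan can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`count_in`, `hka_scan`, `empirical_p`), the input layout (lists of positions on one chromosome), and the treatment of windows without substitutions (ratio set to infinity) are hypothetical choices.

```python
from bisect import bisect_left

def count_in(sorted_pos, start, end):
    """Number of positions p with start <= p < end in a sorted list."""
    return bisect_left(sorted_pos, end) - bisect_left(sorted_pos, start)

def hka_scan(snvs, subs, callable_pos, chrom_len,
             win=30_000, step=15_000, min_callable=300, min_informative=6):
    """Polymorphism-to-divergence ratio in overlapping windows."""
    snvs, subs, callable_pos = sorted(snvs), sorted(subs), sorted(callable_pos)
    windows = []
    for start in range(0, max(chrom_len - win, 0) + 1, step):
        end = start + win
        p = count_in(snvs, start, end)   # polymorphic sites
        d = count_in(subs, start, end)   # fixed differences to the outgroup
        if count_in(callable_pos, start, end) < min_callable:
            continue
        if p + d < min_informative:
            continue
        # Assumption: windows with no substitutions get an infinite ratio.
        score = p / d if d else float("inf")
        windows.append((start, end, score))
    return windows

def empirical_p(windows, tail="high"):
    """Rank windows by score; the empirical P value is rank / number of windows."""
    ranked = sorted(windows, key=lambda w: w[2], reverse=(tail == "high"))
    n = len(ranked)
    return [(s, e, score, (rank + 1) / n)
            for rank, (s, e, score) in enumerate(ranked)]

snvs = [100, 200, 300, 400, 20_000, 40_000]
subs = [500, 600, 16_000, 17_000, 31_000, 32_000, 33_000,
        45_000, 46_000, 47_000]
callable_pos = range(0, 60_000, 50)
scan = hka_scan(snvs, subs, callable_pos, chrom_len=60_000)
for row in empirical_p(scan, tail="high"):
    print(row)
```

Under the usual reading of the HKA logic, the high tail (excess polymorphism relative to divergence) is the one examined for long-term balancing selection and the low tail for positive selection; the `tail` argument simply selects which end of the empirical distribution receives the smallest P values.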

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials, FWH 1, and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials, FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window’s empirical P value.
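For reference, the H statistic of Fay and Wu (2000) contrasts two estimators of diversity computed from derived-allele counts, H = π − θ_H, and becomes negative when high-frequency derived alleles are in excess. The sketch below computes it for one window; the function name `fay_wu_h` and the input layout (one derived-allele count per segregating site) are assumptions for illustration and do not reproduce the authors' implementation.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H for one window.

    derived_counts: derived-allele count at each segregating site
                    (hypothetical input; values must satisfy 0 < i < n).
    n: number of sampled chromosomes.
    """
    counts = [i for i in derived_counts if 0 < i < n]
    denom = n * (n - 1)
    # pi: average pairwise differences, from derived-allele counts.
    pi = sum(2 * i * (n - i) for i in counts) / denom
    # theta_H: estimator weighted by the squared derived-allele count.
    theta_h = sum(2 * i * i for i in counts) / denom
    return pi - theta_h

# Mostly high-frequency derived alleles, as expected after a sweep.
print(fay_wu_h([9, 9, 8], n=10))
# Mostly low-frequency derived alleles.
print(fay_wu_h([1, 1, 2], n=10))
```

Polarizing alleles as ancestral or derived requires an outgroup, so in practice the counts would come from sites where the ancestral state can be inferred.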

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein-coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet this criterion. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials, MK 14, Supplementary Material online).
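The MK test compares nonsynonymous and synonymous counts between divergence and polymorphism in a 2 × 2 table. The sketch below is one common way to evaluate that table, using a two-sided Fisher's exact test; the choice of Fisher's exact test, the function names (`fisher_exact_two_sided`, `mk_test`) and the direction labels are assumptions for illustration, since the text here does not specify how significance was assessed.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def mk_test(dn, ds, pn, ps):
    """dn/ds: nonsynonymous/synonymous substitutions;
    pn/ps: nonsynonymous/synonymous polymorphisms."""
    p_value = fisher_exact_two_sided(dn, pn, ds, ps)
    if dn * ps > ds * pn:
        direction = "positive"    # excess of nonsynonymous divergence
    elif dn * ps < ds * pn:
        direction = "purifying"   # excess of nonsynonymous polymorphism
    else:
        direction = "neutral"
    return p_value, direction

# Counts from the classic Adh example of McDonald and Kreitman (1991).
print(mk_test(dn=7, ds=17, pn=2, ps=42))
```

Comparing the cross-products `dn * ps` and `ds * pn` gives the direction of the departure without dividing by counts that may be zero.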

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prüfer et al. 2012; see Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse, so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of > 0.8 for being external that is not interrupted by SNPs with a probability of > 0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
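The region definition above can be sketched as a single pass over per-SNP posteriors. This illustration makes several assumptions that are not taken from the cited implementation: the function names (`call_external_regions`, `genetic_length_cm`), the input layout (sorted `(position, P(external))` pairs), the use of 1 − P(external) as the internal posterior in a two-state model, and a single recombination rate per region supplied by the caller.

```python
def call_external_regions(snps, threshold=0.8):
    """Group SNPs into external regions.

    snps: list of (position_bp, posterior_external), sorted by position.
    A region runs from its first to its last confidently external SNP and is
    closed by the first confidently internal SNP; SNPs that are confident in
    neither state do not interrupt a run.
    """
    regions, start, last = [], None, None
    for pos, p_ext in snps:
        p_int = 1.0 - p_ext  # two hidden states, so posteriors sum to 1
        if p_ext > threshold:
            if start is None:
                start = pos
            last = pos
        elif p_int > threshold and start is not None:
            regions.append((start, last))
            start = last = None
    if start is not None:
        regions.append((start, last))
    return regions

def genetic_length_cm(region, cm_per_mb):
    """Genetic length of a region given a local rate in cM/Mb
    (for example, read from a 1-Mb resolution recombination map)."""
    start, end = region
    return (end - start) / 1e6 * cm_per_mb

posteriors = [(100, 0.90), (200, 0.95), (300, 0.50), (400, 0.85),
              (500, 0.10), (600, 0.90), (700, 0.30), (800, 0.05)]
regions = call_external_regions(posteriors)
print(regions)
print([genetic_length_cm(r, cm_per_mb=1.2) for r in regions])
```

Scoring by genetic rather than physical length means that a region of a given size counts for more where recombination is high, since linkage is expected to break down faster there.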

For each population, the HMM was run repeatedly with each “outgroup” individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials, ELS 13, and supplementary tables S68–70, Supplementary Material online).
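A simplified sketch of this combination step is given below. It assumes that external regions can be matched across runs by a shared identifier, which glosses over the fact that region boundaries will generally differ between runs; the function name `combine_runs`, the input layout, and the tie-breaking rule are likewise hypothetical.

```python
def combine_runs(runs, top_fraction=0.05):
    """Combine ELS runs made with different outgroup individuals.

    runs: list of dicts, one per run, mapping region id -> genetic length.
    Returns region ids ordered by cumulative rank (best first), keeping only
    regions that fall in the top fraction of every run.
    """
    per_run_ranks = []
    for scores in runs:
        ordered = sorted(scores, key=scores.get, reverse=True)
        n_top = max(1, int(len(ordered) * top_fraction))
        # Rank 1 is the longest region in this run.
        per_run_ranks.append(
            {rid: rank for rank, rid in enumerate(ordered[:n_top], start=1)}
        )
    # Keep regions present in the top fraction of all runs.
    shared = set.intersection(*(set(r) for r in per_run_ranks))
    totals = {rid: sum(r[rid] for r in per_run_ranks) for rid in shared}
    # Lower cumulative rank is better; ties are broken by region id.
    return sorted(totals, key=lambda rid: (totals[rid], rid))

runs = [
    {"A": 5.0, "B": 4.0, "C": 3.0, "D": 1.0},
    {"A": 4.0, "B": 6.0, "C": 1.0, "D": 2.0},
    {"A": 9.0, "B": 8.0, "C": 1.0, "D": 0.5},
]
# A large top_fraction is used only because the toy example has four regions.
print(combine_runs(runs, top_fraction=0.5))
```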

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein-coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript presenting a nominal P value of ≤ 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: the GO categories (Harris et al. 2004) “biological processes,” “molecular functions” and “cellular components”; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004); databases based on the mammalian and human phenotype ontologies; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
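The enrichment logic can be illustrated with a small stand-in for the WebGestalt analysis: a one-sided hypergeometric over-representation test per category, a Bonferroni correction across categories, and removal of categories supported by a single candidate gene. The function names (`hypergeom_upper_tail`, `enriched_categories`), the gene and category names, and the use of a plain hypergeometric test are assumptions for illustration; they do not describe WebGestalt's internals.

```python
from math import comb

def hypergeom_upper_tail(k, total, in_category, drawn):
    """P(X >= k) when drawing `drawn` genes from `total`, of which
    `in_category` belong to the category."""
    top = min(in_category, drawn)
    num = sum(comb(in_category, x) * comb(total - in_category, drawn - x)
              for x in range(k, top + 1))
    return num / comb(total, drawn)

def enriched_categories(candidates, background, categories, alpha=0.05):
    """Return {category: Bonferroni-adjusted P value} for enriched categories."""
    background = set(background)
    candidates = set(candidates) & background
    n_tests = len(categories)
    results = {}
    for name, members in categories.items():
        members = set(members) & background
        hits = candidates & members
        if len(hits) < 2:
            continue  # discard categories driven by only one gene
        p = hypergeom_upper_tail(len(hits), len(background),
                                 len(members), len(candidates))
        p_adj = min(1.0, p * n_tests)  # Bonferroni correction
        if p_adj <= alpha:
            results[name] = p_adj
    return results

background = [f"g{i}" for i in range(100)]
candidates = [f"g{i}" for i in range(10)]
categories = {
    "immune (toy)": [f"g{i}" for i in range(8)] + ["g50", "g51"],
    "single hit (toy)": ["g9"] + [f"g{i}" for i in range(60, 69)],
    "broad (toy)": ["g0", "g1"] + [f"g{i}" for i in range(20, 68)],
}
print(enriched_categories(candidates, background, categories))
```

In this toy example only the first category is reported: the second is supported by a single candidate gene, and the third contains two candidates but is so large that this overlap is expected by chance.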

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials, HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).

Supplementary Material

Supplementary figures S1–S22, tables S1–S107 and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.
Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet 6:e1001157.
Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol 26(12):2755–2764.
Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol 7(4):1122–1132.
Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.
Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol 126:235–242.
Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol 182:4313–4320.
Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.
Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet 19:4705–4714.
Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol 11(1):171.
Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol 11:202.
Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet 10:195–205.
Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet 84:21–34.
Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.
Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol 13:e1002112.
Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.
Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci 110(14):5294–5300.
Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet 17:422–433.
Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res 20:1558–1573.
Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res 24:885–895.
ENCODE Project Consortium. 2011. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9(4):e1001046.
Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol 21:569–575.
Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol 26:2097–2108.
Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol 181:1315–1322.
Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet 21:5306–5317.
Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci 110(39):15764–15769.
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.
Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.
Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet 5:e1003995.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22:1760–1774.

Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.

Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.

Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol. 22:194–200.

Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev. 29:15–21.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Hudson RR, Kreitman M, Aguade M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32:415–435.

Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol. 126:1200–1202.

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol. 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.

Key FM, Fu Q, Romagne F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.

Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2: software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.

Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.

Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2(40):D130–D135.

Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.

Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2,504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 1(33):W741–W748.


positive selection (Zhai et al. 2009). The HKA statistic was calculated across the genome in 30-kb windows with a 15-kb overlap between windows (Methods and supplementary materials HKA 1, Supplementary Material online). As it uses both divergence and diversity data, the HKA statistic has power to detect positive selection over broad timescales, as well as long-term balancing selection, including persistent balancing selection that predates the emergence of the Hominidae, such as on the MHC region (Hedrick 1999).

• To detect lineage-specific positive selection that occurred after the divergence of an ancestral population into two species, we applied the Extended Lineage Sorting test (ELS; SOM 13 in Green et al. 2010; Supplementary Information 7 in Prüfer et al. 2012; Supplementary Information 19a in Prüfer et al. 2014). The test is run across the genome and identifies regions without a predefined size (Methods and supplementary materials ELS 1, Supplementary Material online).

• To detect recent selective sweeps, we applied Fay and Wu's H statistic (FWH; Fay and Wu 2000). The FWH statistic was calculated across the genome in 30-kb windows with 15-kb overlap between windows (Methods and supplementary materials FWH 1, Supplementary Material online).
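The windowing scheme and the FWH statistic described in the list above can be illustrated with a minimal Python sketch. It assumes the standard definition of Fay and Wu's H as the difference between two estimators of diversity (theta_pi minus theta_H) computed from derived-allele counts at segregating sites; the function names `sliding_windows` and `fay_wu_h` are hypothetical, and this is not the pipeline used in the study.

```python
from typing import Iterator, List, Tuple


def sliding_windows(chrom_length: int, size: int = 30_000,
                    step: int = 15_000) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) windows; with size=30 kb and
    step=15 kb, consecutive windows overlap by 15 kb."""
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        if start + size >= chrom_length:
            break  # the last window already reaches the chromosome end
        start += step


def fay_wu_h(derived_counts: List[int], n: int) -> float:
    """Fay and Wu's H = theta_pi - theta_H for one window.

    derived_counts: derived-allele count at each segregating site.
    n: number of sampled chromosomes (assumed identical at all sites).
    """
    if any(not 0 < i < n for i in derived_counts):
        raise ValueError("derived counts must satisfy 0 < count < n")
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h
```

Strongly negative values indicate an excess of high-frequency derived alleles, the pattern expected after a recent selective sweep.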

Together, these tests detect the signatures of purifying, balancing, and positive selection, both old and recent, in each lineage (fig. 1). By old we refer to events on the order of millions of years ago, and by recent to events from hundreds of thousands of years ago up to the present day. Integrating all results provides an unprecedentedly broad picture of the targets of natural selection in the genomes of humans and great apes.

Ne and the Strength of Natural Selection in the Great Apes

As discussed earlier, empirical data are limited regarding the effect of long-term Ne on the efficacy of natural selection over short evolutionary timescales in vertebrates. The relationship between population size, selection, and levels of neutral diversity in populations continues to be a matter of considerable debate (Ellegren and Galtier 2016). A recent study (Corbett-Detig et al. 2015) proposed that the effects of linked selection can explain Lewontin's paradox (1974), namely that neutral diversity does not scale as expected with population size. However, a recent reanalysis of these data suggests that, while linked selection influences diversity along genomes, fluctuations in Ne are the major driver of levels of diversity between species (Coop 2016). The debate has so far been hampered by the limited availability of population-level genome sequence data across species (Ellegren and Galtier 2016). Our dataset therefore provides an ideal opportunity to investigate the relationship between Ne and selection in closely related species.

FIG. 1. Timescale of neutrality tests. Hominidae phylogeny with the approximate time ranges where each neutrality test has power to detect signatures of natural selection (a). The lineages, with the number of genomes used in this study, are shown on the right. The X-axis shows the timescale in units of millions of years. Split times of lineages are from Prado-Martinez et al. (2013). For FWH, MK, and HKA, the approximate time range where the tests are inferred to have most power to detect selection is represented by color intensity. For ELS, we label in green the branches where the test has power to detect selection. n, number of individuals in each lineage; Ne, estimates of effective population size in units of thousands of individuals according to Watterson's estimator, taken from Prado-Martinez et al. (2013).

We find that the ratio of nonsynonymous to synonymous substitutions negatively correlates with long-term Ne in this dataset (Prado-Martinez et al. 2013), as expected with more efficient purifying selection in populations with higher Ne. Here, we aim to (1) infer the distribution of fitness effects (DFE) in each species; (2) quantify the magnitude of the influence of Ne on the DFE; (3) compare the influence of long-term versus short-term Ne; and (4) investigate its influence not only on purifying but also on positive selection.

Ne and the Strength of Purifying Selection

We first inferred the DFE of deleterious mutations for 3,859 one-to-one orthologous protein-coding genes with DFE-alpha (Eyre-Walker and Keightley 2009), which is based on the MK test (see Methods and MK supplementary materials, Supplementary Material online). The method fits a demographic model to the SFS of neutral sites and simultaneously estimates the gamma-distributed DFE of new nonneutral mutations and the fraction of adaptive substitutions (α) (supplementary fig. S16, Supplementary Material online). For all lineages, the shape parameter of the gamma distribution is <1, indicative of highly leptokurtic (L-shaped) DFEs and of most nonsynonymous mutations being strongly deleterious. Indeed, in all lineages, the proportion of nonsynonymous mutations with NeS > 10 (S being the mean homozygous effect of a deleterious variant) is >65% (supplementary table S104 and fig. S18, Supplementary Material online), similar to estimates for humans (Eyre-Walker and Keightley 2009) and gorillas (McManus et al. 2015). We observe that the proportion of predicted neutral or nearly neutral mutations correlates negatively with long-term Ne (correlation of -0.64, P value = 0.04 after accounting for phylogenetic nonindependence using the BayesTraitsV2 random walk/maximum likelihood method; Pagel and Meade 2013). This correlation reflects stronger purifying selection in great ape species with a larger long-term Ne.
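To make the link between the gamma shape parameter and the fraction of strongly deleterious mutations concrete, the sketch below computes the probability mass of a gamma-distributed DFE above a given NeS threshold. It is only an illustration: the shape and mean values in the usage note are hypothetical, not estimates from this study, and the gamma distribution function is evaluated with a plain series expansion rather than with DFE-alpha.

```python
import math


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """Gamma distribution function, i.e. the regularized lower incomplete
    gamma function P(shape, x / scale), by series expansion.
    Intended for small to moderate x / scale."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    k = 1
    while term > 1e-15 * total and k < 10_000:
        term *= z / (shape + k)
        total += term
        k += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def fraction_above(shape: float, mean_nes: float,
                   threshold: float = 10.0) -> float:
    """Fraction of new mutations with NeS above `threshold`, for a gamma
    DFE with the given shape and mean (scale = mean / shape)."""
    scale = mean_nes / shape
    return 1.0 - gamma_cdf(threshold, shape, scale)
```

For example, `fraction_above(0.2, 1000.0)` (hypothetical shape 0.2 and mean NeS of 1,000) gives a value between 0.6 and 0.75, showing how a shape parameter below 1 can place most of the distribution above NeS = 10 while still leaving a substantial nearly neutral class.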

Efficient purifying selection also reduces the accumulation of linked genetic variation due to background selection. Within the bins in the middle range of the HKA distribution (see "Methods" section), which are particularly sensitive to purifying selection, lower HKA scores associate with stronger background selection (lower B scores; supplementary fig. S6, Supplementary Material online) and a higher proportion of protein-coding exons (fig. 3). This is expected if background selection reduces diversity around protein-coding and other functional regions. Across lineages, and in agreement with the DFE-alpha results, the effects of purifying selection increase with larger Ne, both when considering the proportion of protein-coding exons and the B scores (supplementary materials HKA 3 and supplementary table S5, Supplementary Material online) (McVicker et al. 2009). Incidentally, the effect is much weaker for nonprotein-coding exons (supplementary table S5 and supplementary fig. 1E, Supplementary Material online). These results are virtually unchanged if we use only lineages with fewer than ten individuals or only lineages with more than five individuals, suggesting that sample size differences between lineages do not affect our observations (supplementary materials subsampling analysis 1.4, supplementary table S106, and supplementary fig. S30, Supplementary Material online).

FIG. 2. Summary of the neutrality tests used. Each box presents the input (the information used), the analysis strategy (how each test was applied on the genome-wide data), the pattern (the signatures of selection explored), the criteria to select selection candidates (the top candidates for each test), and the criteria to select candidates for GO analyses (the candidates used for gene ontology analyses).

The correlations with Ne above are almost always stronger with long-term Ne than with recent Ne (for 21 of the 23 HKA bins; supplementary table S3, Supplementary Material online), indicating that we detect the effects of long-term evolutionary history rather than only differences in power due to overall levels of diversity (although differences in the accuracy of the Ne estimates may affect this comparison). Therefore, despite the recent and ongoing population declines experienced by many of these species, their long-term Ne appears to be a better predictor than recent Ne of the past efficacy of purifying selection.

Ne and Adaptive Evolution

With the MK-based DFE-alpha, it is possible to estimate the proportion (α) of nonsynonymous substitutions driven by positive selection, as well as the ratio of adaptive to neutral divergence (ωa) (supplementary table S103, Supplementary Material online). With the exceptions of Pongo pygmaeus (with poor bootstrap support) and Pan t. schweinfurthii (where two inbred individuals [Prado-Martinez et al. 2013] dramatically increase the estimates) (supplementary table S103, Supplementary Material online), both the estimated proportion (α) and the estimated rate of adaptive evolution are low (0–12% and 0–2%, respectively), in agreement with previous estimates (Eyre-Walker 2006; McManus et al. 2015). We observe that, in nonhuman great apes, both the proportion and the rate of adaptive substitutions are positively correlated with long-term Ne after phylogenetic nonindependence is accounted for using a generalized least squares approach (supplementary fig. S17 and supplementary materials MK test 23, Supplementary Material online). The correlation is high and significant when all nonhuman species except the problematic Pongo pygmaeus and Pan troglodytes schweinfurthii are considered (Pearson's R = 0.9, P value = 0.004) (see fig. 4).
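As a point of reference for what α measures, the sketch below implements the classic MK-based point estimate, α = 1 - (Ds × Pn) / (Dn × Ps). This is a simplified stand-in, not DFE-alpha itself, which additionally models demography and slightly deleterious mutations; the function name `mk_alpha` and the counts in the usage note are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Simple MK-based estimate of the proportion of adaptive
    nonsynonymous substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    Can be negative when slightly deleterious polymorphisms inflate pn.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)
```

With hypothetical counts `mk_alpha(dn=20, ds=40, pn=10, ps=40)`, the estimate is 0.5: half of the nonsynonymous substitutions would be attributed to positive selection. Negative estimates, as seen on the Y-axis of fig. 4, arise when segregating slightly deleterious variants bias the simple ratio.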

The effect of positive selection on linked variation also increases with long-term Ne. In the bottom bins of the HKA empirical distribution, which are enriched in targets of positive selection, the percentage of protein-coding windows correlates positively with Ne (0.05–1% bins, P values = 0.0001–0.02). Only the lowest 0.01% HKA bin is not significant, potentially due to the spatial clustering of windows as a result of selective sweeps (supplementary materials HKA 4 and supplementary table S12, Supplementary Material online).

Thus, our results indicate that the long-term Ne of populations significantly affects the efficacy of both purifying and positive selection. These correlations are remarkable because these species are very closely related and their long-term Ne varies at most 3-fold.

The Candidate Targets of Natural Selection

As most genomic sites evolve neutrally or nearly neutrally (Kimura 1979; Kelley et al. 2006), we expect an enrichment of targets of natural selection in the extreme tails of the genome-wide distributions of neutrality test statistics. Therefore, we can identify candidate targets of natural selection without relying on a simulated neutral expectation, which is vulnerable to parameter misspecification, an important problem given the complex evolutionary history of the Hominidae lineages. Given how little we know about the strongest targets of purifying, positive, and balancing selection in nonhuman apes, this catalog is highly relevant. In addition, these loci allow us to investigate the tempo, conservation, and biological function of natural selection in the great apes. Because the tests differ in nature, their implementation across the genome varies, and so, necessarily, do the criteria used to define candidate targets of selection (see fig. 2).

FIG. 3. Percentage of windows overlapping protein-coding exons, for noncumulative bins of the HKA empirical distribution (X-axis). Each lineage is plotted as a shaded line. The Pearson's correlation (R) between the percentage of windows overlapping protein-coding exons and Ne, within each HKA bin and across all lineages, is shown on the right Y-axis. Pearson's correlation coefficient was computed with estimates of both short- and long-term Ne (from Prado-Martinez et al. 2013). Only R values with significant P values (P < 0.05) are shown.
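The empirical outlier approach described above amounts to ranking all windows by their test statistic and keeping a fixed fraction from one tail. A minimal sketch follows, assuming one score per window; the function name `tail_candidates` and its arguments are hypothetical, and ties are broken arbitrarily by sort order.

```python
from typing import List, Sequence


def tail_candidates(scores: Sequence[float], fraction: float = 0.001,
                    upper: bool = True) -> List[int]:
    """Return the indices of the windows in the extreme tail of the
    empirical distribution of `scores`.

    fraction: tail size (0.001 corresponds to a 0.1% tail).
    upper: take the highest scores if True, the lowest if False.
    """
    k = max(1, round(fraction * len(scores)))  # always keep at least one window
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=upper)
    return sorted(order[:k])
```

For a statistic such as HKA, where the two tails carry different signals, the function would be called twice, once with `upper=True` and once with `upper=False`.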

Sample Size and the Candidate Targets of Natural Selection

Sample size, which varies among lineages, may influence the power to detect signatures of natural selection. We assess how differences in sample size might influence our results with down-sampling analyses. We randomly down-sample, 100 times, four or eight individuals from the two lineages with the largest sample size (Pan paniscus and Gorilla gorilla gorilla) and run all neutrality tests for chromosome 1 (except the ELS test for Pan paniscus, because this test is not appropriate for this lineage). We then measure the overlap between the candidates from the down-samples (0.1% or 1% tail of the empirical distribution) and the equivalent candidates from the original results (supplementary materials subsampling analysis 1, Supplementary Material online).
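The down-sampling procedure can be sketched as a loop that repeatedly draws a subset of individuals, recomputes the candidates, and records the overlap with the original candidate set. In this illustration, `candidate_fn` is a hypothetical callable standing in for a full rerun of a neutrality test, and overlap is defined as the fraction of original candidates recovered, which is an assumption rather than the definition used in the supplementary analysis.

```python
import random
from typing import Callable, Hashable, Sequence, Set


def downsample_overlap(individuals: Sequence[Hashable], k: int,
                       candidate_fn: Callable[[Sequence[Hashable]], Set],
                       original: Set, replicates: int = 100,
                       seed: int = 0) -> float:
    """Mean fraction of `original` candidates recovered when the test is
    rerun on `replicates` random subsets of k individuals."""
    if not original:
        raise ValueError("original candidate set must not be empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    overlaps = []
    for _ in range(replicates):
        subset = rng.sample(list(individuals), k)  # sample without replacement
        candidates = candidate_fn(subset)
        overlaps.append(len(candidates & original) / len(original))
    return sum(overlaps) / len(overlaps)
```

A mean overlap close to 1 indicates that the candidate set is robust to the reduction in sample size.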

The impact of sample size differs between selection tests. HKA appears very robust to sample size variation for signatures of positive selection, showing a mean overlap between the original and down-sampled results of at least 86% (supplementary materials subsampling analysis 1.1 and supplementary figs. S21–24, Supplementary Material online). This is likely to be because the HKA is not strongly affected by the allele frequency of polymorphisms.

In contrast, FWH and ELS appear more sensitive to sample size (supplementary materials subsampling analysis 1.2–1.3 and supplementary figs. S25–28, Supplementary Material online). This may be due to the influence of sample size on the estimates of allele frequency. Therefore, we find that the HKA test is better suited for comparative analyses where sample sizes are low or unequal between populations.

An Available Genome-Wide Map of Natural Selection in Hominidae

The genome-wide map of signatures of natural selection includes information about different types of selection over varying time frames. As such, it provides a broad picture of the influence of natural selection in each genomic region and Hominidae species. All the information is available as an interactive browser at http://tinyurl.com/nf8qmzh, following the criteria and configuration of a recently published human dataset (Pybus et al. 2014, 2015). The UCSC-style format facilitates the integration with the rich UCSC browser tracks, a search mask allows easy access to results for specific genes or genomic regions, and the raw scores (test statistic value and rank score/empirical P value) can be conveniently downloaded using the UCSC Table function. We expect this to be a valuable resource for a wide range of analyses.

The Functional Targets of Natural Selection

The relative contributions of variants in regulatory versus protein-coding regions of the genome to adaptive evolution have been contentious since King and Wilson (1975) and remain a matter of controversy (Halligan et al. 2013). Protein-coding DNA constitutes ~1.5% of the genome, but 10–15% appears to be functionally constrained (Ponting and Hardison 2011). The role of nonprotein-coding genes and other nongenic elements in genome function and evolution remains debated (ENCODE Project Consortium 2011; Doolittle 2013), with several lines of work suggesting that nongenic regions (including some gene deserts) can play an important role in phenotype and adaptation (Bejerano et al. 2006; Libioulle et al. 2007; McPherson et al. 2007; Hubisz and Pollard 2014). Although the stringent filtering of the data and the imperfect annotation of nonprotein-coding functional elements hamper the comparison of protein-coding versus nonprotein-coding regions, we investigated the functional annotations of our candidates.

FIG. 4. Correlation between rate of adaptive substitutions (α) and effective population size (Ne). The X-axis shows the effective population size (Ne, 10,000–25,000). On the Y-axis, the rate of adaptive substitutions is plotted as α (-0.6 to 0.2). Plotted lineages: PanPan, PanTroEl, PanTroVer, Gorilla, PonAb (R = 0.90, P = 0.004). Correlations were calculated while controlling for phylogenetic nonindependence using a generalized least squares approach and a random walk/maximum likelihood method (see Supplementary Materials MK 23, Supplementary Material online).

Except for MK, the neutrality tests we used are agnostic about functional annotation. Still, most of our candidate targets of positive selection contain functional annotations: mean values across species are 72% for HKA, 71% for ELS, and 80% for FWH. This is significantly greater than genome-wide expectations based on random sampling of the callable genome (P values < 0.05 in all lineages except Pan paniscus, P value = 0.2, and Pongo pygmaeus, P value = 0.11) (supplementary materials HKA 7 and supplementary figs. S7–15, Supplementary Material online). Among these annotations, the overlap with protein-coding exons (HKA = 62%, ELS = 59%, FWH = 45%) is significantly enriched in all lineages except Pan paniscus (P value = 0.15) and Pan troglodytes verus (P value = 0.06). In contrast, the mean overlap with exons from nonprotein-coding genes (HKA = 18%, ELS = 2%, FWH = 20%) is not significantly elevated relative to genome-wide levels (supplementary figs. S7–15, Supplementary Material online; lowest P value = 0.16, in Gorilla gorilla gorilla).

Candidate targets of balancing selection are also highly enriched in protein-coding exons (e.g., 64% in the top 0.01% bin), and in the top HKA bins this proportion sharply increases with the HKA score (fig. 3). The increased levels of diversity in these windows cannot be explained by technical artifacts, as these regions are not unusual in terms of coverage or mapping quality (supplementary materials HKA 13–14, Supplementary Material online), or by current models of neutral evolution or purifying selection, and are instead best explained by long-term balancing selection acting on or near these protein-coding exons.

The Biological Pathways Targeted by Natural Selection

According to our results above, protein-coding genes appear to be a key target of natural selection in the Hominidae. We thus investigated the biological functions that these genes are involved in. For each neutrality test and lineage, we identified the genes in candidate regions of positive or balancing selection (see fig. 2 and "Methods" section for details) and performed gene enrichment analyses using WebGestalt (Zhang et al. 2005). Our necessarily strict data filtering may disproportionally affect certain Gene Ontology (GO) categories (e.g., olfactory receptors), but we discuss below the categories that retain the strongest signatures for each type of natural selection.
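Gene set enrichment of this kind is commonly assessed as over-representation with a hypergeometric test. The sketch below computes that upper-tail P value directly; it assumes a hypergeometric model and is not a reproduction of WebGestalt's implementation or of its multiple-testing adjustment. The function name and the numbers in the usage note are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, in_category: int,
                           drawn: int, observed: int) -> float:
    """P(X >= observed) for X ~ Hypergeometric.

    population: number of genes in the background set.
    in_category: background genes annotated to the category.
    drawn: number of candidate genes.
    observed: candidate genes annotated to the category.
    """
    upper = min(in_category, drawn)
    # math.comb returns 0 when k > n, so impossible terms drop out
    favourable = sum(
        comb(in_category, i) * comb(population - in_category, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, drawn)
```

For instance, `hypergeom_enrichment_p(10, 5, 5, 5)` is 1/252, the probability of drawing all five category genes in five draws from a background of ten. In practice the background should be the set of genes that survive data filtering, since filtering can deplete some categories.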

Pathways Targeted by Balancing Selection

The top genes for balancing selection include a number of well-established targets, such as the major histocompatibility complex (MHC) genes (Hughes and Yeager 1998; Hedrick 1999). Indeed, windows containing MHC genes appear among those with the strongest signals of balancing selection in all lineages (supplementary tables S6 and S13, Supplementary Material online). In addition, in all lineages there is a significant enrichment of immunity-related categories, such as the GO "Antigen processing and presentation" category (closely related to the MHC) (supplementary tables S16–40, Supplementary Material online). This provides evidence that balancing selection has a strong influence on immunological pathways in all lineages.

To test whether there were strong signatures of balancing selection beyond the MHC complex, we re-ran the GO enrichment analysis excluding all genes in the MHC region on chromosome 6 (supplementary tables S57–65, Supplementary Material online). Doing so removes the enrichment for the GO category "Antigen processing and presentation" in all lineages. Interestingly, in three of the four Pan troglodytes lineages (excluding Pan troglodytes schweinfurthii), there is significant enrichment for the GO category "Cornified envelope" (supplementary tables S360, S62, and S63, Supplementary Material online), driven by the three genes SCEL, SPRR2B, and SPRR2G. The cornified envelope is the most exterior layer of the skin and consists of dead cells. Related to this, we note that in Pan troglodytes verus the GO category "keratinocyte differentiation", involved in the development of the most common cell type in the epidermis, is also significantly enriched (P value = 0.0026). This is interesting because keratins and proteins similarly involved in epithelial barrier formation have been proposed as targets of balancing selection (see "Discussion" section).

Pathways Targeted by Strong Purifying Selection

Since Hominidae are closely related, we would expect similar regions to be evolving under purifying selection. We therefore tested whether the pathways showing the strongest signatures of constraint are consistent among lineages. From the MK test, 53 of the 152 evaluated pathways showed signatures of strong purifying selection in more than one lineage. In particular, the "Integrin signaling" pathway and the "Wnt signaling" pathway, which regulate basic cellular and developmental processes, and the "Alzheimer's disease-presenilin" pathway are significantly constrained across all lineages (supplementary table S102, Supplementary Material online).

Pathways Targeted by Positive SelectionFor the HKA several lineages show evidence of positive se-lection targets being enriched for GO categories related toimmune function For example the GO category

Cagan et al doi101093molbevmsw215 MBE

3274

ldquoComplement activationrdquo (genes that activate the innate im-mune system) is significantly enriched in Pan paniscus (Pvaluefrac14 0042 all P values adjusted for multiple testing)whereas the related pathway ldquoComplement receptor activityrdquois enriched in Pongo abelii (P valuefrac14 0011) and the GO cat-egory ldquoViral receptor activityrdquo in Gorilla gorilla gorilla (P val-uefrac14 00004) (supplementary tables S41 S46 and S54Supplementary Material online)

We find that the FWH candidate targets of positive selection show enrichment in several GO categories related to brain development and function exclusively within the African Hominidae lineages. This includes, for example, the GO categories "Dendrite" (Homo sapiens, P value = 0.040; Pan troglodytes troglodytes, P value = 0.010; Gorilla gorilla, P value = 0.0024) and "Neuron spine" (Pan troglodytes troglodytes, P value = 0.001; Gorilla gorilla gorilla, P value = 0.006). Several additional neurological categories are enriched in single lineages (supplementary tables S75–86, Supplementary Material online). For example, Homo sapiens is the only lineage with significant enrichment of the GO category "Glutamate receptor activity" (P value = 0.002); glutamate is the main excitatory neurotransmitter in the brain.

For the MK, the set of genes with an excess of divergence is small (supplementary table S96, Supplementary Material online), but we found a significant enrichment in genes involved in, for instance, "Ion channel activity" in Pan paniscus (P value = 0.034) and in "Glycosaminoglycan biosynthesis" in Gorilla gorilla gorilla (P value = 0.020), among other pathways (supplementary tables S98–101, Supplementary Material online).

Overlap between Targets of Positive and Balancing Selection across Lineages

The Hominidae lineages have shared, along their evolutionary history, similar physiologies and environments. As such, they have likely been subject to common selective pressures even after their lineages split. To investigate this possibility, we identified genes that show similar signals of natural selection in multiple lineages. Since we use an empirical approach to identify candidate targets of natural selection (as the demographic models for these species are not well established), we use the same 0.1% cut-off to identify outliers from both tails of the HKA empirical distribution and one tail of the FWH distribution. Therefore, we cannot make general claims about the relative frequency of positive and balancing selection in primate species. We can, though, explore the level of sharing across lineages of these candidate targets. To be conservative, we only consider genes to be shared targets of selection if they appear as candidates in at least three lineages.
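
The "at least three lineages" criterion can be illustrated with a minimal sketch. The function name, lineage labels, and gene names below are hypothetical placeholders and are not taken from the supplementary tables; the sketch only assumes that each lineage contributes one set of candidate genes.

```python
from collections import Counter


def shared_candidates(candidates_by_lineage, min_lineages=3):
    """Return {gene: number of lineages} for genes that are candidates
    in at least `min_lineages` lineages."""
    counts = Counter()
    for genes in candidates_by_lineage.values():
        counts.update(set(genes))  # count each gene once per lineage
    return {gene: n for gene, n in counts.items() if n >= min_lineages}


# Toy candidate sets (hypothetical lineages and genes).
toy = {
    "lineage_A": {"GENE1", "GENE2", "GENE3"},
    "lineage_B": {"GENE1", "GENE3"},
    "lineage_C": {"GENE1", "GENE4"},
    "lineage_D": {"GENE3", "GENE4"},
}
shared = shared_candidates(toy)
```

In this toy input only GENE1 and GENE3 reach the three-lineage threshold; GENE4 is a candidate in two lineages and is therefore not counted as shared.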

Overlap between Selection Targets

We find no signals of positive selection that are shared across all lineages. In fact, there is modest sharing across lineages, a possible indication of the lineage-specific nature of the adaptive process (although we note that we are highly conservative in our selection of candidate genes and the power of FWH is reduced with lower sample sizes) (supplementary table S15, Supplementary Material online). We observe that the HKA candidates show lower sharing across lineages than those from the FWH (supplementary tables S14 and S73, Supplementary Material online). For the HKA, only 27 genes (of the 200 candidates per lineage) are shared in at least three lineages, compared with 67 for the FWH, which detects more recent selective events. We note that shared signals among the Pan troglodytes sub-species may not reflect truly independent signatures of selection, as signals may predate their divergence into separate lineages and there is the possibility of admixture between sub-species. However, of these 67 genes only a minority (8) are shared exclusively among the Pan troglodytes subspecies, potentially reflecting their recently shared ancestry. The majority (59) show evidence of recent positive selection across a range of lineages (supplementary table S73, Supplementary Material online), suggesting putative parallel adaptive events.

Turning to the biological function of these genes, we find limited enrichment among HKA targets (the only significant GO category is "Structural molecule activity", P value = 0.009; supplementary table S3AAB, Supplementary Material online). However, the 67 shared FWH candidate genes are significantly enriched in multiple functional categories (supplementary tables S87–89, Supplementary Material online), including several neuronal pathways, suggesting that these are a common target of recent positive selection across the Hominidae.

Genes targeted by balancing selection show much greater sharing across lineages, with 156 genes showing signatures of balancing selection across at least three lineages (supplementary table S13, Supplementary Material online; fig. 5 for an example across Pan troglodytes lineages). Nine genes, primarily from the MHC region, are shared across all lineages (supplementary table S13, Supplementary Material online). This likely reflects the long-term and persistent nature of balancing selection on immunity-related genes, in particular in the MHC.

Discussion

We present a global investigation of the signatures of purifying, positive, and balancing selection at different time scales and across the Hominidae lineages. We observe strong evidence for the signatures of each of these types of natural selection on patterns of genomic variation. By carefully avoiding technical differences, we can compare, for the first time, the patterns of different types of natural selection across the great ape species. All genomic analyses of signatures of selection are complicated to some extent by demographic processes, which can result in patterns of genomic variation that obscure signatures of selection or produce false positives. We tried to mitigate this by utilizing tests that identified putatively selected regions as outliers based on the entire distribution of patterns of genomic variation, under the assumption that the majority of the genome is evolving neutrally.

We find that even with the relatively similar Ne of the great apes (with a maximum difference of 3-fold), Ne has a significant effect on the efficacy of natural selection. This appears to be true for both purifying and positive selection. The evidence for adaptive evolution is stronger in protein-coding than in nonprotein coding genes, and it is overrepresented not only on loci with relevance to immune function but also on loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae

We estimate that at least 65% of mutations are deleterious (NeS > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8%, compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites, and its effects on linked variation, correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes sub-species seem to have resulted in differences in the efficacy of natural selection. This may be extremely important because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes

Our data indicate that adaptive evolution (positive and balancing selection) often targets protein-coding regions (be that the protein-coding sequence or the surrounding regulatory elements). This suggests that, in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, these most likely reflect shared events. When the signature is shared across distant lineages, this may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example if it affects different neighboring functional elements in each species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

FIG. 5. Venn diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC

Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier

The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process "cornified envelope development" in Pan t. ellioti, Pan troglodytes troglodytes, and Pan troglodytes verus (the genes involved are SCEL, SPRR2B, and SPRR2G) and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E, LCE3D and LCE3E). In LCE3D and LCE3E the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and has previously been shown to have signatures of balancing selection in humans (Andrés et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006). FLG is essential for the formation of the cornified envelope, yet it has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of "natural vaccination" (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes "natural vaccination" may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes

We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA, and FWH candidate targets of positive selection all are significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86, and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find that some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Südhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus; an HKA signal of positive selection in Homo sapiens; and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. This includes AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank) and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1, the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online), has an unknown function but is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Havik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes including size, locomotion, morphology, or diet help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5-fold coverage in all individuals per species. Only the resulting set of sites, which we termed "callable sites", were used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot discard the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.

Tests

Hudson–Kreitman–Aguadé Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window's empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
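
The windowing, filtering, and ranking steps described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: all function names and the toy counts are hypothetical, windows without substitutions are assigned an infinite ratio as an assumption of this sketch, and the empirical P value is taken here as the fraction of windows at least as extreme as the one considered.

```python
import bisect


def sliding_windows(chrom_length, size=30_000, step=15_000):
    """Yield half-open (start, end) windows of `size` bp placed every `step` bp."""
    for start in range(0, chrom_length - size + 1, step):
        yield start, start + size


def passes_filters(callable_sites, n_snvs, n_subs,
                   min_callable=300, min_informative=6):
    """Keep windows with enough callable and informative (SNV + substitution) sites."""
    return callable_sites >= min_callable and (n_snvs + n_subs) >= min_informative


def hka_ratio(n_snvs, n_subs):
    """Polymorphism-to-divergence ratio; infinite if no substitutions (assumption)."""
    return n_snvs / n_subs if n_subs > 0 else float("inf")


def empirical_p_values(scores, tail="high"):
    """Rank-based empirical P value: fraction of windows at least as extreme."""
    ordered = sorted(scores)
    n = len(ordered)
    pvals = []
    for s in scores:
        if tail == "high":   # excess polymorphism, e.g. balancing selection
            n_extreme = n - bisect.bisect_left(ordered, s)
        else:                # excess divergence, e.g. positive selection
            n_extreme = bisect.bisect_right(ordered, s)
        pvals.append(n_extreme / n)
    return pvals


# Toy windows: (callable sites, SNVs, substitutions); values are invented.
toy_windows = [
    (25_000, 40, 10),
    (20_000, 5, 45),
    (200, 30, 30),    # dropped: fewer than 300 callable sites
    (18_000, 3, 2),   # dropped: fewer than 6 informative sites
    (22_000, 20, 20),
    (15_000, 12, 24),
]
kept = [w for w in toy_windows if passes_filters(*w)]
ratios = [hka_ratio(snvs, subs) for _, snvs, subs in kept]
p_balancing = empirical_p_values(ratios, tail="high")
p_positive = empirical_p_values(ratios, tail="low")
```

Using both tails of the same ranked distribution mirrors the text, where high ratios point to candidate targets of balancing selection and low ratios to candidate targets of positive selection.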

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window's empirical P value.
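
For readers unfamiliar with the statistic, a minimal sketch of Fay and Wu's H for a single window is given below, computed as the difference between nucleotide diversity (pi) and theta_H from the derived-allele counts of the segregating sites. The function name and the toy counts are hypothetical, and this is not the implementation validated in the supplementary materials; it only assumes that derived alleles have been polarized with an outgroup and that n is the number of sampled chromosomes (twice the number of diploid individuals).

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H = pi - theta_H for one window.

    derived_counts: derived-allele count i (1 <= i <= n - 1) for each
                    segregating site in the window.
    n:              number of sampled chromosomes.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    if any(i < 1 or i > n - 1 for i in derived_counts):
        raise ValueError("derived counts must lie between 1 and n - 1")
    pairs = n * (n - 1)
    # pi: average number of pairwise differences
    pi = sum(2 * i * (n - i) for i in derived_counts) / pairs
    # theta_H: weights each site by the square of its derived-allele count
    theta_h = sum(2 * i * i for i in derived_counts) / pairs
    return pi - theta_h


# Toy windows with n = 10 chromosomes (invented counts).
h_sweep_like = fay_wu_h([9, 9, 8], n=10)    # mostly high-frequency derived alleles
h_neutral_like = fay_wu_h([1, 1, 2], n=10)  # mostly low-frequency derived alleles
```

Strongly negative values, as in the first toy window, indicate the excess of high-frequency derived alleles that the test is designed to detect.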

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein-coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet these criteria. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
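
The logic of the MK test for a single transcript can be sketched as a 2 × 2 contingency test on counts of nonsynonymous and synonymous polymorphisms (Pn, Ps) and substitutions (Dn, Ds). The sketch below uses a two-sided Fisher's exact test, which is one common choice but is an assumption here, since the text does not state which test statistic was applied; the function names and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def mk_test(dn, ds, pn, ps):
    """Return (P value, direction) for one transcript.

    The direction compares Dn/Ds with Pn/Ps using cross-products, which
    avoids division by zero.
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    if dn * ps > pn * ds:
        direction = "excess nonsynonymous divergence"
    elif dn * ps < pn * ds:
        direction = "excess nonsynonymous polymorphism"
    else:
        direction = "no difference"
    return p_value, direction


# Toy counts for one transcript (invented for illustration).
p_value, direction = mk_test(dn=7, ds=17, pn=2, ps=42)
```

An excess of nonsynonymous divergence is the pattern usually read as positive selection, whereas an excess of nonsynonymous polymorphism is usually read as purifying selection acting on segregating variants.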

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prüfer et al. 2012; see Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes and Pan paniscus, Gorilla g. gorilla and Gorilla b. graueri, and Pongo abelii and Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of > 0.8 for being external that is not interrupted by SNPs with a probability of > 0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
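
The definition of an external region given above can be sketched as a single pass over the SNPs of one HMM run. This is a minimal illustration under stated assumptions: the function name is hypothetical, SNP positions are assumed to be already converted to genetic map units (cM), and with only two hidden states the internal posterior is taken as one minus the external posterior. SNPs where neither posterior exceeds the threshold do not interrupt a run in this sketch.

```python
def external_regions(snps, threshold=0.8):
    """Return external regions as (start_cM, end_cM, genetic_length_cM).

    snps: list of (genetic_position_cM, p_external), sorted by position.
    A region starts at a SNP with p_external > threshold and extends to the
    last such SNP before a SNP with p_internal > threshold is reached.
    """
    regions = []
    start = end = None
    for pos, p_external in snps:
        p_internal = 1.0 - p_external  # two-state HMM (assumption)
        if p_external > threshold:
            if start is None:
                start = pos
            end = pos
        elif p_internal > threshold and start is not None:
            regions.append((start, end, end - start))
            start = end = None
        # otherwise the SNP is ambiguous and does not interrupt the run
    if start is not None:
        regions.append((start, end, end - start))
    # Longest regions first, since regions are scored by genetic length
    return sorted(regions, key=lambda r: r[2], reverse=True)


# Toy posteriors along a chromosome (invented values).
toy_snps = [
    (0.00, 0.10), (0.10, 0.90), (0.15, 0.50), (0.30, 0.95),
    (0.40, 0.05), (0.50, 0.85), (0.55, 0.90), (0.70, 0.30),
]
toy_regions = external_regions(toy_snps)
```

Scoring by genetic rather than physical length reflects the expectation that, under neutrality, recombination breaks up long external regions, so unusually long regions in map units are the ones of interest.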

For each population, the HMM was run repeatedly with each "outgroup" individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).
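As a rough illustration of this combination step, the sketch below keeps only regions that fall in the top fraction of every run and orders them by summed rank. It assumes, for simplicity, that the same region carries the same identifier in every run; in practice regions would have to be matched by genomic coordinates. The function name and data layout are hypothetical.

```python
def combine_runs(runs, top_fraction=0.05):
    """Rank regions shared by the top fraction of every run.

    runs: list of dicts, one per outgroup individual, mapping a region
    identifier to its genetic length (assumed layout).
    Returns region identifiers ordered by cumulative rank, best first.
    """
    rank_sum = {}
    shared = None
    for run in runs:
        ordered = sorted(run, key=run.get, reverse=True)   # longest first
        n_top = max(1, int(len(ordered) * top_fraction))
        top = ordered[:n_top]
        # a region must be in the top fraction of every run to be kept
        shared = set(top) if shared is None else shared & set(top)
        for rank, region in enumerate(top, start=1):
            rank_sum[region] = rank_sum.get(region, 0) + rank
    shared = shared or set()
    # lower cumulative rank is better; ties are broken by identifier
    return sorted(shared, key=lambda region: (rank_sum[region], region))
```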

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein-coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).
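The 1-bp overlap rule can be sketched as follows. This is a sketch only: the tuple layout `(chrom, start, end)` with half-open coordinates and the function names are assumptions, and a real analysis would read the GENCODE coordinates from an annotation file and use an interval index rather than a nested loop.

```python
def overlaps(region, gene):
    """True if two half-open intervals (chrom, start, end) share >= 1 bp."""
    return (region[0] == gene[0]
            and region[1] < gene[2]
            and gene[1] < region[2])


def annotate_genic(regions, genes):
    """Map each region to the names of the genes it overlaps.

    regions: iterable of (chrom, start, end) tuples.
    genes: dict mapping a gene name to its (chrom, start, end) tuple.
    An empty list means the region is nongenic.
    """
    return {r: [name for name, g in genes.items() if overlaps(r, g)]
            for r in regions}
```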

To test for functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript passing a nominal P value threshold of 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: the GO categories (Harris et al. 2004) "biological processes," "molecular functions," and "cellular components"; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004); phenotype annotations based on the mammalian and human phenotype ontologies; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded because of the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
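The filtering logic described here (a Bonferroni-corrected threshold of 0.05 and removal of categories driven by a single gene) can be sketched in a few lines of Python. The sketch assumes a one-sided hypergeometric over-representation test, a common choice for this kind of analysis; it does not reproduce WebGestalt itself, and all names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / comb(N, n)


def significant_categories(categories, candidates, background_size, alpha=0.05):
    """Return category names that pass Bonferroni correction.

    categories: dict mapping a category name to its set of genes.
    candidates: set of candidate genes; background_size: genes tested overall.
    """
    m = len(categories)                 # number of hypotheses tested
    hits = []
    for name, genes in categories.items():
        overlap = genes & candidates
        if len(overlap) < 2:            # discard categories driven by one gene
            continue
        p = hypergeom_sf(len(overlap), background_size,
                         len(genes), len(candidates))
        if p * m <= alpha:              # Bonferroni-adjusted comparison
            hits.append(name)
    return hits
```

Multiplying each P value by the number of categories tested is equivalent to dividing the threshold by that number; either form implements the same correction.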

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor, because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have been only 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).

Cagan et al. doi:10.1093/molbev/msw215 MBE


Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation; Michael Lachmann and Mark Stoneking for discussions and valuable comments; and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.

Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.

Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol. 26(12):2755–2764.

Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glémin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol. 7(4):1122–1132.

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.

Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol. 126:235–242.

Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol. 182:4313–4320.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.

Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 19:4705–4714.

Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol. 11(1):171.

Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol. 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.

Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet. 84:21–34.

Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.

Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet. 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.

Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci. 110(14):5294–5300.

Ellegren H, Galtier N. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.

ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stütz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22:1760–1774.

Håvik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.

Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.

Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol. 22:194–200.

Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev. 29:15–21.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Hudson RR, Kreitman M, Aguadé M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32:415–435.

Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol. 126:1200–1202.

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol. 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.

Key FM, Fu Q, Romagné F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun. 187.

Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Ségurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Great Ape Genome Project, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nürnberg G, Fölster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.

Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.

Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2(40):D130–D135.

Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.

Südhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2,504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 1(33):W741–W748.



data across species (Ellegren and Galtier 2016). Our dataset therefore provides an ideal opportunity to investigate the relationship between Ne and selection in closely related species.

We find that the ratio of nonsynonymous to synonymous substitutions negatively correlates with long-term Ne in this dataset (Prado-Martinez et al. 2013), as expected with more efficient purifying selection in populations with higher Ne. Here, we aim to (1) infer the distribution of fitness effects (DFE) in each species; (2) quantify the magnitude of the influence of Ne on the DFE; (3) compare the influence of long-term versus short-term Ne; and (4) investigate its influence not only on purifying but also on positive selection.

Ne and the Strength of Purifying Selection

We first inferred the DFE of deleterious mutations for 3859 one-to-one orthologous protein-coding genes with DFE-alpha (Eyre-Walker and Keightley 2009), which is based on the MK test (see Methods and MK supplementary materials, Supplementary Material online). The method fits a demographic model to the SFS of neutral sites and simultaneously estimates the gamma-distributed DFE of new nonneutral mutations and the fraction of adaptive substitutions (α) (supplementary fig. S16, Supplementary Material online). For all lineages, the shape parameter of the gamma distribution is <1, indicative of highly leptokurtic (L-shaped) DFEs, with most nonsynonymous mutations being strongly deleterious. Indeed, in all lineages, the proportion of nonsynonymous mutations with NeS > 10 (S being the mean homozygous effect of a deleterious variant) is >65% (supplementary table S104 and fig. S18, Supplementary Material online), similar to estimates for humans (Eyre-Walker and Keightley 2009) and gorillas (McManus et al. 2015). We observe that the proportion of predicted neutral or nearly neutral mutations correlates negatively with long-term Ne (correlation of −0.64, P value = 0.04, after accounting for phylogenetic nonindependence using the BayesTraitsV2 random walk/maximum likelihood method; Pagel and Meade 2013). This correlation reflects stronger purifying selection in great ape species with a larger long-term Ne.
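To make the link between the gamma shape parameter and the NeS classes concrete, the following Python sketch computes the fraction of new mutations in three NeS classes for a gamma-distributed DFE. The shape (0.2) and mean NeS (1000) used in the example are hypothetical values chosen for illustration, not estimates from this study, and the series-based CDF is a simple stand-in for a statistics library.

```python
from math import exp, gamma


def gamma_cdf(x, shape, scale, n_terms=200):
    """Gamma CDF via the power series of the regularized incomplete gamma.

    Adequate for the small-to-moderate x/scale values used here; for very
    large arguments a library routine would be preferable.
    """
    z = x / scale
    term = z ** shape * exp(-z) / gamma(shape + 1)   # n = 0 term
    total = 0.0
    for n in range(1, n_terms + 1):
        total += term
        term *= z / (shape + n)                      # next series term
    return total


def dfe_classes(shape, mean_nes):
    """Fractions of mutations that are nearly neutral, intermediate, strong."""
    scale = mean_nes / shape        # mean of a gamma = shape * scale
    p1 = gamma_cdf(1.0, shape, scale)
    p10 = gamma_cdf(10.0, shape, scale)
    return {"NeS<1": p1, "1<=NeS<=10": p10 - p1, "NeS>10": 1.0 - p10}


# Hypothetical parameters: an L-shaped DFE (shape < 1) with a large mean.
classes = dfe_classes(shape=0.2, mean_nes=1000.0)
```

With a shape below 1, the density piles up near zero yet has a long tail, so a sizeable nearly neutral class coexists with a majority of strongly deleterious mutations.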

Efficient purifying selection also reduces the accumulation of linked genetic variation through background selection. Within the bins in the middle range of the HKA distribution (see "Methods" section), which are particularly sensitive to purifying selection, lower HKA scores associate with stronger background selection (lower B scores; supplementary fig. S6, Supplementary Material online) and a higher proportion of protein-coding exons (fig. 3). This is expected if background selection reduces diversity around protein-coding and other functional regions. Across lineages, and in agreement with the DFE-alpha results, the effects of purifying selection increase with larger Ne, both when considering the proportion of protein-coding exons and when considering the B scores (supplementary materials HKA 3 and supplementary table S5, Supplementary Material online) (McVicker et al. 2009). Incidentally, the effect is much weaker for nonprotein-coding exons (supplementary table S5 and supplementary fig. 1E, Supplementary Material online). These results are virtually unchanged if we use only lineages with fewer than ten individuals or only lineages with more than five individuals, suggesting that sample size differences between lineages do not affect our observations (supplementary materials subsampling analysis 14, supplementary table S106, and supplementary fig. S30, Supplementary Material online).

FIG. 2. Summary of the neutrality tests used. Each box presents the input (the information used), the analysis strategy (how each test was applied on the genome-wide data), the pattern (the signatures of selection explored), the criteria to select selection candidates (the top candidates for each test), and the criteria to select candidates for GO analyses (the candidates used for gene ontology analyses).

The correlations with Ne above are almost always stronger with long-term Ne than with recent Ne (for 21 of the 23 HKA bins; supplementary table S3, Supplementary Material online), indicating that we detect the effects of long-term evolutionary history rather than only differences in power due to overall levels of diversity (although differences in the accuracy of the Ne estimates may affect this comparison). Therefore, despite the recent and ongoing population declines experienced by many of these species, their long-term Ne appears to be a better predictor than recent Ne of the past efficacy of purifying selection.

Ne and Adaptive Evolution

With the MK-based DFE-alpha, it is possible to estimate the proportion (α) of nonsynonymous substitutions driven by positive selection, as well as the ratio of adaptive to neutral divergence (ωa) (supplementary table S103, Supplementary Material online). With the exceptions of Pongo pygmaeus (with poor bootstrap support) and Pan t. schweinfurthii (where two inbred individuals [Prado-Martinez et al. 2013] dramatically increase the estimates) (supplementary table S103, Supplementary Material online), both the estimated proportion (α) and the estimated rate of adaptive evolution are low (0–12 and 0–2, respectively), in agreement with previous estimates (Eyre-Walker 2006; McManus et al. 2015). We observe that, in nonhuman great apes, both the proportion and the rate of adaptive substitutions are positively correlated with long-term Ne after phylogenetic nonindependence is accounted for using a generalized least squares approach (supplementary fig. S17 and supplementary materials MK test 23, Supplementary Material online). The correlation is high and significant when all nonhuman species except the problematic Pongo pygmaeus and Pan troglodytes schweinfurthii are considered (Pearson's R = 0.9, P value = 0.004) (see fig. 4).
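For orientation, the quantity α can be illustrated with the classic MK-table estimator, α = 1 − (Ds·Pn)/(Dn·Ps). This is a simpler relative of DFE-alpha, which additionally models demography and the DFE, and is shown here only as a sketch; the function name and the counts in the example are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic MK-based estimate of the proportion of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    This simple estimator is not DFE-alpha: it does not correct for slightly
    deleterious polymorphisms, which bias the estimate downward and can make
    it negative.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be nonzero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for illustration only.
alpha = mk_alpha(dn=40, ds=100, pn=30, ps=100)   # 1 - 3000/4000
```

Negative values of the simple estimator are one reason methods such as DFE-alpha fit the DFE explicitly before estimating α.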

The effect of positive selection on linked variation also increases with long-term Ne. In the bottom bins of the HKA empirical distribution, which are enriched in targets of positive selection, the percentage of protein-coding windows correlates positively with Ne (0.05–1% bins, P values = 0.0001–0.02). Only the lowest 0.01% HKA bin is not significant, potentially because of the spatial clustering of windows that results from selective sweeps (supplementary materials HKA 4 and supplementary table S12, Supplementary Material online).
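The per-bin correlations reported here are Pearson correlations across lineages. A minimal sketch of that computation is given below; the Ne values and percentages in the example are invented placeholders rather than data from the study, and the sketch does not include any correction for phylogenetic nonindependence.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Placeholder inputs: one long-term Ne and one percentage per lineage,
# for a single HKA bin. These numbers are hypothetical.
ne = [10000, 15000, 20000, 25000]
pct_protein_coding = [30.0, 34.0, 33.0, 41.0]
r = pearson_r(ne, pct_protein_coding)
```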

Thus, our results indicate that the long-term Ne of populations significantly affects the efficacy of both purifying and positive selection. These correlations are remarkable because these species are very closely related and their long-term Ne varies by at most 3-fold.

FIG. 3. Percentage of windows overlapping protein-coding exons for noncumulative bins of the HKA empirical distribution (X-axis). Each lineage is plotted as a shaded line. The Pearson's correlation (R) between the percentage of windows overlapping protein-coding exons and Ne, within each HKA bin and across all lineages, is shown on the right Y-axis. Pearson's correlation coefficient was computed with estimates of both short- and long-term Ne (from Prado-Martinez et al. 2013). Only R values with significant P values (P < 0.05) are shown.

The Candidate Targets of Natural Selection

As most genomic sites evolve neutrally or nearly neutrally (Kimura 1979; Kelley et al. 2006), we expect an enrichment of targets of natural selection in the extreme tails of the genome-wide distributions of neutrality test statistics. Therefore, we can identify candidate targets of natural selection without relying on a simulated neutral expectation, which is vulnerable to parameter misspecification, an important problem given the complex evolutionary history of the Hominidae lineages. Given how little we know about the strongest targets of purifying, positive, and balancing selection in nonhuman apes, this catalog is highly relevant. In addition, these loci allow us to investigate the tempo, conservation, and biological function of natural selection in the great apes. The nature of each of the tests considered means that their implementation in the genome varies, and the criteria to define candidate targets of selection necessarily vary too (see fig. 2).
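The empirical outlier approach described here can be sketched as a ranking of windows by their test statistic, followed by selection of the extreme tail. The function name, the dictionary layout, and the default tail fraction are assumptions made for illustration; each test in the study uses its own statistic and candidate criteria.

```python
def empirical_tail(scores, fraction=0.01, lower=True):
    """Return {window: empirical P value} for windows in the chosen tail.

    scores: dict mapping a window identifier to its test statistic.
    lower=True selects the smallest statistics, lower=False the largest.
    The empirical P value is the window's rank divided by the number of
    windows, so no simulated neutral distribution is needed.
    """
    ordered = sorted(scores, key=scores.get, reverse=not lower)
    n = len(ordered)
    p_values = {w: (i + 1) / n for i, w in enumerate(ordered)}
    return {w: p for w, p in p_values.items() if p <= fraction}
```

Because the P values come from ranks, they identify the most extreme windows in the genome but do not by themselves establish that a window departs from neutrality.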

Sample Size and the Candidate Targets of Natural Selection

Sample size, which varies among lineages, may influence the power to detect signatures of natural selection. We assess how differences in sample size might influence our results with down-sampling analyses. We randomly down-sample, 100 times, four or eight individuals from the two lineages with the largest sample size (Pan paniscus and Gorilla gorilla gorilla) and run all neutrality tests for chromosome 1 (except the ELS test for Pan paniscus, because this test is not appropriate for this lineage). We then measure the overlap between the candidates from the down-samples (0.1% or 1% tail of the empirical distribution) and the equivalent candidates from the original results (supplementary materials subsampling analysis 1, Supplementary Material online).
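The down-sampling procedure can be outlined in Python as below. This is a generic sketch: `score_fn` and `top_fn` are hypothetical callables standing in for a neutrality test and for the tail-selection rule, and overlap is measured here as the fraction of the original candidates recovered in each down-sample.

```python
import random


def downsample_overlap(individuals, k, score_fn, top_fn, n_rep=100, seed=0):
    """Mean fraction of full-sample candidates recovered after down-sampling.

    individuals: list of per-individual data (any representation).
    k: number of individuals drawn without replacement in each replicate.
    score_fn: callable(sample) -> window scores (hypothetical test statistic).
    top_fn: callable(scores) -> non-empty set of candidate windows.
    """
    rng = random.Random(seed)            # fixed seed for reproducibility
    original = top_fn(score_fn(individuals))
    overlaps = []
    for _ in range(n_rep):
        sample = rng.sample(individuals, k)
        candidates = top_fn(score_fn(sample))
        overlaps.append(len(candidates & original) / len(original))
    return sum(overlaps) / n_rep
```

A value close to 1 indicates that the candidate list is robust to the reduction in sample size, whereas lower values indicate sensitivity to it.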

The impact of sample size differs between selection tests. HKA appears very robust to sample size variation for signatures of positive selection, showing a mean overlap between the original and down-sampled results of at least 86% (supplementary materials subsampling analysis 11 and supplementary figs. S21–24, Supplementary Material online). This is likely because the HKA test is not strongly affected by the allele frequency of polymorphisms.

In contrast, FWH and ELS appear more sensitive to sample size (supplementary materials subsampling analysis 12–13 and supplementary figs. S25–28, Supplementary Material online). This may be due to the influence of sample size on the estimates of allele frequency. Therefore, we find that the HKA test is better suited for comparative analyses where sample sizes are low or unequal between populations.

An Available Genome-Wide Map of Natural Selection in Hominidae

The genome-wide map of signatures of natural selection includes information about different types of selection over varying time frames. As such, it provides a broad picture of the influence of natural selection in each genomic region and Hominidae species. All the information is available as an interactive browser at http://tinyurl.com/nf8qmzh, following the criteria and configuration of a recently published human dataset (Pybus et al. 2014, 2015). The UCSC-style format facilitates integration with the rich UCSC browser tracks, a search mask allows easy access to results for specific genes or genomic regions, and the raw scores (test statistic value and rank score/empirical P value) can be conveniently downloaded using the UCSC Table function. We expect this to be a valuable resource for a wide range of analyses.

The Functional Targets of Natural SelectionThe relative contributions of variants in regulatory versusprotein coding regions of the genome to adaptive evolution

-06

-04

-02

00

02

Alp

ha

10000 15000 20000 25000Ne

PanPanPanTroElPanTroVerGorillaPonAb

R=090P=0004

FIG 4 Correlation between rate of adaptive substitutions (a) and effective population size (Ne) The X-axis shows the effective population size Onthe Y-axis the rate of adaptive substitutions is plotted as a Correlations were calculated while controlling for the phylogenetic nonindependenceusing a generalized least square approach and a random walkmaximum likelihood method (see Supplementary Materials MK 23 SupplementaryMaterial online)

Natural Selection in the Great Apes . doi:10.1093/molbev/msw215 MBE


remains a matter of controversy (Halligan et al. 2013). Since King and Wilson (1975), the relative importance of coding and regulatory variation to adaptive evolution has been contentious. Protein-coding DNA constitutes ~1.5% of the genome, but 10–15% appears to be functionally constrained (Ponting and Hardison 2011). The role of nonprotein-coding genes and other nongenic elements in genome function and evolution remains debated (ENCODE Project Consortium 2011; Doolittle 2013), with several lines of work suggesting that nongenic regions (including some gene deserts) can play an important role in phenotype and adaptation (Bejerano et al. 2006; Libioulle et al. 2007; McPherson et al. 2007; Hubisz and Pollard 2014). Although the stringent filtering of the data and the imperfect annotation of nonprotein-coding functional elements hamper the comparison of protein-coding versus nonprotein-coding regions, we investigated the functional annotations of our candidates.

Except for MK, the neutrality tests we used are agnostic about functional annotation. Still, most of our candidate targets of positive selection contain functional annotations: mean values across species are 72% for HKA, 71% for ELS, and 80% for FWH. This is significantly greater than genome-wide expectations based on random sampling of the callable genome (P values < 0.05 in all lineages except Pan paniscus, P value = 0.2, and Pongo pygmaeus, P value = 0.11) (supplementary materials HKA 7 and supplementary figs. S7–15, Supplementary Material online). Among these annotations, the overlap with protein-coding exons (HKA = 62%, ELS = 59%, FWH = 45%) is significantly enriched in all lineages except Pan paniscus (P value = 0.15) and Pan troglodytes verus (P value = 0.06). In contrast, the mean overlap with exons from nonprotein-coding genes (HKA = 18, ELS = 2, FWH = 20) is not significantly elevated relative to genome-wide levels (supplementary figs. S7–15, Supplementary Material online; lowest P value = 0.16 in Gorilla gorilla gorilla).
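The comparison against "random sampling of the callable genome" can be illustrated with a resampling sketch. This is not the authors' pipeline: the function name, the window identifiers, the number of resamples, and the add-one correction are all assumptions made for illustration. It asks how often a random set of callable windows, of the same size as the candidate set, overlaps annotations at least as often as the candidates do.

```python
import random

def resampling_enrichment_p(candidate_ids, annotated_ids, all_ids,
                            n_resamples=1000, seed=0):
    """One-sided empirical P value for enrichment of annotated windows.

    candidate_ids: windows called as candidates (hypothetical identifiers).
    annotated_ids: windows overlapping a functional annotation.
    all_ids: every callable window (the sampling pool).
    """
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    annotated = set(annotated_ids)
    pool = list(all_ids)
    k = len(candidate_ids)
    observed = sum(1 for w in candidate_ids if w in annotated)
    at_least = 0
    for _ in range(n_resamples):
        sample = rng.sample(pool, k)   # random windows, same count as candidates
        if sum(1 for w in sample if w in annotated) >= observed:
            at_least += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return (at_least + 1) / (n_resamples + 1)
```

A small P value here means the candidate set overlaps annotations more often than random callable windows do; drawing only from callable windows mirrors the paper's point that filtering should affect candidates and background alike.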

Candidate targets of balancing selection are also highly enriched in protein-coding exons (e.g., 64% in the top 0.01% bin), and in the top HKA bins this proportion sharply increases with the HKA score (fig. 3). The increased levels of diversity in these windows cannot be explained by technical artifacts, as these regions are not unusual in terms of coverage or mapping quality (supplementary materials HKA 13–14, Supplementary Material online), or by current models of neutral evolution or purifying selection, and are instead best explained by long-term balancing selection acting on or near these protein-coding exons.

The Biological Pathways Targeted by Natural Selection

According to our results above, protein-coding genes appear to be a key target of natural selection in the Hominidae. We thus investigated the biological functions that these genes are involved in. For each neutrality test and lineage, we identified the genes in candidate regions of positive or balancing selection (see fig. 2 and "Methods" section for details) and performed gene enrichment analyses using WebGestalt (Zhang et al. 2005). Our necessarily strict data filtering may disproportionately affect certain Gene Ontology (GO) categories (e.g., olfactory receptors), but we discuss below the categories that retain the strongest signatures for each type of natural selection.
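As an illustration of what a gene enrichment analysis computes, the sketch below evaluates an over-representation P value with a hypergeometric tail. It assumes a simple over-representation setting and is not necessarily the exact procedure WebGestalt applied here; the function name and arguments are hypothetical, and no multiple-testing adjustment is included.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_category, n_candidates, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_background: genes that survive filtering (the reference set).
    n_in_category: reference genes annotated to the GO category.
    n_candidates: candidate genes drawn from the reference set.
    n_overlap: candidate genes annotated to the category.
    """
    upper = min(n_in_category, n_candidates)
    total = comb(n_background, n_candidates)
    tail = sum(
        comb(n_in_category, k)
        * comb(n_background - n_in_category, n_candidates - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Using the filtered gene set as the reference, rather than all annotated genes, is one way to limit the bias that strict filtering introduces for categories such as olfactory receptors.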

Pathways Targeted by Balancing Selection

The top genes for balancing selection include a number of well-established targets such as the major histocompatibility complex (MHC) genes (Hughes and Yeager 1998; Hedrick 1999). Indeed, windows containing MHC genes appear among those with the strongest signals of balancing selection in all lineages (supplementary tables S6 and S13, Supplementary Material online). In addition, in all lineages there is a significant enrichment of immunity-related categories such as the GO "Antigen processing and presentation" category (closely related to the MHC) (supplementary tables S16–40, Supplementary Material online). This provides evidence that balancing selection has a strong influence on immunological pathways in all lineages.

To test whether there were strong signatures of balancing selection beyond the MHC complex, we re-ran the GO enrichment analysis excluding all genes in the MHC region on chromosome 6 (supplementary tables S57–65, Supplementary Material online). Doing so removes the enrichment for the GO category "Antigen processing and presentation" in all lineages. Interestingly, in three of the four Pan troglodytes lineages (excluding Pan troglodytes schweinfurthii) there is significant enrichment for the GO category "Cornified envelope" (supplementary tables S360, S62, and S63, Supplementary Material online), driven by the three genes SCEL, SPRR2B, and SPRR2G. The cornified envelope is the most exterior layer of the skin and consists of dead cells. Related to this, we note that in Pan troglodytes verus the GO category "keratinocyte differentiation", involved in the development of the most common cell type in the epidermis, is also significantly enriched (P value = 0.0026). This is interesting because keratins and proteins similarly involved in epithelial barrier formation have been proposed as targets of balancing selection (see "Discussion" section).

Pathways Targeted by Strong Purifying Selection

Since Hominidae are closely related, we would expect that similar regions are evolving under purifying selection. We therefore tested whether the pathways showing the strongest signatures of constraint are consistent among lineages. From the MK test, 53 of the 152 evaluated pathways showed signatures of strong purifying selection in more than one lineage. In particular, the "Integrin signaling" pathway and "Wnt signaling" pathway, which regulate basic cellular and developmental processes, and the "Alzheimer's disease-presenilin" pathway are significantly constrained across all lineages (supplementary table S102, Supplementary Material online).

Pathways Targeted by Positive Selection

For the HKA, several lineages show evidence of positive selection targets being enriched for GO categories related to immune function. For example, the GO category "Complement activation" (genes that activate the innate immune system) is significantly enriched in Pan paniscus (P value = 0.042; all P values adjusted for multiple testing), whereas the related pathway "Complement receptor activity" is enriched in Pongo abelii (P value = 0.011) and the GO category "Viral receptor activity" in Gorilla gorilla gorilla (P value = 0.0004) (supplementary tables S41, S46, and S54, Supplementary Material online).

We find that the FWH candidate targets of positive selection show enrichment in several GO categories related to brain development and function, exclusively within the African Hominidae lineages. This includes, for example, the GO categories "Dendrite" (Homo sapiens, P value = 0.040; Pan troglodytes troglodytes, P value = 0.010; Gorilla gorilla, P value = 0.0024) and "Neuron spine" (Pan troglodytes troglodytes, P value = 0.001; Gorilla gorilla gorilla, P value = 0.006). Several additional neurological categories are enriched in single lineages (supplementary tables S75–86, Supplementary Material online). For example, Homo sapiens is the only lineage with significant enrichment of the GO category "Glutamate receptor activity" (P value = 0.002); glutamate is the main excitatory neurotransmitter in the brain.

For the MK, the set of genes with an excess of divergence is small (supplementary table S96, Supplementary Material online), but we found a significant enrichment in genes involved in, for instance, "Ion channel activity" in Pan paniscus (P value = 0.034) and in "Glycosaminoglycan biosynthesis" in Gorilla gorilla gorilla (P value = 0.020), among other pathways (supplementary tables S98–101, Supplementary Material online).

Overlap between Targets of Positive and Balancing Selection across Lineages

The Hominidae lineages have shared, along their evolutionary history, similar physiologies and environments. As such, they have likely been subject to common selective pressures, even after their lineages split. To investigate this possibility, we identified genes that show similar signals of natural selection in multiple lineages. Since we use an empirical approach to identify candidate targets of natural selection (as the demographic models for these species are not well established), we use the same 0.1% cut-off to identify outliers from both tails of the HKA empirical distribution and one tail of the FWH distribution. Therefore, we cannot make general claims about the relative frequency of positive and balancing selection in primate species. We can, though, explore the level of sharing across lineages of these candidate targets. To be conservative, we only consider genes to be shared targets of selection if they appear as candidates in at least three lineages.

Overlap between Selection Targets

We find no signals of positive selection that are shared across all lineages. In fact, there is modest sharing across lineages, a possible indication of the lineage-specific nature of the adaptive process (although we note that we are highly conservative in our selection of candidate genes, and the power of FWH is reduced with lower sample sizes) (supplementary table S15, Supplementary Material online). We observe that the HKA candidates show lower sharing across lineages than those from the FWH (supplementary tables S14 and S73, Supplementary Material online). For the HKA, only 27 genes (of the 200 candidates per lineage) are shared in at least three lineages, compared with 67 for the FWH, which detects more recent selective events. We note that shared signals among the Pan troglodytes sub-species may not reflect truly independent signatures of selection, as signals may predate their divergence into separate lineages and admixture between sub-species is possible. However, of these 67 genes, only a minority (8) are shared exclusively among the Pan troglodytes subspecies, potentially reflecting their recently shared ancestry. The majority (59) show evidence of recent positive selection across a range of lineages (supplementary table S73, Supplementary Material online), suggesting putative parallel adaptive events.
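The "at least three lineages" criterion amounts to counting, for each gene, the number of lineages in which it is a candidate. A minimal sketch follows; the function name and the input layout (a mapping from lineage name to candidate gene list) are assumptions for illustration, not the authors' code.

```python
from collections import Counter

def shared_candidates(candidates_by_lineage, min_lineages=3):
    """Genes that are candidates in at least `min_lineages` lineages.

    candidates_by_lineage: dict mapping a lineage name to an iterable of
    candidate gene names (hypothetical input format).
    """
    counts = Counter()
    for genes in candidates_by_lineage.values():
        counts.update(set(genes))   # count each gene once per lineage
    return {gene for gene, n in counts.items() if n >= min_lineages}
```

Because closely related subspecies can share a signal that predates their split, a count of three lineages is a conservative filter rather than proof of three independent events, as the text above cautions.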

Turning to the biological function of these genes, we find limited enrichment among HKA targets (the only significant GO category is "Structural molecule activity", P value = 0.009; supplementary table S3AAB, Supplementary Material online). However, the 67 shared FWH candidate genes are significantly enriched in multiple functional categories (supplementary tables S87–89, Supplementary Material online), including several neuronal pathways, suggesting that these are a common target of recent positive selection across the Hominidae.

Genes targeted by balancing selection show much greater sharing across lineages, with 156 genes showing signatures of balancing selection across at least three lineages (supplementary table S13, Supplementary Material online; fig. 5 for an example across Pan troglodytes lineages). Nine genes, primarily from the MHC region, are shared across all lineages (supplementary table S13, Supplementary Material online). This likely reflects the long-term and persistent nature of balancing selection on immunity-related genes, in particular in the MHC.

Discussion

We present a global investigation of the signatures of purifying, positive, and balancing selection at different time scales and across the Hominidae lineages. We observe strong evidence for the signatures of each of these types of natural selection on patterns of genomic variation. By carefully avoiding technical differences, we can compare, for the first time, the patterns of different types of natural selection across the great ape species. All genomic analyses of signatures of selection are complicated to some extent by demographic processes, which can result in patterns of genomic variation that obscure signatures of selection or produce false positives. We tried to mitigate this by utilizing tests that identified putatively selected regions as outliers based on the entire distribution of patterns of genomic variation, under the assumption that the majority of the genome is evolving neutrally.

We find that even with the relatively similar Ne of the great apes (with a maximum difference of 3-fold), Ne has a significant effect on the efficacy of natural selection. This appears to be true for both purifying and positive selection. The evidence for adaptive evolution is stronger in protein-coding than in nonprotein-coding genes, and it is overrepresented not only at loci with relevance to immune function but also at loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae

We estimate that at least 65% of mutations are deleterious (NeS > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8%, compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites, and its effects on linked variation, correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes sub-species seem to have resulted in differences in the efficacy of natural selection. This may be extremely important because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes

Our data indicate that adaptive evolution (positive and balancing selection) often targets protein-coding regions (be that the protein-coding sequence or the surrounding regulatory elements). This suggests that, in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein-coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, these most likely reflect shared events. When the signature is shared across distant lineages, this may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example if it affects different neighboring functional elements in each species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

FIG. 5. Venn diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here, we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials, section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC

Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier

The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process "cornified envelope development" in Pan troglodytes ellioti, Pan troglodytes troglodytes, and Pan troglodytes verus (the genes involved are SCEL, SPRR2B, and SPRR2G) and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E, LCE3D and LCE3E). In LCE3D and LCE3E, the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and which has previously been shown to have signatures of balancing selection in humans (Andrés et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006), which is essential for the formation of the cornified envelope yet has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of "natural vaccination" (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting that variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes "natural vaccination" may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes

We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA, and FWH candidate targets of positive selection are all significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86, and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find that some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Sudhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus, an HKA signal of positive selection in Homo sapiens, and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. This includes AUTS2, identified by ELS in Gorilla gorilla gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank), and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1 is the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online); its function is unknown, but it is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Havik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. MCPH1 CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes, including size, locomotion, morphology, or diet, help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed "callable sites", was used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot discard the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.

Tests

Hudson–Kreitman–Aguadé Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window’s empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
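A minimal Python sketch of the windowing and ranking logic described above follows. It is not the authors' implementation. As an assumption made here to avoid division by zero, the score is the polymorphic fraction SNPs/(SNPs + substitutions), which is a monotonic transformation of the SNP-to-substitution ratio and therefore yields the same ranks. The function names and the rank/n form of the empirical P value are also assumptions.

```python
from typing import List, Optional

WINDOW = 30_000  # window size in bp
STEP = 15_000    # step between window starts, giving a 15-kb overlap


def window_starts(chrom_length: int) -> List[int]:
    """Start coordinates of 30-kb windows tiled every 15 kb."""
    return list(range(0, max(chrom_length - WINDOW, 0) + 1, STEP))


def hka_score(n_snps: int, n_subs: int, n_callable: int,
              min_callable: int = 300,
              min_informative: int = 6) -> Optional[float]:
    """Polymorphic fraction of informative sites, or None if the window is filtered."""
    informative = n_snps + n_subs
    if n_callable < min_callable or informative < min_informative:
        return None
    return n_snps / informative


def empirical_p(scores: List[float]) -> List[float]:
    """Lower-tail empirical P value of each score: its ascending rank divided by n."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    p = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        p[i] = rank / len(scores)
    return p
```

Low scores (little polymorphism relative to divergence) land in the lower tail, where signatures of positive selection are expected; the upper tail would be read for balancing selection by ranking in the opposite direction.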

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window’s empirical P value.
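For reference, the H statistic of Fay and Wu (2000) is the difference between two estimators of diversity, θπ − θH, computed from the derived-allele counts of the segregating sites in a window. The sketch below uses the standard unnormalized form; whether the original analysis used exactly this form, and the function name, are assumptions.

```python
from typing import Sequence


def fay_wu_h(derived_counts: Sequence[int], n: int) -> float:
    """Fay and Wu's H = theta_pi - theta_H for one window.

    derived_counts: derived-allele count of each SNP in the window.
    n: number of sampled chromosomes.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    denom = n * (n - 1)
    # Only segregating sites (0 < count < n) contribute.
    seg = [i for i in derived_counts if 0 < i < n]
    theta_pi = sum(2 * i * (n - i) for i in seg) / denom
    theta_h = sum(2 * i * i for i in seg) / denom
    return theta_pi - theta_h
```

A window dominated by high-frequency derived alleles gives a strongly negative H, which is the pattern expected after a complete or nearly complete sweep.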

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein-coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet this criterion. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
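The MK test compares a 2×2 table of nonsynonymous and synonymous substitutions (Dn, Ds) and polymorphisms (Pn, Ps) for each transcript. The sketch below is one common way to evaluate such a table, using a two-sided Fisher's exact test and the simple estimator α = 1 − (Ds·Pn)/(Dn·Ps). The choice of Fisher's exact test and the function names are assumptions, not a statement of how the analysis was run; the estimates of α reported elsewhere in the article come from DFE-alpha rather than from this simple formula.

```python
from math import comb
from typing import Optional


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> Optional[float]:
    """Simple MK estimate of the proportion of adaptive substitutions."""
    if dn == 0 or ps == 0:
        return None  # undefined without nonsynonymous divergence or synonymous polymorphism
    return 1.0 - (ds * pn) / (dn * ps)
```

With the classic Adh counts from McDonald and Kreitman (1991), Dn = 7, Ds = 17, Pn = 2, Ps = 42, the excess of nonsynonymous divergence gives a positive α; an excess of nonsynonymous polymorphism would instead give a negative α, the pattern associated with purifying selection on segregating variants.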

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prüfer et al. 2012; see Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse, so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of >0.8 for being external that is not interrupted by SNPs with a probability of >0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
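The region definition above can be sketched as a single pass over the per-SNP posteriors. This is a simplified reading, not the published implementation: it assumes a two-state model in which P(internal) = 1 − P(external), it ends a region at the last confidently external SNP before an interruption, and it converts physical to genetic length with one recombination rate per region (`cm_per_mb`, a hypothetical parameter) rather than a full 1-Mb-resolution map.

```python
from typing import List, Sequence, Tuple


def external_regions(positions: Sequence[int],
                     p_external: Sequence[float],
                     threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Runs of confidently external SNPs not interrupted by confidently internal SNPs."""
    regions: List[Tuple[int, int]] = []
    start = end = None
    for pos, p_ext in zip(positions, p_external):
        if p_ext > threshold:
            if start is None:
                start = pos
            end = pos  # extend the run to this SNP
        elif (1.0 - p_ext) > threshold and start is not None:
            regions.append((start, end))  # a confidently internal SNP closes the run
            start = end = None
        # SNPs with intermediate posteriors neither extend nor interrupt a run
    if start is not None:
        regions.append((start, end))
    return regions


def genetic_length_cm(region: Tuple[int, int], cm_per_mb: float) -> float:
    """Genetic length of a region in cM, given a local recombination rate."""
    start, end = region
    return (end - start) / 1e6 * cm_per_mb
```

Scoring by genetic rather than physical length keeps regions in low-recombination parts of the genome from ranking highly simply because they are physically long.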

For each population, the HMM was run repeatedly with each “outgroup” individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5 longest external regions. For the MK test, we selected, by species, all genes with at least one transcript presenting a nominal P value below 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: GO categories (Harris et al. 2004) “biological processes,” “molecular functions,” and “cellular components”; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004), based on mammalian and human phenotype ontology; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
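The logic of this kind of enrichment analysis can be illustrated with a small stand-alone sketch. It is not WebGestalt's code: it assumes an upper-tail hypergeometric test per category, a Bonferroni correction over the number of categories supplied, and removal of categories supported by a single candidate gene. All function and variable names are hypothetical.

```python
from math import comb
from typing import Dict, Set


def hypergeom_sf(k: int, n_background: int, n_category: int, n_drawn: int) -> float:
    """P(X >= k) when drawing n_drawn genes from a background containing n_category hits."""
    upper = min(n_category, n_drawn)
    num = sum(comb(n_category, x) * comb(n_background - n_category, n_drawn - x)
              for x in range(k, upper + 1))
    return num / comb(n_background, n_drawn)


def enriched_categories(categories: Dict[str, Set[str]],
                        candidates: Set[str],
                        background: Set[str],
                        alpha: float = 0.05) -> Dict[str, float]:
    """Bonferroni-adjusted P values of categories enriched among the candidates."""
    candidates = candidates & background
    n_tests = len(categories)
    hits: Dict[str, float] = {}
    for name, genes in categories.items():
        genes = genes & background
        overlap = genes & candidates
        if len(overlap) < 2:
            continue  # discard categories driven by only one gene
        p = hypergeom_sf(len(overlap), len(background), len(genes), len(candidates))
        p_adj = min(1.0, p * n_tests)  # Bonferroni correction
        if p_adj <= alpha:
            hits[name] = p_adj
    return hits
```

The background set matters: restricting it to genes that could have been tested (for example, genes overlapping callable windows) avoids inflating enrichment for categories that are simply well covered by the data.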

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).

Cagan et al. doi:10.1093/molbev/msw215 MBE

Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.
Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.
Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol. 26(12):2755–2764.
Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol. 7(4):1122–1132.
Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.
Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol. 126:235–242.
Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol. 182:4313–4320.
Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.
Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 19:4705–4714.
Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol. 11(1):171.
Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol. 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.
Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet. 84:21–34.
Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.
Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet. 5:e1000500.
Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.
Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.
Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci. 110(14):5294–5300.
Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.
Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.
Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.
ENCODE Project Consortium. 2011. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.
Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.
Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.
Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.
Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.
Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.
Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.
Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22:1760–1774.
Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.
Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.
Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol. 22:194–200.
Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev. 29:15–21.
Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.
Hudson RR, Kreitman M, Aguadé M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.
Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32:415–435.
Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol. 126:1200–1202.
Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie’s Darwin domain. Genome Biol Evol. 3:687–701.
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.
Key FM, Fu Q, Romagne F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.
Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.
Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.
King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.
Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.
Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.
Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.
Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.
Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.
Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.
McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.
McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.
McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.
Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.
Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.
Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.
Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.
Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.
Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.
Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html
Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.
Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O’Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.
Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.
Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.
Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.
Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2(40):D130–D135.
Pybus M, Dall’Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall’Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.
Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.
Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.
Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.
Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.
Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.
Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.
Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2504 human genomes. Nature 526(7571):75–81.
Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.
Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.
Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.
Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.
Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.
Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.
Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 1(33):W741–W748.


more than five individuals, suggesting that sample size differences between lineages do not affect our observations (supplementary materials subsampling analysis 14, supplementary table S106, and supplementary fig. S30, Supplementary Material online).

The correlations with Ne above are almost always stronger with long-term Ne than with recent Ne (for 21 of the 23 HKA bins; supplementary table S3, Supplementary Material online), indicating that we detect the effects of long-term evolutionary history rather than only differences in power due to overall levels of diversity (although differences in the accuracy of the Ne estimates may affect this comparison). Therefore, despite the recent and ongoing population declines experienced by many of these species, their long-term Ne appears to be a better predictor than recent Ne of the past efficacy of purifying selection.

Ne and Adaptive Evolution

With the MK-based DFE-alpha, it is possible to estimate the proportion (α) of nonsynonymous substitutions driven by positive selection, as well as the ratio of adaptive to neutral divergence (ω(α)) (supplementary table S103, Supplementary Material online). With the exceptions of Pongo pygmaeus (with poor bootstrap support) and Pan t. schweinfurthii (where two inbred individuals [Prado-Martinez et al. 2013] dramatically increase the estimates) (supplementary table S103, Supplementary Material online), both the estimated proportion (α) and the estimated rate of adaptive evolution are low (0–12% and 0–2%, respectively), in agreement with previous estimates (Eyre-Walker 2006; McManus et al. 2015). We observe that in nonhuman great apes, both the proportion and the rate of adaptive substitutions are positively correlated with long-term Ne after phylogenetic nonindependence is accounted for using a generalized least squares approach (supplementary fig. S17 and supplementary materials MK test 23, Supplementary Material online). The correlation is high and significant when all nonhuman species except the problematic Pongo pygmaeus and Pan troglodytes schweinfurthii are considered (Pearson’s R = 0.9, P value = 0.004) (see fig. 4).

The effect of positive selection on linked variation also increases with long-term Ne. In the bottom bins of the HKA empirical distribution, which are enriched in targets of positive selection, the percentage of protein-coding windows correlates positively with Ne (0.05–1% bins, P values = 0.0001–0.02). Only the lowest 0.01% HKA bin is not significant, potentially due to the spatial clustering of windows as a result of selective sweeps (supplementary materials HKA 4 and supplementary table S12, Supplementary Material online).

Thus, our results indicate that the long-term Ne of populations significantly affects the efficacy of both purifying and positive selection. These correlations are remarkable because these species are very closely related and their long-term Ne varies by at most 3-fold.

The Candidate Targets of Natural Selection

As most genomic sites evolve neutrally or nearly neutrally (Kimura 1979; Kelley et al. 2006), we expect an enrichment of targets of natural selection in the extreme tails of the genome-wide distributions of neutrality test statistics. Therefore, we can identify candidate targets of natural selection without relying on a simulated neutral expectation, which is vulnerable to parameter misspecification, an important problem given the complex evolutionary history of the Hominidae lineages. Given how little we know about the strongest targets of purifying, positive, and balancing selection in nonhuman apes, this catalog is highly relevant. In addition, these loci allow us to investigate the tempo, conservation, and biological function of natural selection in the great apes. The nature of each of the tests considered means that their implementation in the genome varies, and the criteria to define candidate targets of selection necessarily vary too (see fig. 2).

FIG. 3. Percentage of windows overlapping protein coding exons. Percentage of windows overlapping protein coding exons for noncumulative bins of the HKA empirical distribution (X-axis). Each lineage is plotted as a shaded line. The Pearson’s correlation (R) between the percentage of windows overlapping protein coding exons and Ne within each HKA bin and across all lineages is shown on the right Y-axis. Pearson’s correlation coefficient was computed both with an estimate of short- and long-term Ne (from Prado-Martinez et al. 2013). Only R values with significant P values (P < 0.05) are shown.

Sample Size and the Candidate Targets of Natural Selection

Sample size, which varies among lineages, may influence the power to detect signatures of natural selection. We assess how differences in sample size might influence our results with down-sampling analyses. We randomly down-sample, 100 times, four or eight individuals from the two lineages with the largest sample size (Pan paniscus and Gorilla gorilla gorilla) and run all neutrality tests for chromosome 1 (except the ELS test for Pan paniscus, because this test is not appropriate for this lineage). We then measure the overlap between the candidates from the down-samples (0.1% or 1% tail of the empirical distribution) with the equivalent candidates from the original results (supplementary materials subsampling analysis 1, Supplementary Material online).
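The overlap measure used in this kind of down-sampling comparison can be sketched as follows. This is a simplified illustration with hypothetical names: candidates are taken as the windows in the lowest tail of a score distribution, and overlap is the fraction of original candidates recovered in a down-sample. Re-running the neutrality tests on each down-sample is outside the sketch.

```python
import random
from typing import Dict, List, Set


def downsample(individuals: List[str], k: int, seed: int) -> List[str]:
    """Randomly pick k individuals without replacement (seeded for reproducibility)."""
    return random.Random(seed).sample(individuals, k)


def tail_candidates(scores: Dict[str, float], fraction: float) -> Set[str]:
    """Window IDs in the lowest `fraction` of the empirical score distribution."""
    n = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=lambda w: (scores[w], w))
    return set(ranked[:n])


def overlap_fraction(original: Set[str], downsampled: Set[str]) -> float:
    """Share of the original candidates also found in the down-sampled candidates."""
    return len(original & downsampled) / len(original)
```

Averaging `overlap_fraction` across the repeated down-samples gives one summary number per test and sample size, which is the kind of quantity compared between tests in the next paragraphs.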

The impact of sample size differs between selection tests. HKA appears very robust to sample size variation for signatures of positive selection, showing a mean overlap between the original and down-sampled results of at least 86% (supplementary materials, subsampling analysis 1.1, and supplementary figs. S21–24, Supplementary Material online). This is likely to be because the HKA is not strongly affected by the allele frequency of polymorphisms.

In contrast, FWH and ELS appear more sensitive to sample size (supplementary materials, subsampling analysis 1.2–1.3, and supplementary figs. S25–28, Supplementary Material online). This may be due to the influence of sample size on the estimates of allele frequency. Therefore, we find that the HKA test is better suited for comparative analyses where sample sizes are low or unequal between populations.

An Available Genome-Wide Map of Natural Selection in Hominidae

The genome-wide map of signatures of natural selection includes information about different types of selection over varying time frames. As such, it provides a broad picture of the influence of natural selection in each genomic region and Hominidae species. All the information is available as an interactive browser at http://tinyurl.com/nf8qmzh, following the criteria and configuration of a recently published human dataset (Pybus et al. 2014, 2015). The UCSC-style format facilitates the integration with the rich UCSC browser tracks, a search mask allows easy access to results for specific genes or genomic regions, and the raw scores (test statistic value and rank score/empirical P value) can be conveniently downloaded using the UCSC Table function. We expect this to be a valuable resource for a wide range of analyses.

[Figure 4 plot: α (Y-axis, about −0.6 to 0.2) against Ne (X-axis, 10,000 to 25,000) for the lineages PanPan, PanTroEl, PanTroVer, Gorilla and PonAb; R = 0.90, P = 0.004.]

FIG. 4. Correlation between rate of adaptive substitutions (α) and effective population size (Ne). The X-axis shows the effective population size. On the Y-axis, the rate of adaptive substitutions is plotted as α. Correlations were calculated while controlling for the phylogenetic nonindependence using a generalized least square approach and a random walk/maximum likelihood method (see Supplementary Materials, MK 23, Supplementary Material online).

The Functional Targets of Natural Selection

The relative contributions of variants in regulatory versus protein-coding regions of the genome to adaptive evolution remain a matter of controversy (Halligan et al. 2013). Since King and Wilson (1975), the relative importance of coding and regulatory variation to adaptive evolution has been contentious. Protein-coding DNA constitutes ~1.5% of the genome, but 10–15% appears to be functionally constrained (Ponting and Hardison 2011). The role of nonprotein-coding genes and other nongenic elements in genome function and evolution remains debated (Encode Project Consortium 2011; Doolittle 2013), with several lines of work suggesting that nongenic regions (including some gene deserts) can play an important role in phenotype and adaptation (Bejerano et al. 2006; Libioulle et al. 2007; McPherson et al. 2007; Hubisz and Pollard 2014). Although the stringent filtering of the data and the imperfect annotation of nonprotein-coding functional elements hamper the comparison of protein-coding versus nonprotein-coding regions, we investigated the functional annotations of our candidates.

Except for MK, the neutrality tests we used are agnostic about functional annotation. Still, most of our candidate targets of positive selection contain functional annotations: mean values across species are 72% for HKA, 71% for ELS and 80% for FWH. This is significantly greater than genome-wide expectations based on random sampling of the callable genome (P values < 0.05 in all lineages except Pan paniscus, P value = 0.2, and Pongo pygmaeus, P value = 0.11) (supplementary materials, HKA 7, and supplementary figs. S7–15, Supplementary Material online). Among these annotations, the overlap with protein-coding exons (HKA = 62%, ELS = 59%, FWH = 45%) is significantly enriched in all lineages except Pan paniscus (P value = 0.15) and Pan troglodytes verus (P value = 0.06). In contrast, the mean overlap with exons from nonprotein-coding genes (HKA = 18%, ELS = 2%, FWH = 20%) is not significantly elevated relative to genome-wide levels (supplementary figs. S7–15, Supplementary Material online; lowest P value = 0.16 in Gorilla gorilla gorilla).
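The comparison against random sampling of the callable genome can be sketched as a simple resampling test. This is a minimal illustration under stated assumptions: each window is reduced to a boolean flag (annotated or not), random window sets of the same size as the candidate set are drawn without replacement, and the one-sided empirical P value uses the common "+1" correction. The function name and the toy data are hypothetical, and the authors' actual procedure is described in their supplementary materials.

```python
import random


def enrichment_p_value(candidate_flags, genome_flags, n_resamples=10000, seed=0):
    """Empirical one-sided P value for enrichment of annotated windows.

    candidate_flags -- list of booleans, True if a candidate window is annotated
    genome_flags    -- list of booleans for all callable windows (the background)
    Returns (observed proportion, empirical P value).
    """
    rng = random.Random(seed)
    k = len(candidate_flags)
    observed = sum(candidate_flags) / k
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(genome_flags, k)  # random windows, same set size
        if sum(sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Toy background: 10% of 1,000 windows carry an annotation.
toy_genome = [True] * 100 + [False] * 900
toy_candidates = [True] * 8 + [False] * 2
toy_observed, toy_p = enrichment_p_value(toy_candidates, toy_genome, n_resamples=500)
```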

Candidate targets of balancing selection are also highly enriched in protein-coding exons (e.g., 64% in the top 0.01% bin), and in the top HKA bins this proportion sharply increases with the HKA score (fig. 3). The increased levels of diversity in these windows cannot be explained by technical artifacts, as these regions are not unusual in terms of coverage or mapping quality (supplementary materials, HKA 13–14, Supplementary Material online), or by current models of neutral evolution or purifying selection, and are instead best explained by long-term balancing selection acting on or near these protein-coding exons.

The Biological Pathways Targeted by Natural Selection

According to our results above, protein-coding genes appear to be a key target of natural selection in the Hominidae. We thus investigated the biological functions that these genes are involved in. For each neutrality test and lineage, we identified the genes in candidate regions of positive or balancing selection (see fig. 2 and “Methods” section for details) and performed gene enrichment analyses using WebGestalt (Zhang et al. 2005). Our necessarily strict data filtering may disproportionally affect certain Gene Ontology (GO) categories (e.g., olfactory receptors), but we discuss below the categories that retain the strongest signatures for each type of natural selection.
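Gene enrichment analyses of this kind are commonly framed as an over-representation test. The sketch below computes a one-sided hypergeometric P value for a single category. It is an illustration of the general technique only: whether WebGestalt was run with exactly this test and background is an assumption, the function name is hypothetical, and no multiple-testing adjustment is applied here.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_category, n_candidates, n_overlap):
    """P(X >= n_overlap) when drawing n_candidates genes without replacement.

    n_background -- number of genes in the background set
    n_category   -- background genes annotated to the category
    n_candidates -- number of candidate genes tested
    n_overlap    -- candidate genes annotated to the category
    """
    upper = min(n_category, n_candidates)
    total = comb(n_background, n_candidates)
    # Sum the hypergeometric probabilities of the observed overlap or larger.
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_candidates - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: all 5 candidates fall in a category of 5 genes out of 10.
toy_p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

In practice, one such P value is computed per category and the full set is then adjusted for multiple testing.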

Pathways Targeted by Balancing Selection

The top genes for balancing selection include a number of well-established targets, such as the major histocompatibility complex (MHC) genes (Hughes and Yeager 1998; Hedrick 1999). Indeed, windows containing MHC genes appear among those with the strongest signals of balancing selection in all lineages (supplementary tables S6 and S13, Supplementary Material online). In addition, in all lineages there is a significant enrichment of immunity-related categories, such as the GO “Antigen processing and presentation” category (closely related to the MHC) (supplementary tables S16–40, Supplementary Material online). This provides evidence that balancing selection has a strong influence on immunological pathways in all lineages.

To test whether there were strong signatures of balancing selection beyond the MHC, we re-ran the GO enrichment analysis excluding all genes in the MHC region on chromosome 6 (supplementary tables S57–65, Supplementary Material online). Doing so removes the enrichment for the GO category “Antigen processing and presentation” in all lineages. Interestingly, in three of the four Pan troglodytes lineages (excluding Pan troglodytes schweinfurthii) there is significant enrichment for the GO category “Cornified envelope” (supplementary tables S360, S62 and S63, Supplementary Material online), driven by the three genes SCEL, SPRR2B and SPRR2G. The cornified envelope is the most exterior layer of the skin and consists of dead cells. Related to this, we note that in Pan troglodytes verus the GO category “keratinocyte differentiation”, involved in the development of the most common cell type in the epidermis, is also significantly enriched (P value = 0.0026). This is interesting because keratins and proteins similarly involved in epithelial barrier formation have been proposed as targets of balancing selection (see “Discussion” section).

Pathways Targeted by Strong Purifying Selection

Since Hominidae are closely related, we would expect that similar regions are evolving under purifying selection. We therefore tested whether the pathways showing the strongest signatures of constraint are consistent among lineages. From the MK test, 53 of the 152 evaluated pathways showed signatures of strong purifying selection in more than one lineage. In particular, the “Integrin signaling” pathway and “Wnt signaling” pathway, which regulate basic cellular and developmental processes, and the “Alzheimer’s disease-presenilin” pathway are significantly constrained across all lineages (supplementary table S102, Supplementary Material online).

Pathways Targeted by Positive Selection

For the HKA, several lineages show evidence of positive selection targets being enriched for GO categories related to immune function. For example, the GO category “Complement activation” (genes that activate the innate immune system) is significantly enriched in Pan paniscus (P value = 0.042; all P values adjusted for multiple testing), whereas the related pathway “Complement receptor activity” is enriched in Pongo abelii (P value = 0.011) and the GO category “Viral receptor activity” in Gorilla gorilla gorilla (P value = 0.0004) (supplementary tables S41, S46 and S54, Supplementary Material online).

We find that the FWH candidate targets of positive selection show enrichment in several GO categories related to brain development and function exclusively within the African Hominidae lineages. This includes, for example, the GO categories “Dendrite” (Homo sapiens, P value = 0.040; Pan troglodytes troglodytes, P value = 0.010; Gorilla gorilla, P value = 0.0024) and “Neuron spine” (Pan troglodytes troglodytes, P value = 0.001; Gorilla gorilla gorilla, P value = 0.006). Several additional neurological categories are enriched in single lineages (supplementary tables S75–86, Supplementary Material online). For example, Homo sapiens is the only lineage with significant enrichment of the GO category “Glutamate receptor activity” (P value = 0.002); glutamate is the main excitatory neurotransmitter in the brain.

For the MK, the set of genes with an excess of divergence is small (supplementary table S96, Supplementary Material online), but we found a significant enrichment in genes involved in, for instance, “Ion channel activity” in Pan paniscus (P value = 0.034) and in “Glycosaminoglycan biosynthesis” in Gorilla gorilla gorilla (P value = 0.020), among other pathways (supplementary tables S98–101, Supplementary Material online).

Overlap between Targets of Positive and Balancing Selection across Lineages

The Hominidae lineages have shared, along their evolutionary history, similar physiologies and environments. As such, they have likely been subject to common selective pressures, even after their lineages split. To investigate this possibility, we identified genes that show similar signals of natural selection in multiple lineages. Since we use an empirical approach to identify candidate targets of natural selection (as the demographic models for these species are not well established), we use the same 0.1% cut-off to identify outliers from both tails of the HKA empirical distribution and one tail of the FWH distribution. Therefore, we cannot make general claims about the relative frequency of positive and balancing selection in primate species. We can, though, explore the level of sharing across lineages of these candidate targets. To be conservative, we only consider genes to be shared targets of selection if they appear as candidates in at least three lineages.
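The "at least three lineages" criterion amounts to counting, for each gene, the number of lineages in which it is a candidate. A minimal sketch, assuming the candidates are available as one gene set per lineage (the lineage and gene labels below are placeholders, not results from the study):

```python
from collections import Counter


def shared_candidates(candidates_by_lineage, min_lineages=3):
    """Return {gene: number of lineages} for genes seen in >= min_lineages."""
    counts = Counter()
    for genes in candidates_by_lineage.values():
        counts.update(set(genes))  # count each gene once per lineage
    return {gene: n for gene, n in counts.items() if n >= min_lineages}


# Placeholder data for illustration only.
toy_candidates_by_lineage = {
    "lineage_1": {"geneA", "geneB"},
    "lineage_2": {"geneA", "geneC"},
    "lineage_3": {"geneA", "geneB"},
    "lineage_4": {"geneB", "geneD"},
}
toy_shared = shared_candidates(toy_candidates_by_lineage)
```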

Overlap between Selection Targets

We find no signals of positive selection that are shared across all lineages. In fact, there is modest sharing across lineages, a possible indication of the lineage-specific nature of the adaptive process (although we note that we are highly conservative in our selection of candidate genes, and the power of FWH is reduced with lower sample sizes) (supplementary table S15, Supplementary Material online). We observe that the HKA candidates show lower sharing across lineages than those from the FWH (supplementary tables S14 and S73, Supplementary Material online). For the HKA, only 27 genes (of the 200 candidates per lineage) are shared in at least three lineages, compared with 67 for the FWH, which detects more recent selective events. We note that shared signals among the Pan troglodytes subspecies may not reflect truly independent signatures of selection, as signals may predate their divergence into separate lineages and there is the possibility of admixture between subspecies. However, of these 67 genes, only a minority (8) are shared exclusively among the Pan troglodytes subspecies, potentially reflecting their recently shared ancestry. The majority (59) show evidence of recent positive selection across a range of lineages (supplementary table S73, Supplementary Material online), suggesting putative parallel adaptive events.

Turning to the biological function of these genes, we find limited enrichment among HKA targets (the only significant GO category is “Structural molecule activity”, P value = 0.009; supplementary table S3AAB, Supplementary Material online). However, the 67 shared FWH candidate genes are significantly enriched in multiple functional categories (supplementary tables S87–89, Supplementary Material online), including several neuronal pathways, suggesting that these are a common target of recent positive selection across the Hominidae.

Genes targeted by balancing selection show much greater sharing across lineages, with 156 genes showing signatures of balancing selection across at least three lineages (supplementary table S13, Supplementary Material online; see fig. 5 for an example across Pan troglodytes lineages). Nine genes, primarily from the MHC region, are shared across all lineages (supplementary table S13, Supplementary Material online). This likely reflects the long-term and persistent nature of balancing selection on immunity-related genes, in particular in the MHC.

Discussion

We present a global investigation of the signatures of purifying, positive and balancing selection at different time scales and across the Hominidae lineages. We observe strong evidence for the signatures of each of these types of natural selection on patterns of genomic variation. By carefully avoiding technical differences, we can compare, for the first time, the patterns of different types of natural selection across the great ape species. All genomic analyses of signatures of selection are complicated to some extent by demographic processes, which can result in patterns of genomic variation that obscure signatures of selection or produce false positives. We tried to mitigate this by utilizing tests that identified putatively selected regions as outliers based on the entire distribution of patterns of genomic variation, under the assumption that the majority of the genome is evolving neutrally.

We find that even with the relatively similar Ne of the great apes (with a maximum difference of 3-fold), Ne has a significant effect on the efficacy of natural selection. This appears to be true for both purifying and positive selection. The evidence for adaptive evolution is stronger in protein-coding than in nonprotein-coding genes, and it is overrepresented not only on loci with relevance to immune function, but also on loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations, as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae

We estimate that at least 65% of mutations are deleterious (NeS > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8%, compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites and its effects on linked variation correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes subspecies seem to have resulted in differences in the efficacy of natural selection. This may be extremely important because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes

Our data indicate that adaptive evolution (positive and balancing selection) often targets protein-coding regions (be that the protein-coding sequence or the surrounding regulatory elements). This suggests that, in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein-coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, it most likely reflects shared events. When the signature is shared across distant lineages, this may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example if it affects different neighboring functional elements in each species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes, and that although this was likely to be highly polygenic, some of the same genes may have been involved across species.

FIG. 5. Venn diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here, we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials, section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC

Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier

The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process “cornified envelope development” in Pan troglodytes ellioti, Pan troglodytes troglodytes and Pan troglodytes verus (the genes involved are SCEL, SPRR2B and SPRR2G), and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E, LCE3D and LCE3E). In LCE3D and LCE3E, the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and which has previously been shown to have signatures of balancing selection in humans (Andrés et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006), which is essential for the formation of the cornified envelope, yet has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of “natural vaccination” (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting that variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes “natural vaccination” may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes

We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA and FWH candidate targets of positive selection are all significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86 and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find that some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Südhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes and Pongo pygmaeus, an HKA signal of positive selection in Homo sapiens, and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. This includes AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank) and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1 is the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla and Pongo pygmaeus) (supplementary table S73, Supplementary Material online); its function is unknown, but it is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Havik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes including size, locomotion, morphology or diet help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens, but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed "callable sites," were used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that, despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot rule out the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have only a weak influence on our overall results.
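The depth-based part of this filtering can be illustrated with a minimal Python sketch. It is not the pipeline of Prado-Martinez et al. (2013): the function name, the per-individual application of the thresholds, and the default values (a lower bound of mean depth divided by 8, an upper bound of three times the mean depth, and a minimum of 5 reads) are assumptions made for illustration only.

```python
def is_callable(depths, mean_depths, min_cov=5, low_div=8.0, high_mult=3.0):
    """Return True if a site passes the depth filters in every individual.

    depths      -- read depth at the site, one value per individual
    mean_depths -- genome-wide mean read depth of each individual (same order)
    The thresholds are illustrative assumptions, not the published pipeline.
    """
    for dp, mean in zip(depths, mean_depths):
        if dp < min_cov:            # too few reads to call a genotype
            return False
        if dp < mean / low_div:     # unusually low depth for this individual
            return False
        if dp > mean * high_mult:   # unusually high depth, e.g. a collapsed duplication
            return False
    return True


# Two individuals with mean depths of 22x and 24x.
print(is_callable([20, 25], [22.0, 24.0]))
print(is_callable([20, 80], [22.0, 24.0]))
```

Requiring the filter to pass in all individuals of a species mirrors the "callable sites" definition above, at the cost of discarding sites where a single individual has aberrant depth.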

Tests

Hudson–Kreitman–Aguadé Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window's empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
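The windowed scan described above can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' code: the function names, the representation of sites as sorted position lists, the half-open window convention, and the handling of windows without substitutions (ratio set to infinity) are all assumptions.

```python
from bisect import bisect_left


def count_in(sorted_pos, start, end):
    """Number of positions p in a sorted list with start <= p < end."""
    return bisect_left(sorted_pos, end) - bisect_left(sorted_pos, start)


def hka_scan(snvs, subs, callable_sites, chrom_len,
             size=30_000, step=15_000, min_callable=300, min_informative=6):
    """Return (start, end, ratio) for every window that passes the filters."""
    windows = []
    for start in range(0, max(chrom_len - size, 0) + 1, step):
        end = start + size
        n_snv = count_in(snvs, start, end)
        n_sub = count_in(subs, start, end)
        if count_in(callable_sites, start, end) < min_callable:
            continue                      # too little analyzable sequence
        if n_snv + n_sub < min_informative:
            continue                      # too few SNVs plus substitutions
        # Assumption: a window without substitutions gets an infinite ratio.
        ratio = n_snv / n_sub if n_sub else float("inf")
        windows.append((start, end, ratio))
    return windows


def empirical_p(windows):
    """Map each window to rank/N, where rank 1 is the lowest ratio."""
    ordered = sorted(windows, key=lambda w: w[2])
    n = len(ordered)
    return {(s, e): (i + 1) / n for i, (s, e, _) in enumerate(ordered)}
```

Under this convention, low ranks (few polymorphisms relative to substitutions) point to candidate targets of positive selection, and high ranks to candidate targets of balancing selection.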

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed, using simulations, that our implementation of the FWH statistic was capable of detecting recent selective sweeps (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window's empirical P value.
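For reference, the textbook form of the statistic (Fay and Wu 2000) is H = π − θ_H, where both estimators are computed from the derived-allele counts of the segregating sites in a window. The Python sketch below implements this unnormalized form; whether the authors' implementation uses exactly this form is an assumption, and the function name is hypothetical.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H = pi - theta_H for one window.

    derived_counts -- derived-allele count (1..n-1) of each segregating site
    n              -- number of sampled chromosomes
    """
    denom = n * (n - 1)
    # pi: mean number of pairwise differences
    pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    # theta_H: estimator that weights high-frequency derived alleles heavily
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return pi - theta_h


# An excess of high-frequency derived alleles gives a negative H.
print(fay_wu_h([9, 9, 8], n=10))
```

Strongly negative values are the signal of interest here, since a recent sweep drags linked derived alleles to high frequency.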

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet this criterion. P. t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
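The per-transcript test can be sketched as a 2 × 2 contingency analysis of nonsynonymous and synonymous substitutions (Dn, Ds) against nonsynonymous and synonymous polymorphisms (Pn, Ps). The Python sketch below uses a two-sided Fisher's exact test and the common estimator α = 1 − (Ds·Pn)/(Dn·Ps); the choice of Fisher's exact test, the function names, and the return format are assumptions for illustration, not a description of the authors' implementation.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are at least as unlikely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def mk_test(dn, ds, pn, ps):
    """Return (P value, alpha); alpha is None when it is undefined."""
    p = fisher_two_sided(dn, ds, pn, ps)
    alpha = 1 - (ds * pn) / (dn * ps) if dn and ps else None
    return p, alpha


# Counts from the Adh locus in McDonald and Kreitman (1991).
p, alpha = mk_test(dn=7, ds=17, pn=2, ps=42)
print(p, alpha)
```

An excess of nonsynonymous substitutions (α > 0) is consistent with positive selection, whereas an excess of nonsynonymous polymorphisms (α < 0) is consistent with weak purifying selection.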

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; Supplementary Information 7 in Prüfer et al. 2012; Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse, so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of >0.8 for being external that is not interrupted by SNPs with a probability of >0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
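One way to read this definition is sketched below in Python. The input format (a list of genetic positions in cM paired with the posterior probability of the external state), the 0.8 threshold, the treatment of ambiguous SNPs (they neither open nor close a region), and the function name are assumptions for illustration; the sketch does not reproduce the HMM itself.

```python
def external_regions(snps, thr=0.8):
    """Segment SNPs into external regions.

    snps -- list of (genetic_position_cM, p_external), sorted by position;
            with two hidden states, p_internal = 1 - p_external.
    Returns a list of (start_cM, end_cM, genetic_length_cM).
    """
    regions = []
    start = last = None
    for pos, p_ext in snps:
        if p_ext > thr:                      # confidently external: open or extend
            if start is None:
                start = pos
            last = pos
        elif 1.0 - p_ext > thr and start is not None:
            # confidently internal: close the current region
            regions.append((start, last, last - start))
            start = last = None
        # SNPs that are confident for neither state leave the region open
    if start is not None:
        regions.append((start, last, last - start))
    return regions


snps = [(0.0, 0.1), (0.1, 0.9), (0.2, 0.5), (0.3, 0.95), (0.4, 0.05), (0.5, 0.85)]
print(external_regions(snps))
```

Scoring regions by genetic rather than physical length keeps long regions in low-recombination areas from dominating the ranking.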

For each population, the HMM was run repeatedly with each "outgroup" individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript presenting a nominal P value of ≤0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: the GO categories (Harris et al. 2004) "biological processes," "molecular functions," and "cellular components"; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004); annotations based on the mammalian and human phenotype ontologies; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
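The logic of such an over-representation analysis can be sketched in a few lines of Python. This is a generic hypergeometric test with Bonferroni correction, not WebGestalt's implementation; the function names, the data structures, and the choice to correct for the number of categories supplied are assumptions.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, upper + 1)) / comb(N, n)


def enrich(candidates, background, categories, alpha=0.05):
    """Return {category: Bonferroni-adjusted P} for significant categories."""
    background = set(background)
    candidates = set(candidates) & background
    n_tests = len(categories)            # Bonferroni factor (assumption)
    hits = {}
    for name, genes in categories.items():
        genes = set(genes) & background
        overlap = candidates & genes
        if len(overlap) < 2:             # discard categories driven by one gene
            continue
        p = hypergeom_sf(len(overlap), len(background), len(genes), len(candidates))
        p_adj = min(1.0, p * n_tests)
        if p_adj <= alpha:
            hits[name] = p_adj
    return hits
```

In this sketch the candidate set would be, for example, the 200 top-ranked genes of one test in one lineage, and the background the set of all genes that could have been tested.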

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).


Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.

Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.

Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol. 26(12):2755–2764.

Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glémin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol. 7(4):1122–1132.

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.

Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol. 126:235–242.

Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol. 182:4313–4320.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.

Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 19:4705–4714.

Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol. 11(1):171.

Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol. 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.

Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet. 84:21–34.

Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.

Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet. 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.

Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci. 110(14):5294–5300.

Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.

ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22:1760–1774.

Håvik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.

Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.

Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol. 22:194–200.

Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev. 29:15–21.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Hudson RR, Kreitman M, Aguadé M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32:415–435.

Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol. 126:1200–1202.

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol. 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.

Key FM, Fu Q, Romagné F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.

Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Ségurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Great Ape Genome Project, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nürnberg G, Fölster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.

Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.

Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2(40):D130–D135.

Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.

Südhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2,504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 1(33):W741–W748.


of targets of natural selection in the extreme tails of the genome-wide distributions of neutrality test statistics. Therefore, we can identify candidate targets of natural selection without relying on a simulated neutral expectation, which is vulnerable to parameter misspecification, an important problem given the complex evolutionary history of the Hominidae lineages. Given how little we know about the strongest targets of purifying, positive, and balancing selection in nonhuman apes, this catalog is highly relevant. In addition, these loci allow us to investigate the tempo, conservation, and biological function of natural selection in the great apes. The nature of each of the tests considered means that their implementation in the genome varies, and the criteria to define candidate targets of selection necessarily vary too (see fig. 2).

Sample Size and the Candidate Targets of Natural Selection

Sample size, which varies among lineages, may influence the power to detect signatures of natural selection. We assess how differences in sample size might influence our results with down-sampling analyses. We randomly down-sample, 100 times, four or eight individuals from the two lineages with the largest sample size (Pan paniscus and Gorilla gorilla gorilla) and run all neutrality tests for chromosome 1 (except the ELS test for Pan paniscus, because this test is not appropriate for this lineage). We then measure the overlap between the candidates from the down-samples (0.1% or 1% tail of the empirical distribution) with the equivalent candidates from the original results (supplementary materials subsampling analysis 1, Supplementary Material online).
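The overlap measure used in such a down-sampling analysis can be sketched in Python as follows. The function names, the representation of results as dictionaries mapping window identifiers to scores, the use of the upper tail, and the definition of overlap as the fraction of original candidates recovered are assumptions for illustration, not the authors' exact procedure.

```python
import random


def draw_subsample(individuals, k, rng=random):
    """Randomly pick k individuals without replacement."""
    return rng.sample(list(individuals), k)


def tail(scores, fraction):
    """Window IDs in the top `fraction` of the score distribution."""
    k = max(1, int(len(scores) * fraction))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def mean_overlap(original, replicates, fraction=0.01):
    """Mean fraction of original candidates recovered across replicates.

    original   -- {window_id: score} from the full sample
    replicates -- list of {window_id: score}, one per down-sampled run
    """
    ref = tail(original, fraction)
    shares = [len(ref & tail(rep, fraction)) / len(ref) for rep in replicates]
    return sum(shares) / len(shares)
```

In use, each replicate would come from rerunning a neutrality test on the individuals returned by `draw_subsample`, and `fraction` would be set to 0.001 or 0.01 to match the two tails considered above.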

The impact of sample size differs between selection tests. HKA appears very robust to sample size variation for signatures of positive selection, showing a mean overlap between the original and down-sampled results of at least 86% (supplementary materials subsampling analysis 1.1 and supplementary figs. S21–24, Supplementary Material online). This is likely to be because the HKA is not strongly affected by the allele frequency of polymorphisms.

In contrast, FWH and ELS appear more sensitive to sample size (supplementary materials, subsampling analysis 12–13 and supplementary figs. S25–28, Supplementary Material online). This may be due to the influence of sample size on the estimates of allele frequency. Therefore, we find that the HKA test is better suited for comparative analyses where sample sizes are low or unequal between populations.

An Available Genome-Wide Map of Natural Selection in Hominidae

The genome-wide map of signatures of natural selection includes information about different types of selection over varying time frames. As such, it provides a broad picture of the influence of natural selection in each genomic region and Hominidae species. All the information is available as an interactive browser at http://tinyurl.com/nf8qmzh, following the criteria and configuration of a recently published human dataset (Pybus et al. 2014, 2015). The UCSC-style format facilitates the integration with the rich UCSC browser tracks, a search mask allows easy access to results for specific genes or genomic regions, and the raw scores (test statistic value and rank score/empirical P value) can be conveniently downloaded using the UCSC Table function. We expect this to be a valuable resource for a wide range of analyses.

The Functional Targets of Natural Selection

The relative contributions of variants in regulatory versus protein-coding regions of the genome to adaptive evolution remain a matter of controversy (Halligan et al. 2013). Since King and Wilson (1975), the relative importance of coding and regulatory variation to adaptive evolution has been contentious. Protein-coding DNA constitutes ~1.5% of the genome, but 10–15% appears to be functionally constrained (Ponting and Hardison 2011). The role of nonprotein-coding genes and other nongenic elements in genome function and evolution remains debated (Encode Project Consortium 2011; Doolittle 2013), with several lines of work suggesting that nongenic regions (including some gene deserts) can play an important role in phenotype and adaptation (Bejerano et al. 2006; Libioulle et al. 2007; McPherson et al. 2007; Hubisz and Pollard 2014). Although the stringent filtering of the data and the imperfect annotation of nonprotein-coding functional elements hamper the comparison of protein-coding versus nonprotein regions, we investigated the functional annotations of our candidates.

FIG. 4. Correlation between rate of adaptive substitutions (α) and effective population size (Ne). The X-axis shows the effective population size. On the Y-axis, the rate of adaptive substitutions is plotted as α. Lineage labels in the plot: PanPan, PanTroEl, PanTroVer, Gorilla, PonAb (R = 0.90, P = 0.004). Correlations were calculated while controlling for the phylogenetic nonindependence using a generalized least square approach and a random walk/maximum likelihood method (see Supplementary Materials MK 23, Supplementary Material online).

Except for MK, the neutrality tests we used are agnostic about functional annotation. Still, most of our candidate targets of positive selection contain functional annotations: mean values across species are 72% for HKA, 71% for ELS, and 80% for FWH. This is significantly greater than genome-wide expectations based on random sampling of the callable genome (P values < 0.05 in all lineages except Pan paniscus, P value = 0.2, and Pongo pygmaeus, P value = 0.11) (supplementary materials HKA 7 and supplementary figs. S7–15, Supplementary Material online). Among these annotations, the overlap with protein-coding exons (HKA = 62%, ELS = 59%, FWH = 45%) is significantly enriched in all lineages except Pan paniscus (P value = 0.15) and Pan troglodytes verus (P value = 0.06). In contrast, the mean overlap with exons from nonprotein coding genes (HKA = 18, ELS = 2, FWH = 20) is not significantly elevated relative to genome-wide levels (supplementary figs. S7–15, Supplementary Material online; lowest P value = 0.16 in Gorilla gorilla gorilla).
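A comparison against random sampling of the callable genome can be illustrated with a simple resampling test. The sketch below is only one plausible way to set this up; the procedure actually used is described in the supplementary materials and may differ. The function name, the window-based representation, and the number of draws are assumptions for the example.

```python
import random

def enrichment_p_value(candidates, callable_windows, annotated, n_draws=10000, seed=0):
    """Empirical one-sided P value for an excess of annotated windows among candidates.

    candidates: list of candidate window IDs.
    callable_windows: list of all window IDs in the callable genome (hypothetical input).
    annotated: set of window IDs that overlap a functional annotation.
    """
    rng = random.Random(seed)
    observed = sum(1 for w in candidates if w in annotated)
    k = len(candidates)
    as_extreme = 0
    for _ in range(n_draws):
        draw = rng.sample(callable_windows, k)  # random windows, same number as candidates
        if sum(1 for w in draw if w in annotated) >= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_draws + 1)
```

Drawing the same number of windows as there are candidates keeps the null distribution comparable to the observed count; restricting the draws to callable windows means the filtering affects both sides of the comparison equally.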

Candidate targets of balancing selection are also highly enriched in protein-coding exons (e.g., 64% in the top 0.01% bin), and in the top HKA bins this proportion sharply increases with the HKA score (fig. 3). The increased levels of diversity in these windows cannot be explained by technical artifacts, as these regions are not unusual in terms of coverage or mapping quality (supplementary materials HKA 13–14, Supplementary Material online), or by current models of neutral evolution or purifying selection, and are instead best explained by long-term balancing selection acting on or near these protein-coding exons.

The Biological Pathways Targeted by Natural Selection

According to our results above, protein-coding genes appear to be a key target of natural selection in the Hominidae. We thus investigated the biological functions that these genes are involved in. For each neutrality test and lineage, we identified the genes in candidate regions of positive or balancing selection (see fig. 2 and “Methods” section for details) and performed gene enrichment analyses using WebGestalt (Zhang et al. 2005). Our necessarily strict data filtering may disproportionally affect certain Gene Ontology (GO) categories (e.g., olfactory receptors), but we discuss below the categories that retain the strongest signatures for each type of natural selection.

Pathways Targeted by Balancing Selection

The top genes for balancing selection include a number of well-established targets, such as the major histocompatibility complex (MHC) genes (Hughes and Yeager 1998; Hedrick 1999). Indeed, windows containing MHC genes appear among those with the strongest signals of balancing selection in all lineages (supplementary tables S6 and S13, Supplementary Material online). In addition, in all lineages there is a significant enrichment of immunity-related categories, such as the GO “Antigen processing and presentation” category (closely related to the MHC) (supplementary tables S16–40, Supplementary Material online). This provides evidence that balancing selection has a strong influence on immunological pathways in all lineages.

To test whether there were strong signatures of balancing selection beyond the MHC complex, we re-ran the GO enrichment analysis excluding all genes in the MHC region on chromosome 6 (supplementary tables S57–65, Supplementary Material online). Doing so removes the enrichment for the GO category “Antigen processing and presentation” in all lineages. Interestingly, in three of the four Pan troglodytes lineages (excluding Pan troglodytes schweinfurthii) there is significant enrichment for the GO category “Cornified envelope” (supplementary tables S360, S62, and S63, Supplementary Material online), driven by the three genes SCEL, SPRR2B, and SPRR2G. The cornified envelope is the most exterior layer of the skin and consists of dead cells. Related to this, we note that in Pan troglodytes verus the GO category “keratinocyte differentiation,” involved in the development of the most common cell type in the epidermis, is also significantly enriched (P value = 0.0026). This is interesting because keratins and proteins similarly involved in epithelial barrier formation have been proposed as targets of balancing selection (see “Discussion” section).

Pathways Targeted by Strong Purifying Selection

Since Hominidae are closely related, we would expect that similar regions are evolving under purifying selection. We therefore tested whether the pathways showing the strongest signatures of constraint are consistent among lineages. From the MK test, 53 of the 152 evaluated pathways showed signatures of strong purifying selection in more than one lineage. In particular, the “Integrin signaling” pathway and “Wnt signaling” pathway, which regulate basic cellular and developmental processes, and the “Alzheimer’s disease-presenilin” pathway are significantly constrained across all lineages (supplementary table S102, Supplementary Material online).

Pathways Targeted by Positive Selection

For the HKA, several lineages show evidence of positive selection targets being enriched for GO categories related to immune function. For example, the GO category “Complement activation” (genes that activate the innate immune system) is significantly enriched in Pan paniscus (P value = 0.042; all P values adjusted for multiple testing), whereas the related pathway “Complement receptor activity” is enriched in Pongo abelii (P value = 0.011) and the GO category “Viral receptor activity” in Gorilla gorilla gorilla (P value = 0.0004) (supplementary tables S41, S46, and S54, Supplementary Material online).

We find that the FWH candidate targets of positive selection show enrichment in several GO categories related to brain development and function exclusively within the African Hominidae lineages. This includes, for example, the GO categories “Dendrite” (Homo sapiens, P value = 0.040; Pan troglodytes troglodytes, P value = 0.010; Gorilla gorilla, P value = 0.0024) and “Neuron spine” (Pan troglodytes troglodytes, P value = 0.001; Gorilla gorilla gorilla, P value = 0.006). Several additional neurological categories are enriched in single lineages (supplementary tables S75–86, Supplementary Material online). For example, Homo sapiens is the only lineage with significant enrichment of the GO category “Glutamate receptor activity” (P value = 0.002); glutamate is the main excitatory neurotransmitter in the brain.

For the MK, the set of genes with an excess of divergence is small (supplementary table S96, Supplementary Material online), but we found a significant enrichment in genes involved in, for instance, “Ion channel activity” in Pan paniscus (P value = 0.034) and in “Glycosaminoglycan biosynthesis” in Gorilla gorilla gorilla (P value = 0.020), among other pathways (supplementary tables S98–101, Supplementary Material online).

Overlap between Targets of Positive and Balancing Selection across Lineages

The Hominidae lineages have shared, along their evolutionary history, similar physiologies and environments. As such, they have likely been subject to common selective pressures even after their lineages split. To investigate this possibility, we identified genes that show similar signals of natural selection in multiple lineages. Since we use an empirical approach to identify candidate targets of natural selection (as the demographic models for these species are not well established), we use the same 0.1% cut-off to identify outliers from both tails of the HKA empirical distribution and one tail of the FWH distribution. Therefore, we cannot make general claims about the relative frequency of positive and balancing selection in primate species. We can, though, explore the level of sharing across lineages of these candidate targets. To be conservative, we only consider genes to be shared targets of selection if they appear as candidates in at least three lineages.
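The two steps in this paragraph, an empirical tail cut-off followed by a count of lineages per gene, can be sketched as below. This is an illustrative reading of the text rather than the code used in the study: the function names and the dictionary layout are assumptions, and the mapping from candidate windows to genes is left out.

```python
from collections import Counter

def empirical_tails(scores, fraction=0.001, two_tailed=True):
    """Split windows into lower-tail and upper-tail outliers of the empirical distribution.

    scores: dict mapping window ID -> test statistic (hypothetical layout).
    With two_tailed=False only the upper tail is returned and the lower set is empty.
    """
    ranked = sorted(scores, key=scores.get)
    n = max(1, round(len(ranked) * fraction))
    upper = set(ranked[-n:])
    lower = set(ranked[:n]) if two_tailed else set()
    return lower, upper

def shared_genes(candidates_by_lineage, min_lineages=3):
    """Genes that are candidates in at least `min_lineages` lineages."""
    counts = Counter(
        gene for genes in candidates_by_lineage.values() for gene in set(genes)
    )
    return {gene for gene, n in counts.items() if n >= min_lineages}
```

Because the cut-off is a fixed fraction of windows, every lineage yields the same number of candidates per tail regardless of how much selection has actually occurred, which is why the text cautions against comparing the overall frequency of positive and balancing selection.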

Overlap between Selection Targets

We find no signals of positive selection that are shared across all lineages. In fact, there is modest sharing across lineages, a possible indication of the lineage-specific nature of the adaptive process (although we note that we are highly conservative in our selection of candidate genes, and the power of FWH is reduced with lower sample sizes) (supplementary table S15, Supplementary Material online). We observe that the HKA candidates show lower sharing across lineages than those from the FWH (supplementary tables S14 and S73, Supplementary Material online). For the HKA, only 27 genes (of the 200 candidates per lineage) are shared in at least three lineages, compared with 67 for the FWH, which detects more recent selective events. We note that shared signals among the Pan troglodytes sub-species may not reflect truly independent signatures of selection, as signals may predate their divergence into separate lineages and there is the possibility of admixture between sub-species. However, of these 67 genes, only a minority (8) are shared exclusively among the Pan troglodytes subspecies, potentially reflecting their recently shared ancestry. The majority (59) show evidence of recent positive selection across a range of lineages (supplementary table S73, Supplementary Material online), suggesting putative parallel adaptive events.

Turning to the biological function of these genes, we find limited enrichment among HKA targets (the only significant GO category is “Structural molecule activity,” P value = 0.009; supplementary table S3AAB, Supplementary Material online). However, the 67 shared FWH candidate genes are significantly enriched in multiple functional categories (supplementary tables S87–89, Supplementary Material online), including several neuronal pathways, suggesting that these are a common target of recent positive selection across the Hominidae.

Genes targeted by balancing selection show much greater sharing across lineages, with 156 genes showing signatures of balancing selection across at least three lineages (supplementary table S13, Supplementary Material online; fig. 5 for an example across Pan troglodytes lineages). Nine genes, primarily from the MHC region, are shared across all lineages (supplementary table S13, Supplementary Material online). This likely reflects the long-term and persistent nature of balancing selection on immunity-related genes, in particular in the MHC.

Discussion

We present a global investigation of the signatures of purifying, positive, and balancing selection at different time scales and across the Hominidae lineages. We observe strong evidence for the signatures of each of these types of natural selection on patterns of genomic variation. By carefully avoiding technical differences, we can compare, for the first time, the patterns of different types of natural selection across the great ape species. All genomic analyses of signatures of selection are complicated to some extent by demographic processes, which can result in patterns of genomic variation that obscure signatures of selection or produce false positives. We tried to mitigate this by utilizing tests that identified putatively selected regions as outliers based on the entire distribution of patterns of genomic variation, under the assumption that the majority of the genome is evolving neutrally.

We find that even with the relatively similar Ne of the great apes (with a maximum difference of 3-fold), Ne has a significant effect on the efficacy of natural selection. This appears to be true for both purifying and positive selection. The evidence for adaptive evolution is stronger in protein-coding than in nonprotein coding genes, and it is overrepresented not only on loci with relevance to immune function but also on loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations, as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae

We estimate that at least 65% of mutations are deleterious (NeS > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8%, compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites and its effects on linked variation correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes sub-species seem to have resulted in differences in the efficacy of natural selection. This may be extremely important because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes

Our data indicate that adaptive evolution (positive and balancing selection) often targets protein-coding regions (be that the protein-coding sequence or the surrounding regulatory elements). This suggests that, in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, these most likely reflect shared events. When the signature is shared across distant lineages, this may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example, if it affects different neighboring functional elements in each species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

FIG. 5. Venn Diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC

Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier

The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process “cornified envelope development” in Pan t. ellioti, Pan troglodytes troglodytes, and Pan troglodytes verus (the genes involved are SCEL, SPRR2B, and SPRR2G) and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E, LCE3D and LCE3E). In LCE3D and LCE3E, the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and has previously been shown to have signatures of balancing selection in humans (Andres et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006), which is essential for the formation of the cornified envelope, yet it has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of “natural vaccination” (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes “natural vaccination” may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes

We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA, and FWH candidate targets of positive selection all are significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86, and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the Tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes for a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Sudhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus, an HKA signal of positive selection in Homo sapiens, and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. This includes AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank) and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1 is the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online); its function is unknown, but it is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Havik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactive disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains in Homo sapiens a nonsynonymous mutation that reached fixation since the split with Neandertals (Prufer et al. 2014), suggesting recent positive selection also in our lineage. MCPH1 CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes, including size, locomotion, morphology, or diet, help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb) and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and sites with DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we also required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed "callable sites," was used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot rule out the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.
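As an illustration of the depth-based part of this filtering, the sketch below checks a single site against per-individual depth thresholds. It is not the pipeline used in the study: the function name and parameters are hypothetical, the lower and upper bounds (mean depth divided by 8 and mean depth times 3) follow our reading of the thresholds stated above, and applying them individual by individual is an assumption.

```python
def is_callable(depths, mean_depths, min_cov=5, low_div=8.0, high_mult=3.0):
    """Return True if a site passes the depth filters in every individual.

    depths      -- read depth (DP) of each individual at this site
    mean_depths -- genome-wide mean read depth of each individual
    All thresholds are illustrative defaults (assumptions, see lead-in).
    """
    for dp, mean_dp in zip(depths, mean_depths):
        if dp < min_cov:              # below the minimum coverage requirement
            return False
        if dp < mean_dp / low_div:    # unusually low depth for this individual
            return False
        if dp > mean_dp * high_mult:  # unusually high depth (possible collapsed duplication)
            return False
    return True
```

A site would then count toward the callable set of a species only if `is_callable` returns True and the site also lies outside the excluded repeat, duplication, gap and structural-variant intervals.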

Tests

Hudson–Kreitman–Aguade Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window's empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
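The windowing, filtering and ranking steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, a window with no substitutions is given an infinite ratio (the text does not say how such windows were handled), and the empirical P value is taken as rank divided by the number of scored windows, with ties broken arbitrarily.

```python
import math

def sliding_windows(chrom_length, size=30_000, step=15_000):
    """Yield (start, end) half-open windows of `size` bp, advancing by `step` bp."""
    for start in range(0, max(chrom_length - size, 0) + 1, step):
        yield start, min(start + size, chrom_length)

def hka_ratio(n_snps, n_subs, n_callable, min_callable=300, min_informative=6):
    """Ratio of polymorphic to divergent sites, or None if the window is filtered out."""
    if n_callable < min_callable or (n_snps + n_subs) < min_informative:
        return None
    if n_subs == 0:
        return math.inf  # assumption: no substitutions means maximal excess of polymorphism
    return n_snps / n_subs

def empirical_p_values(scores, high_is_extreme=True):
    """Map each score to rank/N, where rank 1 is the most extreme window."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=high_is_extreme)
    p_values = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        p_values[i] = rank / len(scores)
    return p_values
```

Ranking from the high end of the ratio highlights windows with an excess of polymorphism (candidate balancing selection); setting `high_is_extreme=False` ranks from the low end instead (candidate positive selection).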

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window's empirical P value.
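For readers unfamiliar with the statistic, the sketch below computes Fay and Wu's H for one window from derived-allele counts, using the standard definition H = theta_pi - theta_H (Fay and Wu 2000). The function name and input layout are hypothetical, and the code is not the implementation validated in the supplementary materials.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H for one window.

    derived_counts -- derived-allele count at each segregating site (each in 1..n-1)
    n              -- number of sampled chromosomes
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    pairs = n * (n - 1)
    # theta_pi: average pairwise differences, from the derived-allele counts
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / pairs
    # theta_H: weights each site by the squared derived-allele count
    theta_h = sum(2 * i * i for i in derived_counts) / pairs
    return theta_pi - theta_h
```

Strongly negative values indicate an excess of high-frequency derived alleles, which is the pattern expected after a recent sweep.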

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet this criterion. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 (Pruitt et al. 2012) and intersected with the callable sites in our data. This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
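The logic of the MK test can be illustrated with a 2×2 table of nonsynonymous and synonymous substitutions and polymorphisms. The sketch below uses a two-sided Fisher's exact test, which is one common choice; the text above does not state which test of independence was applied, so this is an assumption, and the function names are hypothetical. The example counts are the classic Adh values from McDonald and Kreitman (1991), not data from this study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # hypergeometric probability of k first-column counts falling in row 1
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # sum the probabilities of all tables at least as unlikely as the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

def mk_test(dn, ds, pn, ps):
    """Return (p_value, direction) for one transcript.

    dn, ds -- nonsynonymous and synonymous substitutions (divergence)
    pn, ps -- nonsynonymous and synonymous polymorphisms
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    if dn * ps > ds * pn:
        direction = "excess nonsynonymous divergence"    # consistent with positive selection
    elif dn * ps < ds * pn:
        direction = "excess nonsynonymous polymorphism"  # consistent with purifying selection
    else:
        direction = "no departure"
    return p_value, direction

# Adh example from McDonald and Kreitman (1991): Dn=7, Ds=17, Pn=2, Ps=42
p, direction = mk_test(7, 17, 2, 42)
```

Comparing the cross products `dn * ps` and `ds * pn` avoids division by zero when a transcript has no synonymous changes in one class.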

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prufer et al. 2012; see Supplementary Information 19a in Prufer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prufer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prufer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of > 0.8 for being external that is not interrupted by SNPs with a probability of > 0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
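The region-calling rule can be sketched as a single pass over the per-SNP posteriors. This is an illustration only: the function name is hypothetical, the two-state posterior is assumed to satisfy P(internal) = 1 - P(external), and genetic length is approximated with one uniform recombination rate (`cm_per_mb`), whereas the analysis described above uses rates from the Kong et al. (2002) map.

```python
def external_regions(positions, p_external, threshold=0.8, cm_per_mb=1.0):
    """Return a list of (start_bp, end_bp, length_cM) external regions.

    positions  -- SNP coordinates in bp, sorted
    p_external -- posterior probability of the external state at each SNP
    A region opens at a SNP with P(external) > threshold, is extended by later
    such SNPs, and is closed by the first SNP with P(internal) > threshold.
    SNPs with intermediate posteriors neither extend nor interrupt a region.
    """
    regions = []
    start = end = None
    for pos, p_ext in zip(positions, p_external):
        if p_ext > threshold:
            if start is None:
                start = pos
            end = pos
        elif (1.0 - p_ext) > threshold and start is not None:
            regions.append((start, end, (end - start) / 1e6 * cm_per_mb))
            start = end = None
    if start is not None:  # close a region still open at the end of the chromosome
        regions.append((start, end, (end - start) / 1e6 * cm_per_mb))
    return regions
```

Sorting the returned tuples by their third element gives the length-based ranking of candidate regions within one HMM run.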

For each population, the HMM was run repeatedly with each "outgroup" individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5 longest external regions. For the MK test, we selected, by species, all genes with at least one transcript passing a nominal P value threshold of 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: GO categories (Harris et al. 2004) "biological processes," "molecular functions" and "cellular components"; the Kyoto encyclopedia of genes and genomes (KEGG) pathway database (Kanehisa et al. 2004); based on mammalian and human phenotype ontology; and the PheWas database, which is based on the human PheWas ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
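The two post-processing rules applied to the enrichment output (Bonferroni correction at 0.05 and removal of categories driven by a single gene) can be sketched as below. The input layout and function name are hypothetical; WebGestalt performs the enrichment test itself, and this sketch only assumes that a raw P value and the list of contributing candidate genes are available for each category.

```python
def significant_categories(results, alpha=0.05):
    """Filter enrichment results.

    results -- list of (category, raw_p_value, candidate_genes_in_category)
    Returns (category, adjusted_p) pairs that pass Bonferroni correction
    and are supported by more than one distinct candidate gene.
    """
    n_tests = len(results)
    kept = []
    for category, p_value, genes in results:
        p_adj = min(1.0, p_value * n_tests)  # Bonferroni: multiply by number of tests
        if p_adj < alpha and len(set(genes)) > 1:
            kept.append((category, p_adj))
    return kept
```

Here the number of tests is taken to be the number of categories evaluated in one database for one test and lineage; whether the correction was applied per database or across databases is not stated above.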

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).


Supplementary Material

Supplementary figures S1–S22, tables S1–S107 and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.-B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.-B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.
Andres AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.
Andres AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol. 26(12):2755–2764.
Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol. 7(4):1122–1132.
Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.
Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol. 126:235–242.
Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol. 182:4313–4320.
Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.
Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 19:4705–4714.
Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol. 11(1):171.
Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol. 11:202.
Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.
Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet. 84:21–34.
Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.
Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet. 5:e1000500.
Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.
Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.
Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci. 110(14):5294–5300.
Ellegren H, Galtier N. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.
Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.
Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.
ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.
Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.
Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.
Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.
Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.
Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.
Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.
Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22:1760–1774.
Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.
Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.
Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol. 22:194–200.
Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev. 29:15–21.
Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.
Hudson RR, Kreitman M, Aguade M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.
Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32:415–435.
Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol. 126:1200–1202.
Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol. 3:687–701.
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.
Key FM, Fu Q, Romagne F, Lachmann M, Andres AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.
Key FM, Teixeira JC, de Filippo C, Andres AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.
Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.
King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.
Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.
Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.
Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.
Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.
Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.
Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.
McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.
McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Great Ape Genome Project, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.
McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.
McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.
Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.
Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.
Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.
Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andres AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.
Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.
Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.
Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html.
Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.
Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.
Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.
Prufer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.
Prufer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.
Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2(40):D130–D135.
Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.
Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.
Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.
Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.
Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.
Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.
Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.
Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2504 human genomes. Nature 526(7571):75–81.
Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.
Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.
Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.
Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.
Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.
Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.
Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 1(33):W741–W748.



remains a matter of controversy (Halligan et al. 2013). Since King and Wilson (1975), the relative importance of coding and regulatory variation to adaptive evolution has been contentious. Protein-coding DNA constitutes about 1.5% of the genome, but 10–15% appears to be functionally constrained (Ponting and Hardison 2011). The role of nonprotein-coding genes and other nongenic elements in genome function and evolution remains debated (ENCODE Project Consortium 2011; Doolittle 2013), with several lines of work suggesting that nongenic regions (including some gene deserts) can play an important role in phenotype and adaptation (Bejerano et al. 2006; Libioulle et al. 2007; McPherson et al. 2007; Hubisz and Pollard 2014). Although the stringent filtering of the data and the imperfect annotation of nonprotein-coding functional elements hamper the comparison of protein-coding versus nonprotein regions, we investigated the functional annotations of our candidates.

Except for MK, the neutrality tests we used are agnostic about functional annotation. Still, most of our candidate targets of positive selection contain functional annotations: mean values across species are 72% for HKA, 71% for ELS and 80% for FWH. This is significantly greater than genome-wide expectations based on random sampling of the callable genome (P values < 0.05 in all lineages except Pan paniscus, P value = 0.2, and Pongo pygmaeus, P value = 0.11) (supplementary materials HKA 7 and supplementary figs. S7–15, Supplementary Material online). Among these annotations, the overlap with protein-coding exons (HKA = 62%, ELS = 59%, FWH = 45%) is significantly enriched in all lineages except Pan paniscus (P value = 0.15) and Pan troglodytes verus (P value = 0.06). In contrast, the mean overlap with exons from nonprotein coding genes (HKA = 18%, ELS = 2%, FWH = 20%) is not significantly elevated relative to genome-wide levels (supplementary figs. S7–15, Supplementary Material online; lowest P value = 0.16, in Gorilla gorilla gorilla).

Candidate targets of balancing selection are also highly enriched in protein-coding exons (e.g., 64% in the top 0.01 bin), and in the top HKA bins this proportion sharply increases with the HKA score (fig. 3). The increased levels of diversity in these windows cannot be explained by technical artifacts, as these regions are not unusual in terms of coverage or mapping quality (supplementary materials HKA 13–14, Supplementary Material online), or by current models of neutral evolution or purifying selection, and are instead best explained by long-term balancing selection acting on or near these protein-coding exons.

The Biological Pathways Targeted by Natural Selection

According to our results above, protein-coding genes appear to be a key target of natural selection in the Hominidae. We thus investigated the biological functions that these genes are involved in. For each neutrality test and lineage, we identified the genes in candidate regions of positive or balancing selection (see fig. 2 and "Methods" section for details) and performed gene enrichment analyses using WebGestalt (Zhang et al. 2005). Our necessarily strict data filtering may disproportionally affect certain Gene Ontology (GO) categories (e.g., olfactory receptors), but we discuss below the categories that retain the strongest signatures for each type of natural selection.

Pathways Targeted by Balancing Selection

The top genes for balancing selection include a number of well-established targets, such as the major histocompatibility complex (MHC) genes (Hughes and Yeager 1998; Hedrick 1999). Indeed, windows containing MHC genes appear among those with the strongest signals of balancing selection in all lineages (supplementary tables S6 and S13, Supplementary Material online). In addition, in all lineages there is a significant enrichment of immunity-related categories, such as the GO “Antigen processing and presentation” category (closely related to the MHC) (supplementary tables S16–40, Supplementary Material online). This provides evidence that balancing selection has a strong influence on immunological pathways in all lineages.

To test whether there were strong signatures of balancing selection beyond the MHC complex, we re-ran the GO enrichment analysis excluding all genes in the MHC region on chromosome 6 (supplementary tables S57–65, Supplementary Material online). Doing so removes the enrichment for the GO category “Antigen processing and presentation” in all lineages. Interestingly, in three of the four Pan troglodytes lineages (excluding Pan troglodytes schweinfurthii) there is significant enrichment for the GO category “Cornified envelope” (supplementary tables S360, S62, and S63, Supplementary Material online), driven by the three genes SCEL, SPRR2B, and SPRR2G. The cornified envelope is the most exterior layer of the skin and consists of dead cells. Related to this, we note that in Pan troglodytes verus the GO category “keratinocyte differentiation,” involved in the development of the most common cell type in the epidermis, is also significantly enriched (P value = 0.0026). This is interesting because keratins and proteins similarly involved in epithelial barrier formation have been proposed as targets of balancing selection (see “Discussion” section).

Pathways Targeted by Strong Purifying Selection

Since Hominidae are closely related, we would expect that similar regions are evolving under purifying selection. We therefore tested whether the pathways showing the strongest signatures of constraint are consistent among lineages. From the MK test, 53 of the 152 evaluated pathways showed signatures of strong purifying selection in more than one lineage. In particular, the “Integrin signaling” pathway and “Wnt signaling” pathway, which regulate basic cellular and developmental processes, and the “Alzheimer’s disease-presenilin” pathway are significantly constrained across all lineages (supplementary table S102, Supplementary Material online).

Pathways Targeted by Positive Selection

For the HKA, several lineages show evidence of positive selection targets being enriched for GO categories related to immune function. For example, the GO category “Complement activation” (genes that activate the innate immune system) is significantly enriched in Pan paniscus (P value = 0.042; all P values adjusted for multiple testing), whereas the related pathway “Complement receptor activity” is enriched in Pongo abelii (P value = 0.011) and the GO category “Viral receptor activity” in Gorilla gorilla gorilla (P value = 0.0004) (supplementary tables S41, S46, and S54, Supplementary Material online).

We find that the FWH candidate targets of positive selection show enrichment in several GO categories related to brain development and function exclusively within the African Hominidae lineages. This includes, for example, the GO categories “Dendrite” (Homo sapiens P value = 0.040, Pan troglodytes troglodytes P value = 0.010, Gorilla gorilla P value = 0.0024) and “Neuron spine” (Pan troglodytes troglodytes P value = 0.001, Gorilla gorilla gorilla P value = 0.006). Several additional neurological categories are enriched in single lineages (supplementary tables S75–86, Supplementary Material online). For example, Homo sapiens is the only lineage with significant enrichment of the GO category “Glutamate receptor activity” (P value = 0.002); glutamate is the main excitatory neurotransmitter in the brain.

For the MK, the set of genes with an excess of divergence is small (supplementary table S96, Supplementary Material online), but we found a significant enrichment in genes involved in, for instance, “Ion channel activity” in Pan paniscus (P value = 0.034) and in “Glycosaminoglycan biosynthesis” in Gorilla gorilla gorilla (P value = 0.020), among other pathways (supplementary tables S98–101, Supplementary Material online).

Overlap between Targets of Positive and Balancing Selection across Lineages

The Hominidae lineages have shared, along their evolutionary history, similar physiologies and environments. As such, they have likely been subject to common selective pressures even after their lineages split. To investigate this possibility, we identified genes that show similar signals of natural selection in multiple lineages. Since we use an empirical approach to identify candidate targets of natural selection (as the demographic models for these species are not well established), we use the same 0.1% cut-off to identify outliers from both tails of the HKA empirical distribution and one tail of the FWH distribution. Therefore, we cannot make general claims about the relative frequency of positive and balancing selection in primate species. We can, though, explore the level of sharing across lineages of these candidate targets. To be conservative, we only consider genes to be shared targets of selection if they appear as candidates in at least three lineages.
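The sharing criterion described above (a gene counts as a shared target only if it is a candidate in at least three lineages) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name `shared_candidates`, the lineage labels, and the gene names in the toy input are all hypothetical.

```python
from collections import Counter


def shared_candidates(candidates_by_lineage, min_lineages=3):
    """Return genes that are candidates in at least `min_lineages` lineages.

    candidates_by_lineage: dict mapping a lineage label to an iterable of
    candidate gene names for that lineage (hypothetical input format).
    """
    counts = Counter()
    for genes in candidates_by_lineage.values():
        # Use a set so a gene listed twice in one lineage is counted once.
        counts.update(set(genes))
    return {gene for gene, n in counts.items() if n >= min_lineages}


# Toy input with invented gene and lineage names, for illustration only.
toy = {
    "lineage_A": ["GENE1", "GENE2", "GENE3"],
    "lineage_B": ["GENE1", "GENE2"],
    "lineage_C": ["GENE1", "GENE4", "GENE4"],
    "lineage_D": ["GENE5"],
}
shared = shared_candidates(toy)
```

In this toy input only GENE1 appears in three lineages, so it is the only gene retained; GENE2 appears in two lineages and is excluded under the conservative threshold.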

Overlap between Selection Targets

We find no signals of positive selection that are shared across all lineages. In fact, there is modest sharing across lineages, a possible indication of the lineage-specific nature of the adaptive process (although we note that we are highly conservative in our selection of candidate genes and the power of FWH is reduced with lower sample sizes) (supplementary table S15, Supplementary Material online). We observe that the HKA candidates show lower sharing across lineages than those from the FWH (supplementary tables S14 and S73, Supplementary Material online). For the HKA, only 27 genes (of the 200 candidates per lineage) are shared in at least three lineages, compared with 67 for the FWH, which detects more recent selective events. We note that shared signals among the Pan troglodytes sub-species may not reflect truly independent signatures of selection, as signals may predate their divergence into separate lineages and admixture between sub-species is possible. However, of these 67 genes, only a minority (8) are shared exclusively among the Pan troglodytes subspecies, potentially reflecting their recently shared ancestry. The majority (59) show evidence of recent positive selection across a range of lineages (supplementary table S73, Supplementary Material online), suggesting putative parallel adaptive events.

Turning to the biological function of these genes, we find limited enrichment among HKA targets (the only significant GO category is “Structural molecule activity,” P value = 0.009; supplementary table S3AAB, Supplementary Material online). However, the 67 shared FWH candidate genes are significantly enriched in multiple functional categories (supplementary tables S87–89, Supplementary Material online), including several neuronal pathways, suggesting that these are a common target of recent positive selection across the Hominidae.

Genes targeted by balancing selection show much greater sharing across lineages, with 156 genes showing signatures of balancing selection across at least three lineages (supplementary table S13, Supplementary Material online; fig. 5 for an example across Pan troglodytes lineages). Nine genes, primarily from the MHC region, are shared across all lineages (supplementary table S13, Supplementary Material online). This likely reflects the long-term and persistent nature of balancing selection on immunity-related genes, in particular in the MHC.

Discussion

We present a global investigation of the signatures of purifying, positive, and balancing selection at different time scales and across the Hominidae lineages. We observe strong evidence for the signatures of each of these types of natural selection on patterns of genomic variation. By carefully avoiding technical differences, we can compare, for the first time, the patterns of different types of natural selection across the great ape species. All genomic analyses of signatures of selection are complicated to some extent by demographic processes, which can result in patterns of genomic variation that obscure signatures of selection or produce false positives. We tried to mitigate this by utilizing tests that identified putatively selected regions as outliers based on the entire distribution of patterns of genomic variation, under the assumption that the majority of the genome is evolving neutrally.

We find that even with the relatively similar Ne of the great apes (with a maximum difference of 3-fold), Ne has a significant effect on the efficacy of natural selection. This appears to be true for both purifying and positive selection. The evidence for adaptive evolution is stronger in protein-coding than in nonprotein coding genes, and it is overrepresented not only on loci with relevance to immune function, but also on loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations, as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae

We estimate that at least 65% of mutations are deleterious (NeS > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8%, compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites, and its effects on linked variation, correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes sub-species seem to have resulted in differences in the efficacy of natural selection. This may be extremely important because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes

Our data indicate that adaptive evolution (positive and balancing selection) often targets protein-coding regions (be that the protein-coding sequence or the surrounding regulatory elements). This suggests that, in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, it most likely reflects shared events. When the signature is shared across distant lineages, this may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example, if it affects different neighboring functional elements in each species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

FIG. 5. Venn diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC

Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier

The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process “cornified envelope development” in Pan troglodytes ellioti, Pan troglodytes troglodytes, and Pan troglodytes verus (the genes involved are SCEL, SPRR2B, and SPRR2G) and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E, LCE3D and LCE3E). In LCE3D and LCE3E, the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and has previously been shown to have signatures of balancing selection in humans (Andrés et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006). FLG is essential for the formation of the cornified envelope, yet it has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of “natural vaccination” (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes “natural vaccination” may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes

We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA, and FWH candidate targets of positive selection are all significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86, and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Sudhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus; an HKA signal of positive selection in Homo sapiens; and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. This includes AUTS2, identified by ELS in Gorilla gorilla gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank) and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1 is the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online); its function is unknown, but it is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Havik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in Pan troglodytes verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes including size, locomotion, morphology, or diet help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens, but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth / 8.0) and DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed “callable sites,” were used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that, despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot rule out the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.
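The depth-based part of the callable-site definition can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the pipeline of Prado-Martinez et al. (2013): the function name `is_callable_site` is hypothetical, the filter is applied per individual here for simplicity, and the default factors (one eighth and three times the mean read depth, and a minimum of 5 reads) are assumptions that should be checked against the original pipeline description.

```python
def is_callable_site(depths, mean_depths, low_div=8.0, high_mult=3.0, min_cov=5):
    """Return True if a site passes the depth filters in every individual.

    depths:      read depth at this site, one value per individual
    mean_depths: genome-wide mean read depth, one value per individual
    low_div, high_mult, min_cov: assumed thresholds (see lead-in)
    """
    if len(depths) != len(mean_depths):
        raise ValueError("depths and mean_depths must have the same length")
    for dp, mean_dp in zip(depths, mean_depths):
        # Unusually low depth suggests poor mappability or missing data.
        if dp < mean_dp / low_div:
            return False
        # Unusually high depth suggests collapsed duplications.
        if dp > mean_dp * high_mult:
            return False
        # Absolute minimum coverage required in all individuals.
        if dp < min_cov:
            return False
    return True


ok = is_callable_site([10, 12], [20.0, 24.0])
too_low = is_callable_site([4, 12], [20.0, 24.0])
too_high = is_callable_site([70, 12], [20.0, 24.0])
```

Requiring every individual to pass means that a single poorly covered sample removes the site for the whole species, which is one reason why excluding low-coverage individuals increases the number of callable sites.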

Tests

Hudson–Kreitman–Aguadé Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window’s empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
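The windowing and ranking steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: all function names are hypothetical, windows with no substitutions are assumed to be skipped (the text does not say how they were handled), ties in the ranking are broken arbitrarily, and the empirical P value is taken as rank divided by the number of windows, with rank 1 for the highest ratio.

```python
def sliding_windows(chrom_length, size=30_000, step=15_000):
    """Yield half-open (start, end) windows: 30 kb long, overlapping by 15 kb."""
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step


def keep_window(n_callable, n_snps, n_subs, min_callable=300, min_informative=6):
    """Apply the minimum callable and informative site requirements."""
    return n_callable >= min_callable and (n_snps + n_subs) >= min_informative


def hka_ratio(n_snps, n_subs):
    """Ratio of polymorphic to divergent sites; None if there are no substitutions."""
    if n_subs == 0:
        return None  # assumption: undefined ratios are skipped
    return n_snps / n_subs


def empirical_p_values(scores):
    """Upper-tail empirical P values: rank / N, where rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    p = [0.0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        p[idx] = rank / len(scores)
    return p


windows = list(sliding_windows(60_000))
p_values = empirical_p_values([0.5, 2.0, 1.0])
```

High ratios (excess polymorphism relative to divergence) fall in the upper tail, which is the tail relevant to balancing selection; candidates for positive selection would be taken from the opposite tail by ranking in ascending order instead.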

Fay and Wu's H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window’s empirical P value.
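For reference, the unnormalized Fay and Wu's H for a single window can be computed from derived-allele counts as the difference between two estimators of theta, one based on pairwise diversity and one weighted toward high-frequency derived alleles. The sketch below follows that standard definition; the function name `fay_wu_h` and the input format are hypothetical, and whether the study used this unnormalized form or a normalized variant is not stated here.

```python
def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H for one window.

    derived_counts: derived-allele count (between 1 and n - 1) at each
                    segregating site in the window
    n:              number of sampled chromosomes
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    for k in derived_counts:
        if not 0 < k < n:
            raise ValueError("derived counts must be between 1 and n - 1")
    denom = n * (n - 1)
    # Pairwise diversity: each site contributes 2k(n - k) / (n(n - 1)).
    theta_pi = sum(2 * k * (n - k) for k in derived_counts) / denom
    # theta_H weights sites by the squared derived-allele count.
    theta_h = sum(2 * k * k for k in derived_counts) / denom
    return theta_pi - theta_h


h_high = fay_wu_h([9], 10)   # one site with a high-frequency derived allele
h_mid = fay_wu_h([5], 10)    # one site at intermediate frequency
```

A strongly negative H, as for the high-frequency site above, indicates an excess of high-frequency derived alleles, the pattern expected after a recent sweep; a site at 50% frequency contributes zero.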

McDonald–Kreitman Test (MK)
To detect positive and purifying selection on protein-coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet these criteria. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
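The logic of the MK test can be illustrated with a small, self-contained Python sketch that compares nonsynonymous and synonymous counts for substitutions (Dn, Ds) and polymorphisms (Pn, Ps) in a 2x2 table. The use of a two-sided Fisher's exact test and of the neutrality index as a direction indicator are assumptions of this sketch; the text above does not specify which significance test was applied, and the function names are hypothetical.

```python
from math import comb
from typing import Optional, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> Tuple[float, Optional[float]]:
    """Return (P value, neutrality index) for one transcript.

    Neutrality index NI = (Pn/Ps) / (Dn/Ds): NI < 1 indicates an excess of
    nonsynonymous divergence, NI > 1 an excess of nonsynonymous polymorphism.
    NI is None when a denominator is zero.
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    ni = (pn / ps) / (dn / ds) if ps > 0 and dn > 0 and ds > 0 else None
    return p_value, ni
```

For example, `mk_test(7, 17, 2, 42)` (illustrative counts) returns a P value below 0.01 with NI well below 1, that is, an excess of nonsynonymous divergence of the kind interpreted as positive selection.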

Extended Lineage Sorting Test (ELS)
To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prüfer et al. 2012; see Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse, so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes and Pan paniscus, Gorilla g. gorilla and Gorilla b. graueri, and Pongo abelii and Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of >0.8 for being external that is not interrupted by SNPs with a probability of >0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
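The region definition above can be made concrete with a short Python sketch. It is one possible reading of the rule, not the published implementation: a region opens at the first SNP with a posterior probability above 0.8 for the external state, tolerates ambiguous SNPs, and closes at the last confidently external SNP before a SNP with a posterior above 0.8 for the internal state. SNP positions are assumed to be already converted to genetic distance (cM), and the function name and input format are hypothetical.

```python
from typing import List, Tuple


def call_external_regions(snps: List[Tuple[float, float]],
                          threshold: float = 0.8) -> List[Tuple[float, float, float]]:
    """Call external regions from per-SNP posteriors.

    snps: (genetic_position_cM, posterior_external) sorted by position.
          With two hidden states, posterior_internal = 1 - posterior_external.
    Returns (start_cM, end_cM, genetic_length_cM) for each region.
    """
    regions = []
    start = last = None
    for position, p_external in snps:
        p_internal = 1.0 - p_external
        if p_external > threshold:
            if start is None:
                start = position      # open a new region
            last = position           # extend to this confident SNP
        elif p_internal > threshold and start is not None:
            # A confidently internal SNP interrupts the run.
            regions.append((start, last, last - start))
            start = last = None
        # Ambiguous SNPs neither extend nor interrupt a region.
    if start is not None:
        regions.append((start, last, last - start))
    return regions
```

Scoring by genetic rather than physical length means that a region of a given size counts for more where recombination is frequent, since long unbroken external regions are less expected there under neutrality.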

For each population, the HMM was run repeatedly with each “outgroup” individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).
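A minimal sketch of this combination step is given below, under simplifying assumptions that go beyond the text: each run is represented as a dictionary from a shared region identifier to its score (genetic length), every run scores the same set of regions, and the cumulative rank score is the plain sum of per-run ranks. How overlapping regions from different runs are matched in practice is described in the supplementary materials and is not reproduced here.

```python
from typing import Dict, List


def combine_runs(runs: List[Dict[str, float]], top_fraction: float = 0.05) -> List[str]:
    """Combine per-outgroup HMM runs into one ranked list of regions.

    runs: one dict per outgroup individual, mapping a (hypothetical) shared
          region identifier to its score; higher scores rank first.
    Returns region identifiers in the top fraction of every run, ordered by
    the sum of their per-run ranks (ties broken by identifier).
    """
    if not runs:
        return []
    per_run_ranks = []
    for scores in runs:
        ordered = sorted(scores, key=scores.get, reverse=True)
        per_run_ranks.append({region: rank
                              for rank, region in enumerate(ordered, start=1)})
    # Keep only regions that fall within the top fraction in all runs.
    shared = set(per_run_ranks[0])
    for scores, ranks in zip(runs, per_run_ranks):
        cutoff = top_fraction * len(scores)
        shared &= {region for region, rank in ranks.items() if rank <= cutoff}
    cumulative = {region: sum(ranks[region] for ranks in per_run_ranks)
                  for region in shared}
    return sorted(cumulative, key=lambda region: (cumulative[region], region))
```

Requiring a region to be extreme in every run is deliberately strict: a region that looks external with only one outgroup individual is more likely to reflect that individual's data than a sweep in the test group.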

Region Annotation and GO Category Enrichment
Regions were annotated as genic (protein-coding and nonprotein coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript nominally significant at the 0.05 level.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: GO categories (Harris et al. 2004) “biological processes,” “molecular functions,” and “cellular components”; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004), based on mammalian and human phenotype ontology; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
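The two filtering rules in this paragraph (Bonferroni correction at 0.05 and removal of categories driven by a single gene) can be sketched as follows. This is an illustration, not WebGestalt's code: the raw P values are taken as given, the number of tests is assumed to equal the number of categories passed in, and the function name, tuple layout, and strict inequality are assumptions.

```python
from typing import List, Sequence, Tuple


def significant_categories(results: Sequence[Tuple[str, float, Sequence[str]]],
                           alpha: float = 0.05) -> List[Tuple[str, float]]:
    """Filter enrichment results.

    results: (category_name, raw_p_value, candidate_genes_in_category).
    Returns (category_name, bonferroni_adjusted_p) for categories that stay
    below alpha after correction and are supported by more than one gene.
    """
    n_tests = len(results)
    kept = []
    for name, raw_p, genes in results:
        adjusted = min(1.0, raw_p * n_tests)   # Bonferroni adjustment
        if adjusted < alpha and len(set(genes)) > 1:
            kept.append((name, adjusted))
    return kept
```

Discarding single-gene categories guards against one strongly selected locus, which may belong to many overlapping categories, producing several apparently independent enrichments.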

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access
An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).


Supplementary Material
Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments
We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References
Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.
Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.
Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol. 26(12):2755–2764.
Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol. 7(4):1122–1132.
Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.
Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol. 126:235–242.
Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol. 182:4313–4320.
Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.
Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 19:4705–4714.
Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol. 11(1):171.
Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol. 11:202.
Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.
Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet. 84:21–34.
Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.
Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet. 5:e1000500.
Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.
Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.
Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci. 110(14):5294–5300.
Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.
Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.
Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.
ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.
Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.
Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.
Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics. 8(Suppl 1):S4.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.
Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.
Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.
Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.
Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22:1760–1774.
Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry. 70:35–42.
Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.
Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol. 22:194–200.
Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev. 29:15–21.
Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.
Hudson RR, Kreitman M, Aguade M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.
Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32:415–435.
Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol. 126:1200–1202.
Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol. 3:687–701.
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.
Key FM, Fu Q, Romagne F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.
Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.
Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.
King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.
Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.
Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.
Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.
Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.
Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.
Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.
McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.
McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.
McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.
McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.
Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.
Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.
Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.
Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.
Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.
Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.
Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html
Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.
Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.
Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.
Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.
Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.
Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2(40):D130–D135.
Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42(Database issue):D903–D909.
Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.
Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.
Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.
Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.
Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.
Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.
Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.
Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.
Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2504 human genomes. Nature 526(7571):75–81.
Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.
Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.
Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.
Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.
Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.
Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.
Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 1(33):W741–W748.


“Complement activation” (genes that activate the innate immune system) is significantly enriched in Pan paniscus (P value = 0.042; all P values adjusted for multiple testing), whereas the related pathway “Complement receptor activity” is enriched in Pongo abelii (P value = 0.011) and the GO category “Viral receptor activity” in Gorilla gorilla gorilla (P value = 0.0004) (supplementary tables S41, S46, and S54, Supplementary Material online).

We find that the FWH candidate targets of positive selection show enrichment in several GO categories related to brain development and function exclusively within the African Hominidae lineages. This includes, for example, the GO categories “Dendrite” (Homo sapiens P value = 0.040, Pan troglodytes troglodytes P value = 0.010, Gorilla gorilla P value = 0.0024) and “Neuron spine” (Pan troglodytes troglodytes P value = 0.001, Gorilla gorilla gorilla P value = 0.006). Several additional neurological categories are enriched in single lineages (supplementary tables S75–86, Supplementary Material online). For example, Homo sapiens is the only lineage with significant enrichment of the GO category “Glutamate receptor activity” (P value = 0.002); glutamate is the main excitatory neurotransmitter in the brain.

For the MK, the set of genes with an excess of divergence is small (supplementary table S96, Supplementary Material online), but we found a significant enrichment in genes involved in, for instance, “Ion channel activity” in Pan paniscus (P value = 0.034) and in “Glycosaminoglycan biosynthesis” in Gorilla gorilla gorilla (P value = 0.020), among other pathways (supplementary tables S98–101, Supplementary Material online).

Overlap between Targets of Positive and Balancing Selection across Lineages
Along their evolutionary history, the Hominidae lineages have shared similar physiologies and environments. As such, they have likely been subject to common selective pressures even after their lineages split. To investigate this possibility, we identified genes that show similar signals of natural selection in multiple lineages. Since we use an empirical approach to identify candidate targets of natural selection (as the demographic models for these species are not well established), we use the same 0.1% cut-off to identify outliers from both tails of the HKA empirical distribution and one tail of the FWH distribution. Therefore, we cannot make general claims about the relative frequency of positive and balancing selection in primate species. We can, though, explore the level of sharing across lineages of these candidate targets. To be conservative, we only consider genes to be shared targets of selection if they appear as candidates in at least three lineages.
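The sharing criterion (a gene must be a candidate in at least three lineages) amounts to a simple count across per-lineage candidate sets, sketched below in Python. The function names and the dictionary layout are assumptions of this sketch; the second helper illustrates one way to flag genes whose sharing is confined to a chosen group of lineages, such as the subspecies of a single species.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def shared_candidates(candidates_by_lineage: Dict[str, Iterable[str]],
                      min_lineages: int = 3) -> Set[str]:
    """Genes that are candidates in at least min_lineages lineages."""
    counts = Counter(gene
                     for genes in candidates_by_lineage.values()
                     for gene in set(genes))   # count each lineage once per gene
    return {gene for gene, n in counts.items() if n >= min_lineages}


def exclusive_to_group(candidates_by_lineage: Dict[str, Iterable[str]],
                       group: Set[str], min_lineages: int = 3) -> Set[str]:
    """Shared genes whose candidate lineages all belong to the given group."""
    shared = shared_candidates(candidates_by_lineage, min_lineages)
    outside = {gene
               for lineage, genes in candidates_by_lineage.items()
               if lineage not in group
               for gene in genes}
    return shared - outside
```

Separating group-exclusive sharing from sharing across more distant lineages is useful because signals shared only among recently diverged lineages may reflect a single ancestral event rather than independent ones.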

Overlap between Selection Targets
We find no signals of positive selection that are shared across all lineages. In fact, there is modest sharing across lineages, a possible indication of the lineage-specific nature of the adaptive process (although we note that we are highly conservative in our selection of candidate genes and the power of FWH is reduced with lower sample sizes) (supplementary table S15, Supplementary Material online). We observe that the HKA candidates show lower sharing across lineages than those from the FWH (supplementary tables S14 and S73, Supplementary Material online). For the HKA, only 27 genes (of the 200 candidates per lineage) are shared in at least three lineages, compared with 67 for the FWH, which detects more recent selective events. We note that shared signals among the Pan troglodytes subspecies may not reflect truly independent signatures of selection, because signals may predate their divergence into separate lineages and because of the possibility of admixture between subspecies. However, of these 67 genes, only a minority (8) are shared exclusively among the Pan troglodytes subspecies, potentially reflecting their recently shared ancestry. The majority (59) show evidence of recent positive selection across a range of lineages (supplementary table S73, Supplementary Material online), suggesting putative parallel adaptive events.

Turning to the biological function of these genes, we find limited enrichment among HKA targets (the only significant GO category is “Structural molecule activity,” P value = 0.009; supplementary table S3AAB, Supplementary Material online). However, the 67 shared FWH candidate genes are significantly enriched in multiple functional categories (supplementary tables S87–89, Supplementary Material online), including several neuronal pathways, suggesting that these are a common target of recent positive selection across the Hominidae.

Genes targeted by balancing selection show much greater sharing across lineages, with 156 genes showing signatures of balancing selection across at least three lineages (supplementary table S13, Supplementary Material online; fig. 5 for an example across Pan troglodytes lineages). Nine genes, primarily from the MHC region, are shared across all lineages (supplementary table S13, Supplementary Material online). This likely reflects the long-term and persistent nature of balancing selection on immunity-related genes, in particular in the MHC.

Discussion
We present a global investigation of the signatures of purifying, positive, and balancing selection at different time scales and across the Hominidae lineages. We observe strong evidence for the signatures of each of these types of natural selection on patterns of genomic variation. By carefully avoiding technical differences, we can compare, for the first time, the patterns of different types of natural selection across the great ape species. All genomic analyses of signatures of selection are complicated to some extent by demographic processes, which can result in patterns of genomic variation that obscure signatures of selection or produce false positives. We tried to mitigate this by utilizing tests that identified putatively selected regions as outliers based on the entire distribution of patterns of genomic variation, under the assumption that the majority of the genome is evolving neutrally.

We find that even with the relatively similar Ne of the great apes (with a maximum difference of 3-fold), Ne has a significant effect on the efficacy of natural selection. This appears to be true for both purifying and positive selection. The evidence for adaptive evolution is stronger in protein-coding than in nonprotein coding genes, and it is overrepresented not only on loci with relevance to immune function but also on loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations, as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae
We estimate that at least 65% of mutations are deleterious (Nes > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8%, compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites, and its effects on linked variation, correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes subspecies seem to have resulted in differences in the efficacy of natural selection. This may be extremely important because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes

Our data indicate that adaptive evolution (positive and balancing selection) often targets protein-coding regions (whether the protein-coding sequence itself or the surrounding regulatory elements). This suggests that, in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein-coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, it most likely reflects shared events. When the signature is shared across distant lineages, it may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example if it affects different neighboring functional elements in each species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

FIG. 5. Venn diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC

Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find, in all lineages, evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier

The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process "cornified envelope development" in Pan t. ellioti, Pan troglodytes troglodytes, and Pan troglodytes verus (the genes involved are SCEL, SPRR2B, and SPRR2G) and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E; LCE3D and LCE3E). In LCE3D and LCE3E the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and which has previously been shown to have signatures of balancing selection in humans (Andrés et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, and Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006), which is essential for the formation of the cornified envelope yet has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of "natural vaccination" (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting that variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygous carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs borne by homozygous carriers. Therefore, a leaky skin barrier that promotes "natural vaccination" may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation in genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes

We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA, and FWH candidate targets of positive selection are all significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86, and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to resist AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several viral promoters, including that of HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find that some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Südhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus; an HKA signal of positive selection in Homo sapiens; and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. These include AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank) and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1 is the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online); its function is unknown, but it is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Håvik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes, including size, locomotion, morphology, or diet, help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed "callable sites," was used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2,099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that, despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot discard the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.
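The per-site depth criteria described above can be illustrated with a minimal sketch. It is not the pipeline used for the dataset: the function name and arguments are hypothetical, and the thresholds (mean depth divided by 8, mean depth times 3, and a minimum of 5 reads) reflect our reading of the filters in the text.

```python
def is_callable(depths, mean_depths, min_cov=5, low_div=8.0, high_mult=3.0):
    """Return True if a site passes the depth filters in every individual.

    depths      -- read depth at this site, one value per individual
    mean_depths -- genome-wide mean read depth, one value per individual
    The default thresholds are assumptions taken from the text.
    """
    for dp, mean_dp in zip(depths, mean_depths):
        if dp < min_cov:               # fewer than 5 reads in this individual
            return False
        if dp < mean_dp / low_div:     # unusually low depth
            return False
        if dp > mean_dp * high_mult:   # unusually high depth (possible CNV)
            return False
    return True
```

A site is kept only if every individual of the species passes, which mirrors the requirement that positions be covered in all individuals per species.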

Tests

Hudson–Kreitman–Aguadé Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window's empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
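The window scan and rank-based empirical P value can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the function names are hypothetical, positions are sorted lists on a single chromosome, windows with no substitutions are given an infinite ratio, and tied scores receive arbitrary consecutive ranks.

```python
from bisect import bisect_left


def count_in(sorted_pos, start, end):
    """Number of positions p with start <= p < end in a sorted list."""
    return bisect_left(sorted_pos, end) - bisect_left(sorted_pos, start)


def hka_scan(snvs, subs, callable_sites, chrom_len, size=30_000,
             step=15_000, min_callable=300, min_informative=6):
    """Return (window_start, polymorphism/divergence ratio) per kept window."""
    windows = []
    for start in range(0, max(chrom_len - size, 0) + 1, step):
        end = start + size
        p = count_in(snvs, start, end)   # polymorphic sites (SNVs)
        d = count_in(subs, start, end)   # fixed differences to the outgroup
        if count_in(callable_sites, start, end) < min_callable:
            continue
        if p + d < min_informative:
            continue
        # Assumption: a window without substitutions gets an infinite ratio.
        windows.append((start, p / d if d else float("inf")))
    return windows


def empirical_p(windows, high_tail=True):
    """Rank windows by score; rank / number of windows is the empirical P."""
    ordered = sorted(windows, key=lambda w: w[1], reverse=high_tail)
    n = len(ordered)
    return {start: (rank + 1) / n for rank, (start, _) in enumerate(ordered)}
```

With `high_tail=True` the smallest P values go to windows with an excess of polymorphism (candidate balancing selection); with `high_tail=False` they go to windows with an excess of divergence (candidate positive selection).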

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window's empirical P value.
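For readers unfamiliar with the statistic, a minimal sketch of Fay and Wu's H for one window is given below. It uses the standard unnormalized definition, H = θπ − θH, computed from derived-allele counts; the function name is hypothetical, and we do not claim that this matches the study's implementation in every detail (for example, in how missing data are handled).

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H (unnormalized) for one window.

    derived_counts -- derived-allele count at each SNP in the window
    n              -- number of sampled chromosomes
    """
    # Keep segregating sites only (derived allele neither absent nor fixed).
    counts = [i for i in derived_counts if 0 < i < n]
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in counts) / denom  # pairwise diversity
    theta_h = sum(2 * i * i for i in counts) / denom         # weights high-frequency derived alleles
    return theta_pi - theta_h
```

Strongly negative values indicate an excess of high-frequency derived alleles, the pattern expected near a recent sweep.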

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein-coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet this criterion. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 15.1 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
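The logic of the MK test for a single transcript can be sketched as a 2×2 contingency test on nonsynonymous and synonymous substitutions and polymorphisms. The text does not state which test of independence was applied, so the use of a two-sided Fisher's exact test here is an assumption, as are the function names and the labels returned.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    observed = prob(a)
    lo, hi = max(0, row1 - (total - col1)), min(row1, col1)
    tail = sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


def mk_test(dn, ds, pn, ps):
    """Return (P value, direction) from counts of substitutions (dn, ds)
    and polymorphisms (pn, ps) at nonsynonymous and synonymous sites."""
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    if dn * ps > ds * pn:
        direction = "positive"    # excess of nonsynonymous divergence
    elif dn * ps < ds * pn:
        direction = "purifying"   # excess of nonsynonymous polymorphism
    else:
        direction = "neutral"
    return p_value, direction
```

For example, the classic Adh counts from McDonald and Kreitman (1991), `mk_test(7, 17, 2, 42)`, give a significant excess of nonsynonymous divergence.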

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prüfer et al. 2012; see Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyzes SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of > 0.8 for being external that is not interrupted by SNPs with a probability of > 0.8 for being internal, and were scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
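The run definition above can be sketched as a single pass over the per-SNP posterior probabilities. This is an illustration only: the function name is hypothetical, a region is taken to span from its first to its last confidently external SNP, and a single uniform recombination rate (`cm_per_mb`) stands in for the 1-Mb map of Kong et al. (2002).

```python
def external_regions(positions, p_external, p_internal,
                     cutoff=0.8, cm_per_mb=1.0):
    """Return (start_bp, end_bp, length_cM) for each external region.

    positions  -- sorted SNP coordinates in bp
    p_external -- posterior probability of the external state per SNP
    p_internal -- posterior probability of the internal state per SNP
    cm_per_mb  -- assumed uniform recombination rate (simplification)
    """
    regions, start, end = [], None, None

    def close():
        regions.append((start, end, (end - start) / 1e6 * cm_per_mb))

    for pos, pe, pi in zip(positions, p_external, p_internal):
        if pe > cutoff:
            if start is None:
                start = pos          # open a new run
            end = pos                # extend the run to this SNP
        elif pi > cutoff and start is not None:
            close()                  # a confident internal SNP ends the run
            start = end = None
        # SNPs confident in neither state do not interrupt the run.
    if start is not None:
        close()
    return regions
```

Regions returned by such a function could then be ranked by their genetic length, the score described in the text.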

For each population, the HMM was run repeatedly with each "outgroup" individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein-coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript presenting a nominal P value of ≤ 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: GO categories (Harris et al. 2004) ("biological processes," "molecular functions," and "cellular components"); the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004); databases based on mammalian and human phenotype ontology; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
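The post-processing of enrichment results described above (Bonferroni correction at 0.05, then removal of categories driven by a single gene) can be sketched as follows. The sketch is independent of WebGestalt: the function name and the input layout (category name mapped to a nominal P value and the set of candidate genes in that category) are hypothetical.

```python
def significant_categories(results, alpha=0.05):
    """Return {category: Bonferroni-adjusted P} for retained categories.

    results -- dict mapping category name to (nominal P value, gene set)
    """
    n_tests = len(results)
    kept = {}
    for category, (p_value, genes) in results.items():
        adjusted = min(1.0, p_value * n_tests)   # Bonferroni correction
        if adjusted < alpha and len(genes) > 1:  # drop single-gene categories
            kept[category] = adjusted
    return kept
```

Here the number of tests is taken to be the number of categories passed in; in practice it would be the number of categories tested per database, lineage, and test.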

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).


Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

ReferencesAnderson RM May RM 1982 Coevolution of hosts and parasites

Parasitology 85(02)411ndash426Andres AM Dennis MY Kretzschmar WW Cannons JL Lee-Lin S-Q

Hurle B Schwartzberg PL Williamson SH Bustamante CD Nielsen Ret al 2010 Balancing selection maintains a form of ERAP2 thatundergoes nonsense-mediated decay and affects antigen presenta-tion PLoS Genet 6e1001157

Andres AM Hubisz MJ Indap A Torgerson DG Degenhardt JD BoykoAR Gutenkunst RN White TJ Green ED Bustamante CD Clark AG2009 Targets of balancing selection in the human genome Mol BiolEvol 26(12)2755ndash2764

Bataillon T Duan J Hvilsom C Jin X Li Y Skov L Glemin S Munch KJiang T Qian Y Hobolth A 2015 Inference of purifying and positiveselection in three subspecies of chimpanzees (Pan troglodytes) fromexome sequencing Genome Biol Evol 7(4)1122ndash1132

Bejerano G Lowe CB Ahituv N King B Siepel A Salama SR Rubin EMKent WJ Haussler D 2006 A distal enhancer and an ultraconservedexon are derived from a novel retroposon Nature 441(7089)87ndash90

Boasso A Shearer GM 2008 Chronic innate immune activation as acause of HIV-1 immunopathogenesis Clin Immunol 126235ndash242

Boasso A Vaccari M Fuchs D Hardy AW Tsai W-P Tryniszewska EShearer GM Franchini G 2009 Combined effect of antiretroviraltherapy and blockade of IDO in SIV-infected rhesus macaquesJ Immunol 1824313ndash4320

Bustamante CD Fledel-Alon A Williamson S Nielsen R Hubisz MTGlanowski S Tanenbaum DM White TJ Sninsky JJ Hernandez RDCivello D 2005 Natural selection on protein-coding genes in thehuman genome Nature 437(7062)1153ndash1157 20

Cagliani R Riva S Biasin M Fumagalli M Pozzoli U Lo Caputo SMazzotta F Piacentini L Bresolin N Clerici M et al 2010 Geneticdiversity at endoplasmic reticulum aminopeptidases is maintainedby balancing selection and is associated with natural resistance toHIV-1 infection Hum Mol Genet 194705ndash4714

Cagliani R Riva S Pozzoli U Fumagalli M Comi GP Bresolin NClerici M Sironi M 2011 Balancing selection is common inthe extended MHC region but most alleles with opposite riskprofile for autoimmune diseases are neutrally evolving BMCEvol Biol 11(1)171

Casals F Sikora M Laayouni H Montanucci L Muntasell A Lazarus RCalafell F Awadalla P Netea MG Bertranpetit J 2011 Genetic

adaptation of the antibacterial human innate immunity networkBMC Evol Biol 11202

Charlesworth B 2009 Effective population size and patterns of molec-ular evolution and variation Nat Rev Genet 10195ndash205

Chen PL Avramopoulos D Lasseter VK McGrath JA Fallin MD LiangKY Nestadt G Feng N Steel G Cutting AS Wolyniec P 2009 Finemapping on chromosome 10q22-q23 implicates Neuregulin 3 inschizophrenia Am J Hum Genet 8421ndash34

Coop G 2016 Does linked selection explain the narrow range of geneticdiversity across species bioRxiv 1042598

Coop G Pickrell JK Novembre J Kudaravalli S Li J Absher D Myers RMCavalli-Sforza LL Feldman MW Pritchard JK 2009 ldquoThe role of ge-ography in human adaptationrdquo PLoS Genet 5e1000500

Corbett-Detig RB Hartl DL Sackton TB 2015 Natural selection con-strains neutral diversity across a wide range of species PLoS Biol13e1002112

Denny JC Ritchie MD Basford MA Pulley JM Bastarache L Brown-Gentry K Wang D Masys DR Roden DM Crawford DC 2010PheWAS demonstrating the feasibility of a phenome-wide scan todiscover gene-disease associations Bioinformatics 261205ndash1210

Doolittle WF 2013 Is junk DNA bunk A critique of ENCODE Proc NatlAcad Sci 110(14)5294ndash5300

Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.

ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22:1760–1774.

Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.

Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.

Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol. 22:194–200.

Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev. 29:15–21.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Hudson RR, Kreitman M, Aguadé M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32:415–435.

Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol. 126:1200–1202.

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol. 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.

Key FM, Fu Q, Romagne F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun. 187.

Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Great Ape Genome Project, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.

Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.

Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2(40):D130–D135.

Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.

Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 1(33):W741–W748.


evidence for adaptive evolution is stronger in protein-coding than in nonprotein coding genes, and it is overrepresented not only on loci with relevance to immune function but also on loci involved in the development and maintenance of the brain. Long-term balancing selection, which most clearly affects the evolution of immune and skin-related loci, is more often shared across lineages than positive selection. In what follows, we briefly discuss these observations, as well as some of the biological insights from the loci identified.

Effective Population Size Significantly Influences the Efficacy of Natural Selection in the Hominidae

We estimate that at least 65% of mutations are deleterious (NeS > 10) in all lineages. This estimate agrees very well with results in humans (Eyre-Walker and Keightley 2009). Our results also agree with a recent study in gorillas (McManus et al. 2015), where the DFE-alpha method provided a very similar gamma shape parameter and proportion of strongly deleterious and neutral alleles (66% and 23.8%, compared with our 65% and 22%). Regarding the prevalence of positive selection, our estimates in Gorilla gorilla gorilla (3%) also overlap with previous estimates (McManus et al. 2015). Our results thus confirm the limited information that exists for the Hominidae and greatly expand upon it.

Having information for many great apes enables us to start to compare the different species. The strength of purifying selection on nonsynonymous sites and its effects on linked variation correlate with the long-term Ne of the populations. Similar correlations have been observed among other, more distant species (Leffler et al. 2012; Corbett-Detig et al. 2015). However, our results indicate that even the modest Ne differences that exist among the Hominidae have also affected the efficacy of positive selection, which is likely to be less prevalent and more dependent on environmental changes than purifying selection. Therefore, the Hominidae lineages with the largest long-term effective population sizes, such as Gorilla gorilla gorilla and Pan troglodytes troglodytes, are better able to both remove deleterious alleles and fix adaptive alleles than lineages such as Pan paniscus and Homo sapiens. Even the relatively recent differences in long-term Ne between Pan troglodytes subspecies seem to have resulted in differences in the efficacy of natural selection. This may be extremely important, because the long-term survival of these species, which live in small populations and are endangered, may depend on their ability to adapt to changes in their local environments.

Biological Interpretation of Candidate Genes

Our data indicate that adaptive evolution (positive and balancing selection) often targets protein-coding regions (be that the protein-coding sequence or the surrounding regulatory elements). This suggests that, in these species, variants that affect proteins have been important drivers of novel adaptations. It also supports the comparison of protein-coding versus nonprotein coding regions to establish patterns of positive selection, as long as additional confounding factors are accounted for (Coop et al. 2009; Key et al. 2016).

The candidate genes targeted by positive selection show only moderate overlap between species. This is not surprising, as different populations likely adapt differently at the genetic level, even to similar environmental pressures. Nevertheless, there are certain genes, and even gene categories, that show evidence of positive selection in several lineages, which could reflect recurrent evolution at the genomic level. We note that the sharing across lineages is substantially higher when we turn to balancing selection. This is expected, because we target only long-term balancing selection, which may predate the divergence of the great ape lineages.

There are several possible interpretations of shared signals. When the signature is shared across closely related lineages, this most likely reflects shared events. When the signature is shared across distant lineages, this may reflect independent adaptive evolution. In these cases, selection may be favoring independent phenotypes in each species, for example if it affects different neighboring functional elements in each species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

FIG. 5. Venn diagram of shared targets of balancing and positive selection among Pan troglodytes lineages. Overlap of the number of putative target genes of balancing and positive selection, as inferred by the HKA test, for all P. troglodytes lineages.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here, we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials, section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC

Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier

The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process “cornified envelope development” in Pan troglodytes ellioti, Pan troglodytes troglodytes, and Pan troglodytes verus (the genes involved are SCEL, SPRR2B, and SPRR2G) and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E, LCE3D and LCE3E). In LCE3D and LCE3E, the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and which has previously been shown to have signatures of balancing selection in humans (Andrés et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006), which is essential for the formation of the cornified envelope yet has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of “natural vaccination” (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting that variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes “natural vaccination” may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes

We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA, and FWH candidate targets of positive selection are all significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86, and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find that some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Sudhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus, an HKA signal of positive selection in Homo sapiens, and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. These include AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank) and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1 is the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online); its function is unknown, but it is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Havik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes including size, locomotion, morphology, or diet help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we also required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed “callable sites,” were used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that, despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot rule out the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.
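The depth-based part of the callable-site definition can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the choice to apply every threshold per individual, and the toy depths are assumptions made for illustration, and the default thresholds (mean depth/8.0, mean depth × 3, at least 5× per individual) are simply the values as read from the text above.

```python
def depth_ok(depth, mean_depth, min_abs=5, low_div=8.0, high_mult=3.0):
    """Return True if one individual's depth at a site passes all thresholds.

    Assumed thresholds (read from the text, not from the authors' code):
    depth >= min_abs, depth >= mean_depth / low_div, depth <= mean_depth * high_mult.
    """
    if depth < min_abs:
        return False
    if depth < mean_depth / low_div:
        return False
    if depth > mean_depth * high_mult:
        return False
    return True


def callable_sites(depths_by_individual, mean_depths):
    """Return indices of sites that pass the depth filter in every individual.

    depths_by_individual: one list of per-site depths per individual (equal lengths).
    mean_depths: genome-wide mean depth of each individual, in the same order.
    """
    n_sites = len(depths_by_individual[0])
    kept = []
    for site in range(n_sites):
        if all(depth_ok(ind[site], mean)
               for ind, mean in zip(depths_by_individual, mean_depths)):
            kept.append(site)
    return kept


# Toy example: two individuals, four sites.
toy_depths = [[10, 2, 40, 10],
              [12, 9, 11, 100]]
toy_means = [10.0, 12.0]
kept_sites = callable_sites(toy_depths, toy_means)
```

In the toy example only the first site survives: the second fails the minimum depth in one individual, and the third and fourth exceed three times the mean depth in one individual each.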

Tests

Hudson–Kreitman–Aguadé Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window's empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
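The windowing, filtering, and ranking steps described above can be sketched roughly as below. This is a minimal sketch, not the published implementation: all function names are hypothetical, the handling of windows without substitutions is an assumption, and ranking from the high end (which would suit balancing selection candidates) is one of two possible directions.

```python
def sliding_windows(chrom_length, size=30_000, step=15_000):
    """Yield (start, end) for 30-kb windows that overlap by 15 kb."""
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step


def hka_ratio(n_snps, n_subs, n_callable, min_callable=300, min_informative=6):
    """Ratio of polymorphic to divergent sites, or None if the window is filtered out."""
    if n_callable < min_callable or n_snps + n_subs < min_informative:
        return None
    if n_subs == 0:
        return None  # assumption: skip windows where the ratio is undefined
    return n_snps / n_subs


def empirical_p(scores):
    """Convert scores to rank / number of windows (highest score gets the smallest P)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    p = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        p[i] = rank / len(scores)
    return p


# Toy usage with made-up counts for four windows.
scores = [hka_ratio(12, 4, 500), hka_ratio(1, 2, 500) or 0.5, 1.2, 2.0]
print(empirical_p(scores))
```

Ranking from the low end instead (sorting without `reverse=True`) would give the corresponding empirical P value for an excess of divergence relative to polymorphism.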

Fay and Wu H Test (FWH)
To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window's empirical P value.
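For readers unfamiliar with the statistic, a minimal sketch of Fay and Wu's H for a single window is given below, using the standard definition H = θπ − θH from Fay and Wu (2000). The function name and input layout are hypothetical, and the implementation used in the study may differ in details such as normalization or the handling of missing data.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H = theta_pi - theta_H for one window.

    derived_counts: derived-allele count at each segregating site (1..n-1).
    n: number of sampled chromosomes.
    """
    pairs = n * (n - 1)
    # theta_pi: average pairwise differences, from the derived-allele counts.
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / pairs
    # theta_H: weights each site by the squared derived-allele count.
    theta_h = sum(2 * i * i for i in derived_counts) / pairs
    return theta_pi - theta_h


# An excess of high-frequency derived alleles drives H negative.
print(fay_wu_h([9, 9, 8], n=10) < 0)   # True
print(fay_wu_h([1, 1, 2], n=10) > 0)   # True
```

Strongly negative values are the signal of interest here, because a recent sweep leaves linked derived alleles at high frequency.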

Natural Selection in the Great Apes. doi:10.1093/molbev/msw215. MBE

McDonald–Kreitman Test (MK)
To detect positive and purifying selection on protein-coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet this criterion. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
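The counts described above feed a 2×2 table of synonymous and nonsynonymous polymorphisms and substitutions. The sketch below shows one common way to evaluate such a table, with a neutrality index and a two-sided Fisher's exact test; the choice of Fisher's exact test and the function name are assumptions made for illustration, since the text here does not state which test of the table was used.

```python
from math import comb


def mk_test(pn, ps, dn, ds):
    """McDonald-Kreitman 2x2 table: returns (neutrality index, two-sided Fisher P).

    pn, ps: nonsynonymous / synonymous polymorphisms.
    dn, ds: nonsynonymous / synonymous substitutions.
    """
    # Neutrality index: >1 suggests purifying selection, <1 positive selection.
    ni = (pn / ps) / (dn / ds) if ps and dn and ds else None

    # Fisher's exact test via the hypergeometric distribution.
    row1, col1, total = pn + ps, pn + dn, pn + ps + dn + ds

    def prob(k):  # probability of k nonsynonymous polymorphisms given the margins
        return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)

    observed = prob(pn)
    lo, hi = max(0, row1 - (total - col1)), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= observed * (1 + 1e-9))
    return ni, min(p_value, 1.0)


# Counts from the classic Adh example of McDonald and Kreitman (1991).
ni, p = mk_test(pn=2, ps=42, dn=7, ds=17)
print(ni < 1, p < 0.05)
```

A neutrality index below 1 with a small P value points to an excess of nonsynonymous substitutions, the pattern expected under positive selection.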

Extended Lineage Sorting Test (ELS)
To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; Supplementary Information 7 in Prufer et al. 2012; Supplementary Information 19a in Prufer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test group of individuals. The test requires a particular relationship between the test group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test group is sufficiently diverse, so that the outgroup often falls within the diversity of the test group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyzes SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prufer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prufer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of >0.8 for being external that is not interrupted by SNPs with a probability of >0.8 for being internal, and were scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
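The run definition above can be made concrete with a short sketch. It assumes that per-SNP posterior probabilities from the HMM are already available, that the model has exactly two states (so P(internal) = 1 − P(external)), and that SNP positions have already been converted to genetic distance; the function name and data layout are hypothetical.

```python
def external_regions(positions_cm, p_external, threshold=0.8):
    """Return (start_cM, end_cM, length_cM) for each external region, longest first.

    positions_cm: genetic position of each SNP in centimorgans, sorted.
    p_external: posterior probability of the 'external' state at each SNP;
                P(internal) is taken as 1 - P(external) in this two-state sketch.
    """
    regions, start, last = [], None, None
    for pos, p_ext in zip(positions_cm, p_external):
        if p_ext > threshold:              # confidently external: open or extend a run
            if start is None:
                start = pos
            last = pos
        elif (1.0 - p_ext) > threshold:    # confidently internal: close the run
            if start is not None:
                regions.append((start, last, last - start))
                start = None
        # ambiguous SNPs neither extend nor interrupt the run
    if start is not None:
        regions.append((start, last, last - start))
    return sorted(regions, key=lambda r: r[2], reverse=True)


# Toy posteriors for seven SNPs spaced 0.1 cM apart.
pos = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
post = [0.9, 0.95, 0.5, 0.9, 0.1, 0.85, 0.9]
print(external_regions(pos, post))
```

In this toy input the ambiguous SNP at 0.2 cM does not break the first run, whereas the confidently internal SNP at 0.4 cM does, so two regions are reported.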

For each population, the HMM was run repeatedly with each "outgroup" individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).

Region Annotation and GO Category Enrichment
Regions were annotated as genic (protein-coding and nonprotein-coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript significant at a nominal P value threshold of 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: GO categories (Harris et al. 2004) "biological processes," "molecular functions," and "cellular components"; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004), based on mammalian and human phenotype ontology; and the PheWas database, which is based on the human PheWas ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
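The two post-filters described above (Bonferroni correction at 0.05 and removal of categories supported by a single gene) are simple enough to sketch. The function and the category and gene labels below are hypothetical placeholders with made-up P values; the enrichment P values themselves would come from the enrichment software, which this sketch does not reproduce.

```python
def significant_categories(results, alpha=0.05):
    """Keep categories passing a Bonferroni threshold and supported by more than one gene.

    results: list of (category, raw_p_value, genes_in_overlap) tuples.
    """
    cutoff = alpha / len(results)  # Bonferroni-corrected threshold
    return [
        (name, p, genes)
        for name, p, genes in results
        if p <= cutoff and len(set(genes)) > 1
    ]


# Placeholder categories with invented P values, for illustration only.
toy = [
    ("category_A", 0.001, ["gene1", "gene2", "gene3"]),
    ("category_B", 0.0001, ["gene4"]),            # dropped: driven by one gene
    ("category_C", 0.03, ["gene5", "gene6"]),     # dropped: fails 0.05 / 3
]
print([name for name, _, _ in significant_categories(toy)])  # ['category_A']
```

Dividing the threshold by the number of tested categories is equivalent to multiplying each raw P value by that number and comparing against 0.05.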

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access
An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).

Cagan et al. doi:10.1093/molbev/msw215. MBE


Supplementary Material
Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments
We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References
Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.
Andres AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet 6:e1001157.
Andres AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol 26(12):2755–2764.
Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol 7(4):1122–1132.
Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.
Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol 126:235–242.
Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol 182:4313–4320.
Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.
Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet 19:4705–4714.
Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol 11(1):171.
Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet 10:195–205.
Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet 84:21–34.
Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.
Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet 5:e1000500.
Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol 13:e1002112.
Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.
Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci 110(14):5294–5300.
Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet 17:422–433.
Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res 20:1558–1573.
Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res 24:885–895.
ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9(4):e1001046.
Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol 21:569–575.
Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol 26:2097–2108.
Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol 181:1315–1322.
Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet 21:5306–5317.
Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci 110(39):15764–15769.
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.
Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.
Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet 5:e1003995.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res 22:1760–1774.
Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.
Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.
Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol 22:194–200.
Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev 29:15–21.
Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.
Hudson RR, Kreitman M, Aguade M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.
Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet 32:415–435.
Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol 126:1200–1202.
Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol 3:687–701.
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res 32:D277–D280.
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 16:980–989.
Key FM, Fu Q, Romagne F, Lachmann M, Andres AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.
Key FM, Teixeira JC, de Filippo C, Andres AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev 29:45–51.
Kimura M. 1979. The neutral theory of molecular evolution. Sci Am 241:98–126.
King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet 31:241–247.
Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol 176:4419–4430.
Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol 31(4):804–816.
Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol 10(9):e1001388.
Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.
Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 3(4):e58.
Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.
McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol 32:600–612.
McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.
McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet 5(5):e1000471.
Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol 26:357–367.
Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.
Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med 2:32ps23.
Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andres AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res 19(5):838–849.
Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet 87(2):274–281.
Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet 29:600–608.
Pagel M, Meade A. 2013. BayesTraits V2: software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html
Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res 21(11):1769–1776.
Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.
Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol 20(4):R208–R215.
Prufer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.
Prufer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.
Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2(40):D130–D135.
Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res 42(Database issue):D903–D909.
Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.
Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.
Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci 5:272–276.
Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol 11:62.
Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet 38:337–342.
Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.
Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res 23(9):1373–1382.
Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2,504 human genomes. Nature 526(7571):75–81.
Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.
Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164.
Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet 13:1131–1137.
Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol 23:436–447.
Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet 76:717–728.
Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol 26(2):273–283.
Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 1(33):W741–W748.


species or, due to pleiotropy, the same functional element for a different phenotype in each species. Alternatively, these regions may represent cases of convergent evolution, where the same phenotype is selected for across species. For example, many genes involved in brain development have shared evidence for positive selection across different species. We speculate that there has been ongoing positive selection for neurological phenotypes across the great apes and that, although this was likely to be highly polygenic, some of the same genes may have been involved across species.

The detection of specific genes that have been under adaptive evolution is of great interest, especially when dealing with lineages closely related to humans. Here, we discuss some of the most interesting findings. For an extended discussion of putatively selected genes, see supplementary materials section 7, Supplementary Material online.

Immunity

Balancing Selection on the MHC
Host–pathogen co-evolution can result in strong selective pressures (Anderson and May 1982). In agreement with this, we find in all lineages evidence of balancing selection maintaining adaptive diversity on immunity-related genes. As expected with long-term balancing selection, where the time to the most recent common ancestor may predate the species split, many cases are shared among closely related lineages. These results provide further evidence that advantageous diversity is extremely important for the immune system, as has been shown in humans (Ferrer-Admetlla et al. 2008; reviewed in Key et al. 2014). Not surprisingly, the MHC genes are among the top candidate targets of balancing selection in all lineages.

Balancing Selection on the Skin Barrier
The cornified envelope is a layer of dead keratinocytes (corneocytes) that are linked to structural proteins. They form a protective barrier in the outermost layer of the epidermis, known as the stratum corneum, which acts as an external wall that protects the body from physical injury and bacterial invasion. We find that the candidate targets of balancing selection are enriched in genes involved in keratinocyte differentiation in Pan troglodytes verus. They are also enriched in the related biological process "cornified envelope development" in Pan t. ellioti, Pan troglodytes troglodytes, and Pan troglodytes verus (the genes involved are SCEL, SPRR2B, and SPRR2G) and include two additional cornified envelope genes (late cornified envelope genes 3D and 3E, LCE3D and LCE3E). In LCE3D and LCE3E, the signatures are in flanking regions (the protein-coding portions of the genes are filtered out by the segmental duplication filter). We also identify CDSN, which encodes corneodesmosin, an adhesive protein involved in skin barrier integrity, and which has previously been shown to have signatures of balancing selection in humans (Andres et al. 2009; Cagliani et al. 2011), as a putative target of balancing selection in three lineages (Homo sapiens, Pan paniscus, Pan troglodytes verus) (supplementary table S13, Supplementary Material online).

A hypothesis for why genes involved in epidermal differentiation may evolve under balancing selection has been proposed in humans in relation to the filaggrin (FLG) gene (Irvine and McLean 2006), which is essential for the formation of the cornified envelope yet has two common loss-of-function alleles (5% frequency each in Europeans) that cause ichthyosis vulgaris and strongly predispose to atopic dermatitis (Irvine and McLean 2006; Smith et al. 2006). It has been proposed that the loss-of-function alleles might result in a leaky skin barrier through which low levels of pathogens can penetrate, promoting innate immunity through a process of "natural vaccination" (Irvine and McLean 2006).

Humans homozygous for loss-of-function CDSN alleles frequently show skin barrier defects and are susceptible to Staphylococcus aureus superinfections early in life, suggesting that variation in the gene influences the ability of pathogens to penetrate the skin barrier (Oji et al. 2010). Interestingly, heterozygote carriers of this loss-of-function allele do not present these phenotypes, suggesting that heterozygotes may obtain benefits without the deleterious costs of homozygous carriers. Therefore, a leaky skin barrier that promotes "natural vaccination" may be a hitherto under-appreciated mechanism driving balancing selection on a variety of genes involved in development of the stratum corneum across species. We hypothesize that this mechanism may underlie the strong signatures of balancing selection we detect in CDSN (corneodesmosin) and other genes involved in the development of the cornified envelope. The presence of advantageous variation on genes involved in the formation of the epithelial barrier may therefore be more widespread than previously recognized.

Positive Selection on HIV/SIV-Related Genes
We also find evidence for pervasive positive selection on immune-related processes, as seen in H. sapiens and other Hominidae before (Mikkelsen et al. 2005; Cagliani et al. 2010; Casals et al. 2011). MK, HKA, and FWH candidate targets of positive selection are all significantly enriched in genes related to immune response (supplementary tables S16–55, S75–86, and S99, Supplementary Material online). The particular genes vary between lineages, although some are shared across lineages.

Immunity-related genes with signals of selection in multiple lineages may reveal convergent adaptive response to pathogens or adaptive introgression. The gene IDO2 is identified as a FWH candidate of recent positive selection in all four Pan troglodytes lineages and Pan paniscus. IDO2 encodes the enzyme indoleamine 2,3-dioxygenase 2, which is involved in T-cell regulation and the tryptophan oxidation pathway (Metz et al. 2014). This pathway is activated after HIV infection and causes chronic inflammation (Murray 2010), likely underlying HIV-1 immunopathogenesis (Boasso and Shearer 2008). Interestingly, blocking expression of the IDO genes in rhesus macaques infected with SIV/HIV improves health outcomes (Boasso et al. 2009). Therefore, selection on this functional pathway may contribute to the ability of some Pan troglodytes individuals to be resistant to AIDS progression after HIV infection, which has been attributed to a lack of HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004), related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find that some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Südhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus; an HKA signal of positive selection in Homo sapiens; and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. These include AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank), and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). Another is CSMD1, the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online), whose function is unknown but which is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Håvik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prüfer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes, including size, locomotion, morphology, or diet, help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mis-mapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb) based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and sites with DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed “callable sites,” was used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot discard the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.
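The depth-based part of this filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the per-individual depth lists, and the default thresholds (mean depth divided by 8.0, mean depth times 3, and a minimum coverage of 5) are hypothetical choices meant to mirror the description above.

```python
from statistics import mean

def depth_ok(depth, mean_depth, low_div=8.0, high_mult=3.0):
    """True if depth lies within [mean/low_div, mean*high_mult] (assumed bounds)."""
    return mean_depth / low_div <= depth <= mean_depth * high_mult

def callable_sites(depths_by_individual, min_cov=5):
    """Return indices of sites that pass the depth filters in every individual.

    depths_by_individual: equal-length lists of read depth per site,
    one list per individual (hypothetical input layout).
    """
    n_sites = len(depths_by_individual[0])
    means = [mean(d) for d in depths_by_individual]
    keep = []
    for i in range(n_sites):
        if all(d[i] >= min_cov and depth_ok(d[i], m)
               for d, m in zip(depths_by_individual, means)):
            keep.append(i)
    return keep

# Toy depths for two individuals at five sites (made-up values).
depths = [
    [30, 2, 31, 200, 29],  # site 1 too shallow, site 3 too deep
    [28, 27, 4, 30, 31],   # site 2 below the minimum coverage
]
print(callable_sites(depths))
```

Requiring every individual to pass keeps the set of sites identical across individuals of a species, which is what makes per-window counts comparable later on.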

Tests

Hudson–Kreitman–Aguadé Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window’s empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
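The windowing, filtering, and ranking steps can be sketched in a few lines. The code below is an illustrative outline only: the function names are hypothetical, the handling of windows without substitutions (skipped here) is an assumption, and ties between windows are broken arbitrarily.

```python
def sliding_windows(chrom_length, size=30_000, step=15_000):
    """Yield (start, end) half-open windows of `size` bp, advancing by `step` bp."""
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step

def hka_ratio(n_snps, n_subs, n_callable, min_callable=300, min_informative=6):
    """Polymorphism-to-divergence ratio, or None if the window is filtered out."""
    if n_callable < min_callable or n_snps + n_subs < min_informative:
        return None
    if n_subs == 0:
        return None  # assumption: skip windows where the ratio is undefined
    return n_snps / n_subs

def empirical_p(scores, high_is_extreme=True):
    """Rank-based empirical P value per score (rank divided by number of windows)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=high_is_extreme)
    p = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        p[i] = rank / len(scores)
    return p

print(list(sliding_windows(60_000)))
print(empirical_p([0.5, 2.0, 1.0, 0.1]))
```

Ranking from the high end highlights windows with an excess of polymorphism (candidates for balancing selection), while setting `high_is_extreme=False` highlights windows with a deficit of polymorphism relative to divergence.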

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window’s empirical P value.
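For readers unfamiliar with the statistic, a minimal sketch of Fay and Wu's H for a single window is given below. It uses the standard definition H = θπ − θH computed from derived-allele counts; the function name and input layout are hypothetical, and this is not the implementation used in the study.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H = theta_pi - theta_H for one window.

    derived_counts: derived-allele count at each segregating site (1..n-1)
    n: number of sampled chromosomes
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    pairs = n * (n - 1)
    # theta_pi: mean pairwise differences; theta_H: weights high-frequency derived alleles
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / pairs
    theta_h = sum(2 * i * i for i in derived_counts) / pairs
    return theta_pi - theta_h

# Excess of high-frequency derived alleles gives a negative H (toy counts).
print(fay_wu_h([9, 9, 8, 1], n=10))
# Mostly low-frequency derived alleles give a positive H.
print(fay_wu_h([1, 1, 2], n=10))
```

Strongly negative values are the ones of interest here, since a recent sweep drags linked derived alleles to high frequency.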

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet this criterion. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
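The logic of the MK test can be illustrated with a 2×2 table of nonsynonymous and synonymous substitutions (Dn, Ds) and polymorphisms (Pn, Ps). The sketch below assumes a two-sided Fisher's exact test and reports the neutrality index; the text above does not state which test of independence was applied, so both choices, and the function names, are assumptions. The example counts are the Adh data from McDonald and Kreitman (1991).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(col1, x) * comb(total - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (total - col1)), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-9)))

def mk_test(dn, ds, pn, ps):
    """Return (neutrality index, P value) for one transcript."""
    ni = (pn / ps) / (dn / ds) if ps and dn and ds else None
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)

# Adh counts: 7 nonsynonymous and 17 synonymous fixed differences,
# 2 nonsynonymous and 42 synonymous polymorphisms.
ni, p = mk_test(dn=7, ds=17, pn=2, ps=42)
print(ni, p)
```

A neutrality index below 1 indicates an excess of nonsynonymous divergence (consistent with positive selection), whereas a value above 1 indicates an excess of nonsynonymous polymorphism (consistent with purifying selection acting on segregating variants).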

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prüfer et al. 2012; see Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyzes SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of >0.8 for being external that is not interrupted by SNPs with a probability of >0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
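One way to read this definition is sketched below. The code is an interpretation, not the published implementation: it assumes a two-state model so that the internal posterior equals one minus the external posterior, it bounds each region by its first and last confidently external SNP, and it converts physical to genetic length with a single placeholder recombination rate rather than a local 1-Mb rate map.

```python
def external_regions(positions, p_external, threshold=0.8, cm_per_mb=1.0):
    """Return (start_bp, end_bp, genetic_length_cM) for each external region.

    positions: sorted SNP coordinates in bp
    p_external: posterior probability of the external state at each SNP;
                the internal posterior is taken as 1 - p_external (assumption)
    cm_per_mb: recombination rate used for the conversion (placeholder value)
    """
    regions, start, end = [], None, None
    for pos, p_ext in zip(positions, p_external):
        if p_ext > threshold:                      # confidently external SNP
            if start is None:
                start = pos
            end = pos
        elif (1 - p_ext) > threshold and start is not None:
            regions.append((start, end))           # confidently internal SNP ends the run
            start = end = None
    if start is not None:
        regions.append((start, end))
    return [(s, e, (e - s) / 1e6 * cm_per_mb) for s, e in regions]

# Toy posteriors (made-up values): the ambiguous SNP at 300 does not break the run.
positions = [100, 200, 300, 400, 500, 600, 700]
posteriors = [0.9, 0.95, 0.5, 0.85, 0.1, 0.9, 0.3]
print(external_regions(positions, posteriors))
```

Scoring by genetic rather than physical length matters because a long external region is less surprising where recombination is rare.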

For each population, the HMM was run repeatedly with each “outgroup” individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).
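The combination of runs can be sketched as a rank aggregation. This outline is hypothetical throughout: it assumes that regions can be matched across runs by a shared identifier (in practice region boundaries may differ between runs), and it treats the cutoff fraction as a parameter.

```python
def combine_runs(runs, top_fraction=0.05):
    """Combine per-outgroup rankings of external regions.

    runs: list of dicts mapping region id -> genetic length, one dict per
          outgroup individual (hypothetical layout with shared region ids)
    Returns region ids ordered by cumulative rank, best first.
    """
    ranks = []
    for run in runs:
        ordered = sorted(run, key=run.get, reverse=True)   # longest region first
        ranks.append({region: r for r, region in enumerate(ordered, start=1)})

    # Keep only regions that fall within the top fraction in every run.
    kept = set(ranks[0])
    for run, rank in zip(runs, ranks):
        cutoff = max(1, int(len(run) * top_fraction))
        kept &= {region for region, r in rank.items() if r <= cutoff}

    cumulative = {region: sum(rank[region] for rank in ranks) for region in kept}
    return sorted(cumulative, key=lambda region: (cumulative[region], region))

# Toy lengths for four regions in two runs (made-up values).
run1 = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.5}
run2 = {"A": 2.9, "B": 2.8, "C": 0.2, "D": 1.0}
print(combine_runs([run1, run2], top_fraction=0.5))
```

Intersecting across runs before ranking reflects the reasoning given above: a truly external region should look external regardless of which outgroup individual is used.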

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript passing a nominal P value threshold of 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: GO categories (Harris et al. 2004) “biological processes,” “molecular functions,” and “cellular components”; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004), based on mammalian and human phenotype ontology; and the PheWas database, which is based on the human PheWas ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
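The two post-processing rules described here (a Bonferroni-corrected threshold and the removal of single-gene categories) can be sketched as below. The function and the input layout are hypothetical and do not reflect WebGestalt's interface; the sketch also assumes that the number of tests equals the number of categories supplied.

```python
def significant_categories(results, alpha=0.05):
    """Apply a Bonferroni threshold and drop categories driven by one gene.

    results: list of (category, p_value, genes) tuples, where `genes` lists the
             candidate genes annotated to the category (hypothetical layout)
    """
    threshold = alpha / len(results) if results else alpha
    return [
        (category, p_value)
        for category, p_value, genes in results
        if p_value <= threshold and len(set(genes)) > 1
    ]

# Made-up categories and P values, for illustration only.
toy = [
    ("category_a", 0.0001, ["gene1", "gene2", "gene3"]),  # passes both rules
    ("category_b", 0.004, ["gene4"]),                     # significant, but one gene
    ("category_c", 0.03, ["gene5", "gene6"]),             # fails the corrected threshold
]
print(significant_categories(toy))
```

With three categories the corrected threshold is 0.05/3, so only the first toy category survives both filters.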

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).

Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation; Michael Lachmann and Mark Stoneking for discussions and valuable comments; and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.-B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.-B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.

Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.

Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol. 26(12):2755–2764.

Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol. 7(4):1122–1132.

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.

Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol. 126:235–242.

Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol. 182:4313–4320.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.

Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 19:4705–4714.

Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol. 11(1):171.

Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol. 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.

Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet. 84:21–34.

Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.

Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet. 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.

Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci. 110(14):5294–5300.

Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.

ENCODE Project Consortium. 2011. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32:D258–D261.

Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22:1760–1774.

Håvik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.

Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.

Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol. 22:194–200.

Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev. 29:15–21.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Hudson RR, Kreitman M, Aguadé M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32:415–435.

Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol. 126:1200–1202.

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie’s Darwin domain. Genome Biol Evol. 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.

Key FM, Fu Q, Romagne F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.

Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O’Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.

Prufer K Munch K Hellmann I Akagi K Miller JR Walenz B Koren SSutton G Kodira C Winer R et al 2012 The bonobo genomecompared with the chimpanzee and human genomes Nature486527ndash531

Prufer K Racimo F Patterson N Jay F Sankararaman S Sawyer S HeinzeA Renaud G Sudmant PH de Filippo C et al 2014 The completegenome sequence of a Neanderthal from the Altai MountainsNature 50543ndash49

Pruitt KD Tatusova T Brown GR Maglott DR 2012 NCBI ReferenceSequences (RefSeq) current status new features and genome anno-tation policy Nucleic Acids Res 2(40)D130ndashD135

Pybus M DallrsquoOlio GM Luisi P Uzkudun M Carre~no-Torres A Pavlidis PLaayouni H Bertranpetit J Engelken J 2014 1000 Genomes SelectionBrowser 10 a genome browser dedicated to signatures of naturalselection in modern humans Nucleic Acids Res 42 (Databaseissue)D903ndashD909

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.

Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 33:W741–W748.


HIV-induced T-cell dysfunction (Heeney et al. 1993). The MK test also identifies HIVEP1 as a target of positive selection in Pan troglodytes schweinfurthii and Pan paniscus. The transcription factor encoded by HIVEP1 binds enhancer elements of several promoters of viruses, including HIV-1. Investigation of these selection signals may be of relevance to treating HIV infections in humans.

In summary, we find strong evidence of shared signals of both balancing and, less frequently, positive selection on genes involved in immunity in the Hominidae. This likely reflects the strong and continuous selective pressure that infection and disease exert on these populations, and the close evolutionary history of the Hominidae, which results in exposure to similar pathogens and similar genetic responses.

Neurological Functions

All Hominidae lineages are known to possess sophisticated cognitive abilities (Tomasello and Call 1997; McGrew 2004) related to their increased brain size and changes in brain organization relative to other primates (Semendeferi et al. 2002). We find that some of the categories with the strongest evidence of purifying selection (with MK) are involved in brain function. It is intriguing that the candidate targets of recent positive selection are also enriched in neurological functional categories, with some genes involved in brain development and function showing signatures across multiple lineages.

The gene with signatures of positive selection across the highest number of species and timescales is NRXN3, which codes for neurexin 3. The gene is mainly expressed in the brain and encodes a protein involved in synaptic transmission and plasticity; it belongs to a gene family associated with several cognitive diseases (Sudhof 2008). NRXN3 shows signatures of positive selection in six lineages, with a FWH signal of recent positive selection in Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Pan troglodytes troglodytes, and Pongo pygmaeus; an HKA signal of positive selection in Homo sapiens; and an ELS signature in all lineages where it was performed (Pan troglodytes, Gorilla gorilla gorilla, and Pongo abelii). Therefore, this gene may have been involved in the cognitive evolution of multiple Hominidae lineages, including our own.

Several additional prominent candidates of positive selection are involved in cognitive and neurodevelopmental phenotypes. These include AUTS2, identified by ELS in G. g. gorilla (second highest rank) and Pan troglodytes troglodytes (fourth highest rank), and implicated in neuronal development and autism in humans (Oksenberg and Ahituv 2013). In addition, CSMD1 is the gene with FWH signatures in the most lineages (Homo sapiens, Pan paniscus, Pan troglodytes ellioti, Pan troglodytes schweinfurthii, Gorilla gorilla gorilla, and Pongo pygmaeus) (supplementary table S73, Supplementary Material online); its function is unknown, but it is highly expressed in the central nervous system (Kraus et al. 2006) and harbors variants associated with schizophrenia (Havik et al. 2011). Further, of the four genes with FWH signatures in five lineages, two are associated with neuronal phenotypes: KCNIP4, which encodes an A-type potassium channel modulatory protein and harbors variants associated with attention deficit hyperactivity disorder (Weißflog et al. 2013), and NRG3 (Neuregulin 3), which is crucial in the development of the nervous system and whose variants are associated with schizophrenia (Chen et al. 2009).

In addition, 12 genes detected as positively selected by MK are related to neurodevelopmental disorders in humans (supplementary table S98, Supplementary Material online). Five (MCPH1, CASC5, PHGDH, FTO, and NBN) can display a phenotype of microcephaly when mutated (Faheem et al. 2015), with mutations in MCPH1 and CASC5 being responsible for autosomal recessive primary microcephaly (MCPH) (Woods et al. 2005; Genin et al. 2012). MCPH1 (identified here in H. sapiens) has been described as a target of positive selection in primate evolution (Wang and Su 2004; Shi et al. 2013). CASC5 (identified here in Pan troglodytes ellioti and Pan paniscus) contains, in Homo sapiens, a nonsynonymous mutation that reached fixation since the split with Neandertals (Prufer et al. 2014), suggesting recent positive selection also in our lineage. CENPJ, another MCPH gene, shows marginally nonsignificant evidence of positive selection (P value = 0.055 in P. t. verus). Together, these results show putative adaptive evolution in genes that may have contributed to changes in brain size and function during primate evolution.

Conclusion

We present a comparative population genomic analysis that investigates the influence of natural selection across the Hominidae. This information sheds light on the past adaptations of each of these populations. As expected, immune function was a strong selective force in all species. Given the close evolutionary relationship, similar physiology, and shared pathogens of humans with the other Hominidae lineages, further functional study of these immunity-related genes may be of medical relevance. In addition, the evidence of positive selection in neuronal pathways of several lineages suggests differential adaptations in phenotypes that distinguish the Hominidae species from one another. For example, genes that show strong signals of positive selection solely on the human lineage constitute the best candidates to explain human-specific neurological phenotypes. Similarly, genes with evidence of positive selection in species that differ from one another in phenotypes including size, locomotion, morphology, or diet help us to understand the genetic basis of these adaptations.

The fact that even the modest differences in long-term Ne between Hominidae lineages have had discernible impacts on the efficacy of natural selection, both to remove deleterious alleles and to favor adaptive ones, has additional implications. The different great ape species, all of which (except for humans) are currently endangered, may thus differ significantly in their ability to adapt to environmental change. This may affect their ability to adapt not only to constantly changing pathogens but also to the often human-induced changes to their habitats.

Methods

Dataset

The dataset we analyzed consists of whole-genome autosomal sequences from 83 individuals across all the major lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mismapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) and DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed "callable sites", was used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that, despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot discard the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have a weak influence on our overall results.

Tests

Hudson–Kreitman–Aguade Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window's empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
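The windowed HKA scan described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`hka_windows`, `empirical_p`) are hypothetical, the 300-callable-site filter is omitted for brevity, windows without substitutions are skipped as an assumption (the ratio is undefined there), and the empirical P value is taken from the upper tail only.

```python
from bisect import bisect_left

WINDOW = 30_000        # window length in bp, as stated in the text
STEP = 15_000          # 15-kb overlap between consecutive 30-kb windows
MIN_INFORMATIVE = 6    # minimum SNVs + substitutions per window


def count_in(sorted_pos, start, end):
    """Number of positions p with start <= p < end."""
    return bisect_left(sorted_pos, end) - bisect_left(sorted_pos, start)


def hka_windows(snv_pos, sub_pos, chrom_len):
    """Return (window_start, SNV/substitution ratio) for each retained window."""
    snv_pos, sub_pos = sorted(snv_pos), sorted(sub_pos)
    scores = []
    for start in range(0, max(chrom_len - WINDOW, 0) + 1, STEP):
        n_snv = count_in(snv_pos, start, start + WINDOW)
        n_sub = count_in(sub_pos, start, start + WINDOW)
        # Assumption: skip windows where the ratio is undefined or poorly supported.
        if n_snv + n_sub < MIN_INFORMATIVE or n_sub == 0:
            continue
        scores.append((start, n_snv / n_sub))
    return scores


def empirical_p(values):
    """Upper-tail empirical P value: fraction of windows scoring >= each value."""
    asc = sorted(values)
    n = len(asc)
    return [(n - bisect_left(asc, v)) / n for v in values]
```

High ratios (an excess of polymorphism) fall in the upper tail and would be candidates for balancing selection; the lower tail could be ranked the same way by negating the scores.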

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window's empirical P value.
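For orientation, the per-window statistic can be sketched from the standard definition of Fay and Wu's H as the difference between two estimators of diversity, one based on pairwise differences and one weighted toward high-frequency derived alleles. The function name `fay_wu_h` and its input format (derived-allele counts per SNP in a sample of `n` chromosomes) are illustrative assumptions, not taken from the paper.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H for one window.

    derived_counts: derived-allele count i (0 < i < n) for each segregating site.
    n: number of sampled chromosomes.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    if any(not 0 < i < n for i in derived_counts):
        raise ValueError("counts must describe segregating sites")
    denom = n * (n - 1)
    # theta_pi: average number of pairwise differences.
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    # theta_H: estimator weighted by the square of the derived-allele count.
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h
```

Strongly negative values indicate an excess of high-frequency derived alleles, the pattern expected after a recent sweep.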

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet these criteria. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
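As an illustration of how the four counts per transcript are typically compared, the sketch below computes a neutrality index and a two-sided Fisher's exact test on the 2×2 table. The text does not state which significance test was applied, so the use of Fisher's exact test here is an assumption, and the names `mk_test` and `fisher_two_sided` are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (as improbable) as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def mk_test(dn, ds, pn, ps):
    """Return (neutrality_index, p_value) for one transcript.

    dn, ds: nonsynonymous / synonymous substitutions (divergence).
    pn, ps: nonsynonymous / synonymous polymorphisms.
    """
    # NI is undefined when a denominator is zero; report None in that case.
    ni = (pn / ps) / (dn / ds) if ps and dn and ds else None
    return ni, fisher_two_sided(dn, pn, ds, ps)
```

Under this convention, a neutrality index below 1 points to an excess of nonsynonymous substitutions (consistent with positive selection), and a value above 1 to an excess of nonsynonymous polymorphism (consistent with purifying selection acting on segregating variants).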

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prufer et al. 2012; see Supplementary Information 19a in Prufer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prufer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prufer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of > 0.8 for being external that is not interrupted by SNPs with a probability of > 0.8 for being internal, and were scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
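One possible reading of this region definition is sketched below. It assumes the HMM output is available as a list of SNPs, each with a genetic position (already converted to cM) and a posterior probability of the external state, and that the internal posterior is one minus the external posterior. The function name `external_regions` and the treatment of ambiguous SNPs (they neither extend nor break a run) are assumptions for illustration.

```python
THRESHOLD = 0.8  # posterior cutoff from the text


def external_regions(snps):
    """Return (start_cM, end_cM, length_cM) for each external region.

    snps: iterable of (genetic_pos_cM, p_external), sorted by position.
    """
    regions = []
    start = last = None
    for pos, p_ext in snps:
        if p_ext > THRESHOLD:
            # Confidently external: open a region or extend the current one.
            if start is None:
                start = pos
            last = pos
        elif (1.0 - p_ext) > THRESHOLD and start is not None:
            # Confidently internal: the current run is interrupted here.
            regions.append((start, last, last - start))
            start = last = None
        # SNPs with intermediate posteriors leave the current run unchanged.
    if start is not None:
        regions.append((start, last, last - start))
    return regions
```

Regions would then be ranked by their genetic length, longest first.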

For each population, the HMM was run repeatedly with each "outgroup" individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript significant at a nominal P value threshold of 0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: GO categories (Harris et al. 2004) "biological processes", "molecular functions", and "cellular components"; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004), based on mammalian and human phenotype ontology; and the PheWas database, which is based on the human PheWas ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
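The general logic of such an over-representation analysis with Bonferroni correction can be sketched as follows. This is not WebGestalt's code: it assumes a one-sided hypergeometric test, takes the number of tested categories as the Bonferroni factor, and uses hypothetical names (`hypergeom_sf`, `enriched_categories`).

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the category."""
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / comb(N, n)


def enriched_categories(candidates, background, categories, alpha=0.05):
    """Return {category: Bonferroni-adjusted P} for significant categories.

    candidates: candidate gene names (e.g., the top 200 genes of a test).
    background: all genes that could have been detected.
    categories: mapping from category name to its member genes.
    """
    background = set(background)
    candidates = set(candidates) & background
    n_tests = len(categories)  # Bonferroni factor (assumption: all categories count)
    hits = {}
    for name, genes in categories.items():
        members = set(genes) & background
        overlap = candidates & members
        # Categories driven by a single gene are discarded, as in the text.
        if len(overlap) < 2:
            continue
        p = hypergeom_sf(len(overlap), len(background), len(members), len(candidates))
        p_adj = min(1.0, p * n_tests)
        if p_adj < alpha:
            hits[name] = p_adj
    return hits
```

Restricting every set to the background keeps the test honest when annotation databases contain genes that were never callable in the scan.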

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).


Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.

Andres AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.

Andres AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol. 26(12):2755–2764.

Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol. 7(4):1122–1132.

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.

Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol. 126:235–242.

Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol. 182:4313–4320.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.

Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 19:4705–4714.

Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol. 11(1):171.

Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol. 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.

Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet. 84:21–34.

Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.

Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet. 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.

Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci. 110(14):5294–5300.

Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.

ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32:D258–D261.


Harrow J Frankish A Gonzalez JM Tapanari E Diekhans M Kokocinski FAken BL Barrell D Zadissa A Searle S et al 2012 GENCODE thereference human genome annotation for the ENCODE projectGenome Res 221760ndash1774

Havik B Le Hellard S Rietschel M Lybaeligk H Djurovic S Mattheisen MMhleisen TW Degenhardt F Priebe L Maier W et al 2011 Thecomplement control-related genes CSMD1 and CSMD2 associateto schizophrenia Biol Psychiatry 7035ndash42

Hedrick PW 1999 Balancing selection and MHC Genetica 104207ndash214Heeney J Jonker R Koornstra W Dubbes R Niphuis H Di Rienzo AM

Gougeon ML Montagnier L 1993 The resistance of HIV-infectedchimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction J Med Primatol 22194ndash200

Hubisz MJ Pollard KS 2014 Exploring the genesis and functions ofhuman accelerated regions sheds light on their role in human evo-lution Curr Opin Genet Dev 2915ndash21

Hudson RR 2002 Generating samples under a Wright-Fisher neutralmodel of genetic variation Bioinformatics 18337ndash338

Hudson RR Kreitman M Aguade M 1987 A test of neutral molecularevolution based on nucleotide data Genetics 116153ndash159

Hughes AL Yeager M 1998 Natural selection at major histocompatibil-ity complex loci of vertebrates Annu Rev Genet 32415ndash435

Irvine AD McLean WHI 2006 Breaking the (un)sound barrier filaggrin isa major gene for atopic dermatitis J Investig Dermatol1261200ndash1202

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie’s Darwin domain. Genome Biol Evol. 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989.

Key FM, Fu Q, Romagne F, Lachmann M, Andres AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.

Key FM, Teixeira JC, de Filippo C, Andres AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev. 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am. 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet. 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol. 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: Reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol. 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol. 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med. 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andres AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet. 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet. 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2: Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res. 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O’Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 20(4):R208–R215.

Prufer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.

Prufer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2(40):D130–D135.

Pybus M, Dall’Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall’Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci. 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol. 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet. 38:337–342.

Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet. 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol. 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet. 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol. 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 1(33):W741–W748.



lineages of the Hominidae (with the exception of Gorilla beringei beringei) (fig. 1 and supplementary table S1, Supplementary Material online). The dataset was originally presented in Prado-Martinez et al. (2013, SOM), where the SNP calling pipeline and filtering criteria are described in detail. All reads are mapped to the human reference genome (hg18). This approach has three main advantages. First, we take advantage of the extensive data-quality exploration and filtering performed in the original publication. Second, mapping to the human genome ensures that all species are mapped to a high-quality genome, avoiding the (hard to account for) biases that would result from mapping to genomes of low and varying qualities. Third, the human genome has the most comprehensive annotation of gene coding regions, which is very important in this study.

To avoid errors introduced by mis-mapping due to paralogous variants, we also restricted all analyses to a set of sites with a unique mapping to the human genome. To address the possible influence of unknown copy number variants (which would result in collapsing several genomic regions during mapping and produce false SNP calls), we took several steps (supplementary fig. S1, Supplementary Material online). Using UCSC tracks, we excluded from analysis all repetitive regions (248 Mb), segmental duplications (154 Mb), genomic gaps (226 Mb), and tandem repeats (38 Mb). We also excluded structural variants detected in any of the great ape lineages (334 Mb), based on the most comprehensive catalogue available, which was itself generated using this dataset and read-depth methods (Sudmant et al. 2013). Furthermore, sites with depth of coverage (DP) < (mean read depth/8.0) or DP > (mean read depth × 3) were also removed. To maximize the number of sites to be analyzed, we excluded multiple individuals with low coverage (supplementary tables S1 and S2, Supplementary Material online). Additionally, we required positions to have at least 5× coverage in all individuals per species. Only the resulting set of sites, which we termed “callable sites,” was used in further analyses; this minimizes as much as possible the effects of filtering in all enrichment analyses. This resulted in a mean of 2099 Mb of analyzable genome sequence per species (supplementary fig. S1, Supplementary Material online). We caution that, despite our many efforts, which include multiple stringent filtering steps and the manual curation of targets presented in the main text, we cannot rule out the presence of some artifacts in our data (e.g., undetected structural variants in the candidate targets of balancing selection), although we expect them to have only a weak influence on our overall results.
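The depth and coverage filters described above can be illustrated with a short Python sketch. It assumes that the two depth thresholds are the mean read depth divided by 8.0 and the mean read depth multiplied by 3, and that the minimum coverage is 5 reads per individual. The function names and the input layout are hypothetical and are not taken from the original pipeline.

```python
def passes_depth_filter(depth, mean_depth, low_divisor=8.0, high_multiplier=3.0):
    """True if depth lies within [mean/low_divisor, mean*high_multiplier]."""
    return mean_depth / low_divisor <= depth <= mean_depth * high_multiplier


def is_callable_site(depths, mean_depths, min_coverage=5):
    """True if every individual of a species passes both filters at this site.

    depths      -- read depth of each individual at the site
    mean_depths -- genome-wide mean read depth of each individual (same order)
    """
    return all(
        d >= min_coverage and passes_depth_filter(d, m)
        for d, m in zip(depths, mean_depths)
    )
```

Requiring every individual to pass means that one poorly covered sample removes the site for the whole species, which is consistent with the decision to exclude low-coverage individuals before defining callable sites.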

Tests

Hudson–Kreitman–Aguade Test (HKA)

To detect long-term balancing selection and positive selection that could have occurred at a deep evolutionary timescale, we used a statistic based on the HKA test (Hudson et al. 1987). Here, the HKA statistic is simply the ratio of polymorphic (SNPs) to divergent (substitutions) sites in a window. We consider as a polymorphism a genomic position that was identified as a single nucleotide variant (SNV) in Prado-Martinez et al. (2013). We consider as a substitution (a divergent site) a genomic position that is identified as a fixed difference between the tested and the outgroup lineage. For consistency, Homo sapiens was used as an outgroup for all lineages. When performing the test for Homo sapiens, we used the combined Pan troglodytes lineages as the outgroup.

For each lineage, the genome was divided into 30-kb genomic windows with 15-kb overlap, and the HKA statistic was calculated. We consider only windows that contain at least 300 callable and 6 informative sites, where an informative site is a SNV or substitution (see supplementary materials HKA, Supplementary Material online). Each window in the genome was ranked according to its HKA score, and the rank was considered the window’s empirical P value (see supplementary fig. S4, Supplementary Material online, for an example of the distribution of polymorphic sites and substitutions across the HKA empirical distribution). To ensure that our results were not influenced by variation in data quality across the genome, we tested whether extreme HKA scores are biased in terms of coverage or mapping quality. We find no evidence for such artifacts influencing our results (see supplementary materials HKA and supplementary figs. S2 and S3, Supplementary Material online).
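The windowing and ranking steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, windows with no substitutions are given an infinite score (the text does not say how such windows were handled), and the empirical P value is taken as the fraction of windows with a score at least as extreme as the window's own.

```python
from bisect import bisect_left, bisect_right


def sliding_windows(chrom_length, size=30_000, step=15_000):
    """Yield half-open (start, end) windows of `size` bp, advancing by `step` bp."""
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step


def hka_score(n_snps, n_subs, n_callable, min_callable=300, min_informative=6):
    """Ratio of polymorphic to divergent sites, or None if the window is filtered out."""
    if n_callable < min_callable or n_snps + n_subs < min_informative:
        return None
    if n_subs == 0:
        return float("inf")  # assumption: treatment of zero substitutions is not stated
    return n_snps / n_subs


def empirical_p_values(scores, high_tail=True):
    """Rank-based P value: fraction of windows at least as extreme as each window."""
    ordered = sorted(scores)
    n = len(ordered)
    if high_tail:  # excess polymorphism, e.g. candidate balancing selection
        return [(n - bisect_left(ordered, s)) / n for s in scores]
    return [bisect_right(ordered, s) / n for s in scores]  # excess divergence
```

With `high_tail=True` the smallest P values go to windows with the highest ratio of polymorphism to divergence; with `high_tail=False` they go to the windows with the lowest ratio.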

Fay and Wu H Test (FWH)

To detect complete or nearly complete positive selective sweeps caused by recent or ongoing positive selection, which result in an excess of high-frequency derived alleles, we used the FWH statistic (Fay and Wu 2000). We confirmed that our implementation of the FWH statistic was capable of detecting recent selective sweeps using simulations (see supplementary materials FWH 1 and supplementary fig. S19, Supplementary Material online). For each lineage, the genome was divided into 30-kb windows with 15-kb overlap, using the same strategy as the HKA test (see above and supplementary materials FWH 1, Supplementary Material online). Windows with fewer than 300 callable sites were removed. Each window in the genome was ranked according to its FWH score, and the rank was considered the window’s empirical P value.
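For orientation, Fay and Wu's H is commonly written as the difference between two estimators of the population mutation rate, one based on pairwise differences and one that weights high-frequency derived alleles more heavily. The Python sketch below follows that standard definition for a single window. It is not the authors' implementation, and the function name and input format are hypothetical.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H for one window.

    derived_counts -- derived-allele count at each segregating site (1 .. n-1)
    n              -- number of sampled chromosomes
    """
    if any(i < 1 or i > n - 1 for i in derived_counts):
        raise ValueError("derived-allele counts must lie between 1 and n - 1")
    pairs = n * (n - 1)
    # theta_pi: average number of pairwise differences
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / pairs
    # theta_H: weights each site by the squared derived-allele count
    theta_h = sum(2 * i * i for i in derived_counts) / pairs
    return theta_pi - theta_h
```

A site contributes negatively whenever its derived allele is carried by more than half of the sample, so strongly negative values indicate the excess of high-frequency derived alleles that the test targets.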

McDonald–Kreitman Test (MK)

To detect positive and purifying selection on protein coding genes, we used the McDonald–Kreitman test (McDonald and Kreitman 1991). The MK test was calculated for all lineages with at least five individuals, as this was considered the minimum sample size for sufficient polymorphism data (supplementary table S93, Supplementary Material online). Only Pan troglodytes troglodytes did not meet these criteria. Pan t. verus met the criterion only by including Donald, an individual excluded in all other analyses because of evidence of admixture between P. t. verus and Pan troglodytes troglodytes (Prado-Martinez et al. 2013). Coordinates for coding regions of all autosomal transcript unique identifiers were taken from RefSeq hg18 and intersected with the callable sites in our data (Pruitt et al. 2012). This resulted in 151 Mb of coding sequence available for analysis. For each lineage, we count all polymorphisms and substitutions that are predicted to have appeared after the most recent common ancestor with an outgroup (Homo sapiens was used for all lineages, except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
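The MK test compares nonsynonymous and synonymous counts among substitutions and among polymorphisms in a 2×2 table. The Python sketch below is one common way to evaluate such a table. It assumes a two-sided Fisher's exact test and uses the neutrality index to indicate direction; the text does not state which test statistic the authors applied, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p)


def mk_test(dn, ds, pn, ps):
    """Return (neutrality_index, p_value) for one transcript.

    dn, ds -- nonsynonymous and synonymous substitutions
    pn, ps -- nonsynonymous and synonymous polymorphisms
    """
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    if ps == 0 or dn == 0 or ds == 0:
        return None, p_value  # neutrality index is undefined
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, p_value
```

As a usage example, `mk_test(7, 17, 2, 42)` uses the Adh counts reported by McDonald and Kreitman (1991). A neutrality index below 1 points to an excess of nonsynonymous substitutions (consistent with positive selection), and a value above 1 points to an excess of nonsynonymous polymorphism (consistent with purifying selection).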

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; Supplementary Information 7 in Prufer et al. 2012; Supplementary Information 19a in Prufer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse, so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prufer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.

Following Prufer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of >0.8 for being external that is not interrupted by SNPs with a probability of >0.8 for being internal, and were scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
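The region definition above can be sketched in Python as follows. The sketch assumes a two-state model, so that the internal probability is one minus the external probability, and it represents the recombination map as a dictionary from 1-Mb bin index to a rate in cM/Mb. Both the data layout and the function names are hypothetical.

```python
def external_regions(positions, p_external, threshold=0.8):
    """Return (start, end) of each run of external SNPs.

    A run starts at a SNP with P(external) > threshold and is closed only by
    a SNP with P(internal) > threshold; SNPs with intermediate posteriors do
    not interrupt it. The run ends at its last confidently external SNP.
    """
    regions = []
    start = last_external = None
    for pos, p_ext in zip(positions, p_external):
        if p_ext > threshold:
            if start is None:
                start = pos
            last_external = pos
        elif (1.0 - p_ext) > threshold and start is not None:
            regions.append((start, last_external))
            start = last_external = None
    if start is not None:
        regions.append((start, last_external))
    return regions


def genetic_length_cm(start, end, rate_by_mb_bin, default_rate=1.0):
    """Integrate a 1-Mb-binned recombination map (cM/Mb) over [start, end) in bp."""
    total = 0.0
    pos = start
    while pos < end:
        bin_index = pos // 1_000_000
        bin_end = min((bin_index + 1) * 1_000_000, end)
        total += (bin_end - pos) / 1_000_000 * rate_by_mb_bin.get(bin_index, default_rate)
        pos = bin_end
    return total
```

Scoring by genetic rather than physical length avoids favoring regions that are long only because they lie in areas of low recombination.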

For each population, the HMM was run repeatedly with each “outgroup” individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).
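One way to picture the combination of runs is the Python sketch below. It makes several simplifying assumptions that are not stated in the text: regions are matched across runs by a shared identifier, each run supplies the genetic length of every region, the cutoff is the top 5% of regions by length within each run, and the cumulative rank score is the sum of within-run ranks. The function name and data layout are hypothetical.

```python
def combine_runs(runs, top_fraction=0.05):
    """Rank regions that fall in the top fraction of every run.

    runs -- list of dicts, one per outgroup individual, mapping a region
            identifier to its genetic length in that run
    Returns region identifiers ordered by cumulative rank (best first).
    """
    rank_sums = {}
    kept = None
    for lengths in runs:
        ordered = sorted(lengths, key=lengths.get, reverse=True)
        n_top = max(1, int(len(ordered) * top_fraction))
        top = ordered[:n_top]
        for rank, region in enumerate(top, start=1):
            rank_sums[region] = rank_sums.get(region, 0) + rank
        # Keep only regions that are in the top fraction of every run so far.
        kept = set(top) if kept is None else kept & set(top)
    kept = kept or set()
    # Ties in cumulative rank are broken by identifier for reproducibility.
    return sorted(kept, key=lambda region: (rank_sums[region], region))
```

Taking the intersection across runs reflects the reasoning in the text that a truly external region should appear regardless of which outgroup individual is used.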

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5 longest external regions. For the MK test, we selected, by species, all genes with at least one transcript presenting a nominal P value of <0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: the GO categories (Harris et al. 2004) “biological processes,” “molecular functions,” and “cellular components”; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004); categories based on the mammalian and human phenotype ontologies; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
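The two filters applied to the enrichment results (Bonferroni correction and removal of categories driven by a single gene) can be sketched in Python. The sketch assumes that the number of tests equals the number of categories examined and that each category comes with its raw P value and the set of candidate genes that fall in it. The function name and input layout are hypothetical and do not reflect WebGestalt's own output format.

```python
def significant_categories(results, alpha=0.05):
    """Return categories that survive both filters, sorted by name.

    results -- dict mapping category name to (raw_p_value, set_of_candidate_genes)
    """
    n_tests = len(results)
    kept = []
    for category, (p_value, genes) in results.items():
        adjusted = min(1.0, p_value * n_tests)  # Bonferroni-adjusted P value
        if adjusted <= alpha and len(genes) > 1:
            kept.append(category)
    return sorted(kept)
```

Multiplying each P value by the number of tests is equivalent to dividing the threshold by that number, and dropping single-gene categories guards against a signal that rests on one strongly selected locus.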

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).


Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation, Michael Lachmann and Mark Stoneking for discussions and valuable comments, and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.

Andres AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.

Andres AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol. 26(12):2755–2764.

Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol. 7(4):1122–1132.

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.

Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol. 126:235–242.

Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol. 182:4313–4320.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.

Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet. 19:4705–4714.

Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol. 11(1):171.

Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol. 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 10:195–205.

Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet. 84:21–34.

Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.

Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet. 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13:e1002112.

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.

Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci. 110(14):5294–5300.

Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet. 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res. 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res. 24:885–895.

ENCODE Project Consortium. 2011. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol. 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet. 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci. 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet. 5:e1003995.


Prufer K Munch K Hellmann I Akagi K Miller JR Walenz B Koren SSutton G Kodira C Winer R et al 2012 The bonobo genomecompared with the chimpanzee and human genomes Nature486527ndash531

Prufer K Racimo F Patterson N Jay F Sankararaman S Sawyer S HeinzeA Renaud G Sudmant PH de Filippo C et al 2014 The completegenome sequence of a Neanderthal from the Altai MountainsNature 50543ndash49

Pruitt KD Tatusova T Brown GR Maglott DR 2012 NCBI ReferenceSequences (RefSeq) current status new features and genome anno-tation policy Nucleic Acids Res 2(40)D130ndashD135

Pybus M DallrsquoOlio GM Luisi P Uzkudun M Carre~no-Torres A Pavlidis PLaayouni H Bertranpetit J Engelken J 2014 1000 Genomes SelectionBrowser 10 a genome browser dedicated to signatures of naturalselection in modern humans Nucleic Acids Res 42 (Databaseissue)D903ndashD909

Pybus M Luisi P DallrsquoOlio G Uzkudun M Laayouni H Bertranpetit JEngelken J 2015 Hierarchical boosting a machine-learning

Cagan et al doi101093molbevmsw215 MBE

3282

framework to detect and classify hard selective sweeps in humanpopulations Bioinformatics 31(24)3946ndash3952

Sabeti PC Schaffner SF Fry B Lohmueller J Varilly P Shamovsky OPalma A Mikkelsen TS Altshuler D Lander ES 2006 Positivenatural selection in the human lineage Science 312 (5780)1614ndash1620

Scally A Dutheil J Hillier L 2012 Insights into hominid evolution fromthe gorilla genome sequence Nature 483169ndash175

Semendeferi K Lu A Schenker N Damasio H 2002 Humans and greatapes share a large frontal cortex Nat Neurosci 5272ndash276

Shi L Li M Lin Q Qi X Su B 2013 Functional divergence of the brain-sizeregulating gene MCPH1 during primate evolution and the origin ofhumans BMC Biol 1162

Smith FJD Irvine AD Terron-Kwiatkowski A Sandilands A Campbell LEZhao Y et al 2006 Loss-of-function mutations in the gene encodingfilaggrin cause ichthyosis vulgaris Nat Genet 38337ndash342

Sudhof TC 2008 Neuroligins and neurexins link synaptic function tocognitive disease Nature 455903ndash911

Sudmant PH Huddleston J Catacchio CR Malig M Hillier LW Baker CMohajeri K Kondova I Bontrop RE Persengiev S Antonacci F 2013Evolution and diversity of copy number variation in the great apelineage Genome Res 23(9)1373ndash1382

Sudmant PH Rausch T Gardner EJ Handsaker RE Abyzov AHuddleston J Zhang Y Ye K Jun G Fritz MH Konkel MK 2015

An integrated map of structural variation in 2504 human genomesNature 526(7571)75ndash81

Tomasello M Call J 1997 Primate cognition USA Oxford UniversityPress

Wang K Li M Hakonarson H 2010 ANNOVAR functional annotationof genetic variants from high-throughput sequencing data NucleicAcids Res 38e164

Wang YQ Su B 2004 Molecular evolution of microcephalin agene determining human brain size Hum Mol Genet 131131ndash1137

Weiszligflog L Scholz CJ Jacob CP Nguyen TT Zamzow K Groszlig-Lesch SRenner TJ Romanos M Rujescu D Walitza S et al 2013 KCNIP4 as acandidate gene for personality disorders and adult ADHD EurNeuropsychopharmacol 23436ndash447

Woods CG Bond J Enard W 2005 Autosomal recessive primary micro-cephaly (MCPH) a review of clinical molecular and evolutionaryfindings Am J Hum Genet 76717ndash728

Zhai W Nielsen R Slatkin M 2009 An investigation of the statisticalpower of neutrality tests based on comparative and populationgenetic data Mol Biol Evol 26(2)273ndash283

Zhang B Kirov S Snoddy J 2005 WebGestalt an integrated systemfor exploring gene sets in various biological contexts Nucleic AcidsRes 1(33)W741ndashW748

Natural Selection in the Great Apes doi101093molbevmsw215 MBE

3283

Page 13: Natural Selection in the Great Apes - Digital CSICdigital.csic.es/bitstream/10261/151993/1/msw215.pdfclosest living relatives, the great apes, is therefore crucial for furthering our

outgroup (Homo sapiens was used for all lineages except when performing the test for Homo sapiens, in which case Pan troglodytes was used) (supplementary table S94, Supplementary Material online). The total number of transcripts tested for each species can be seen in supplementary table S6C, Supplementary Material online, and the significant transcripts for either positive or purifying selection in supplementary tables S96 and S97, Supplementary Material online. Variants were annotated as either synonymous or nonsynonymous using ANNOVAR (Wang et al. 2010). Multiallelic sites were excluded (see supplementary materials MK 14, Supplementary Material online).
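As an illustration of how a transcript can be classified from synonymous and nonsynonymous counts, the following is a minimal sketch of a McDonald-Kreitman style comparison. It is not the pipeline used in this study: the function names are hypothetical, and the use of a two-sided Fisher's exact test together with the neutrality index is an assumption made for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def mk_test(dn, ds, pn, ps):
    """dn, ds: nonsynonymous and synonymous fixed differences to the outgroup.
    pn, ps: nonsynonymous and synonymous polymorphisms within the species.
    Returns (p_value, neutrality_index)."""
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    # NI < 1: excess nonsynonymous divergence (consistent with positive selection)
    # NI > 1: excess nonsynonymous polymorphism (consistent with purifying selection)
    ni = (pn / ps) / (dn / ds) if ps and dn and ds else None
    return p_value, ni
```

A transcript with a small p-value and a neutrality index below 1 would be a candidate for positive selection under this sketch, while an index above 1 would point to purifying selection.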

Extended Lineage Sorting Test (ELS)

To detect lineage-specific positive selection that occurred after the divergence of two closely related lineages, we scan the genome for a signal of extended lineage sorting (see SOM 13 in Green et al. 2010; see Supplementary Information 7 in Prüfer et al. 2012; see Supplementary Information 19a in Prüfer et al. 2014), i.e., genomic regions where the lineage of a closely related outgroup falls basal to the lineages of a test-group of individuals. The test requires a particular relationship between the test-group and the closely related outgroup individual, where the outgroup individual is sufficiently close and the test-group is sufficiently diverse so that the outgroup often falls within the diversity of the test-group. To determine which population pairs are suitable for ELS, we performed neutral coalescent simulations with ms (Hudson 2002) (supplementary table S66, Supplementary Material online). The fraction of derived sites in the simulations was compared with the fraction in the data, which closely matched the simulations in most cases (supplementary fig. S20 and supplementary tables S66 and S67, Supplementary Material online). Three lineage pairs showed a sufficiently close relationship and were used for the ELS test: Pan troglodytes–Pan paniscus, Gorilla g. gorilla–Gorilla b. graueri, and Pongo abelii–Pongo pygmaeus.

We use an implementation of the ELS test that is based on a hidden Markov model (HMM) that analyses SNPs in individuals from one population and the genotype from a single individual from the outgroup population (Prüfer et al. 2014, SOM). The HMM then infers the posterior probability for the hidden states internal (the outgroup falls within the diversity of the test group) and external (the outgroup falls basal to the lineages of the test group) at all SNP positions.
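To make the idea of per-SNP posterior probabilities concrete, here is a generic two-state forward-backward sketch. It is not the HMM of Prüfer et al. (2014): the observation coding (1 if the outgroup carries the derived allele at a SNP, 0 otherwise), the emission probabilities, the transition probability, and the prior are all hypothetical values chosen for illustration.

```python
def posterior_external(obs, p_stay=0.99, emit_derived=(0.3, 0.01), prior=(0.9, 0.1)):
    """Posterior probability of the 'external' state at each SNP.

    obs[i] is 1 if the outgroup carries the derived allele at SNP i, else 0.
    State 0 = internal, state 1 = external. All parameters are illustrative.
    """
    n = len(obs)
    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]

    def emit(state, o):
        return emit_derived[state] if o == 1 else 1 - emit_derived[state]

    # Forward pass, rescaled at every step to avoid numerical underflow
    fwd = []
    for i, o in enumerate(obs):
        if i == 0:
            cur = [prior[s] * emit(s, o) for s in (0, 1)]
        else:
            prev = fwd[-1]
            cur = [emit(s, o) * sum(prev[r] * trans[r][s] for r in (0, 1)) for s in (0, 1)]
        z = sum(cur)
        fwd.append([c / z for c in cur])

    # Backward pass, also rescaled
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit(r, obs[i + 1]) * bwd[i + 1][r] for r in (0, 1))
               for s in (0, 1)]
        z = sum(cur)
        bwd[i] = [c / z for c in cur]

    # Combine and normalise per position
    post = []
    for f, b in zip(fwd, bwd):
        w_int, w_ext = f[0] * b[0], f[1] * b[1]
        post.append(w_ext / (w_int + w_ext))
    return post
```

Under these assumed parameters, a long stretch of SNPs at which the outgroup is ancestral pushes the posterior toward the external state, which is the qualitative behaviour the test relies on.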

Following Prüfer et al. (2012, Supplementary Information 7), external regions were defined as a run of SNPs with a probability of >0.8 for being external that is not interrupted by SNPs with a probability of >0.8 for being internal, and scored by their genetic length using the 1-Mb average human recombination rate from Kong et al. (2002).
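The region definition above can be sketched as a single pass over the SNPs. This is one reading of the rule, not the authors' code: a region opens at the first SNP with P(external) > 0.8, survives SNPs with intermediate probabilities, and closes when a SNP with P(internal) > 0.8 is reached. The function name is hypothetical, and a single recombination rate (`cm_per_mb`) stands in for the 1-Mb-window rates of the recombination map as a simplifying assumption.

```python
def external_regions(positions, p_external, p_internal, threshold=0.8, cm_per_mb=1.0):
    """Return a list of (start_bp, end_bp, genetic_length_cM) external regions.

    positions: SNP coordinates in bp, sorted along one chromosome.
    p_external, p_internal: per-SNP posterior probabilities from the HMM.
    cm_per_mb: assumed uniform recombination rate (illustrative simplification).
    """
    regions = []
    start = end = None
    for pos, pe, pi in zip(positions, p_external, p_internal):
        if pe > threshold:
            if start is None:
                start = pos      # open a new region
            end = pos            # extend to the latest confidently external SNP
        elif pi > threshold and start is not None:
            # A confidently internal SNP interrupts the run
            regions.append((start, end, (end - start) / 1e6 * cm_per_mb))
            start = end = None
    if start is not None:
        regions.append((start, end, (end - start) / 1e6 * cm_per_mb))
    return regions
```

Regions can then be sorted by the third element of each tuple to obtain the ranking by genetic length.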

For each population, the HMM was run repeatedly with each "outgroup" individual. To combine these multiple outputs, we disregarded any external region that was not in the top 5% of the empirical distribution in all runs (as truly external regions are shared among all outgroup individuals), and the remaining external regions were then assigned a final rank based on their cumulative rank score from the multiple runs (supplementary materials ELS 13 and supplementary tables S68–70, Supplementary Material online).
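The combination step can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure documented in the supplementary materials: regions are assumed to be matched across runs by a shared identifier (in practice they would have to be matched by overlapping coordinates), and `combine_runs` and `top_fraction` are hypothetical names.

```python
def combine_runs(runs, top_fraction=0.05):
    """Combine external regions from several HMM runs (one per outgroup individual).

    runs: list of dicts mapping a region identifier to its genetic length.
    Returns region identifiers ordered by cumulative rank (best first), keeping
    only regions found in the top fraction of every run.
    """
    per_run_ranks = []
    for scores in runs:
        ordered = sorted(scores, key=scores.get, reverse=True)  # longest first
        n_top = max(1, int(len(ordered) * top_fraction))
        per_run_ranks.append(
            {region: rank for rank, region in enumerate(ordered[:n_top], start=1)}
        )
    if not per_run_ranks:
        return []
    # Keep only regions present in the top fraction of all runs
    shared = set(per_run_ranks[0]).intersection(*per_run_ranks[1:])
    cumulative = {region: sum(r[region] for r in per_run_ranks) for region in shared}
    # Lower cumulative rank is better; ties are broken by identifier
    return sorted(cumulative, key=lambda region: (cumulative[region], region))
```

Requiring membership in the top fraction of every run reflects the reasoning given above, that a truly external region should be recovered regardless of which outgroup individual is used.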

Region Annotation and GO Category Enrichment

Regions were annotated as genic (protein-coding and nonprotein coding) if at least 1 bp of the region overlapped with a gene, using GENCODE hg18 gene coordinates (Harrow et al. 2012).
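The 1-bp overlap criterion amounts to a closed-interval intersection test. The sketch below assumes 1-based inclusive coordinates and a simple tuple layout for genes; both are illustrative choices, not the format of the GENCODE files.

```python
def overlapping_genes(region, genes):
    """Return names of genes sharing at least 1 bp with the region.

    region: (chrom, start, end); genes: iterable of (name, chrom, start, end).
    Coordinates are assumed to be 1-based and inclusive.
    """
    chrom, start, end = region
    return [
        name
        for name, g_chrom, g_start, g_end in genes
        # Two closed intervals intersect when each starts before the other ends
        if g_chrom == chrom and g_start <= end and g_end >= start
    ]
```

A region would be labelled genic when this list is non-empty.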

To test for evidence of functional enrichment among the genes that we detect as putative targets of natural selection, we performed biological category enrichment analysis using the software WebGestalt (Zhang et al. 2005). For the HKA and FWH tests, we selected the 200 genes with the strongest signatures of selection as our test set of candidate genes. For the ELS test, we considered all genes in the 5% longest external regions. For the MK test, we selected, by species, all genes with at least one transcript presenting a nominal P value of ≤0.05.

For each test and lineage, we tested for functional enrichment using several databases of biological pathway and functional information: GO categories (Harris et al. 2004) "biological processes," "molecular functions," and "cellular components"; the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Kanehisa et al. 2004), based on mammalian and human phenotype ontology; and the PheWAS database, which is based on the human PheWAS ontology (Denny et al. 2010). We set a significance threshold of 0.05 and used the Bonferroni correction for multiple hypothesis testing. Significant categories driven by only one gene were discarded due to the high potential for spurious signals in such cases. For HKA results, see supplementary tables S3N–S3AAA, Supplementary Material online; for ELS, see supplementary tables S71 and S72, Supplementary Material online; for FWH, see supplementary tables S5C–S5N, Supplementary Material online; for the MK test, see supplementary table S101, Supplementary Material online.
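The filtering logic described above (Bonferroni correction at 0.05, then discarding categories supported by a single gene) can be sketched as below. This is not WebGestalt's interface: the input layout and the function name are hypothetical, and the number of tests is assumed to be the number of categories passed in.

```python
def significant_categories(results, alpha=0.05):
    """Filter enrichment results.

    results: dict mapping category name -> (nominal p-value, set of candidate
    genes annotated to that category).
    Returns a dict of category -> Bonferroni-adjusted p-value for categories
    that pass the threshold and are supported by more than one gene.
    """
    n_tests = len(results)
    kept = {}
    for category, (p_value, genes) in results.items():
        adjusted = min(1.0, p_value * n_tests)  # Bonferroni adjustment
        if adjusted < alpha and len(genes) > 1:
            kept[category] = adjusted
    return kept
```

Dropping single-gene categories after correction keeps the multiple-testing burden unchanged while removing the signals most likely to be spurious.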

We note that all gene pathways used were annotated for humans. While this is not ideal for pathway enrichment analyses of nonhuman species, the putative biases should be minor. This is because these functional elements are evolutionarily constrained and these species are extremely closely related (all within only 12 My). For example, there have only been 96 gene-deletion events in the great apes (Prado-Martinez et al. 2013), which should have a minimal impact on an enrichment analysis that uses thousands of genes. Furthermore, any putative annotation errors between species should be random with respect to biological pathways and should not systematically bias gene enrichment results.

We tested the potential effect of gene length bias on the results by repeating the enrichment analyses after randomly selecting equal numbers of windows and exploring the overlap of these (random) categories with our results (see supplementary materials HKA 7, Supplementary Material online).

Data Access

An interactive browser with the signatures of natural selection for each species is available at http://tinyurl.com/nf8qmzh (last accessed October 10, 2016).


Supplementary Material

Supplementary figures S1–S22, tables S1–S107, and supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank Matthias Ongyerth for assistance with data preparation; Michael Lachmann and Mark Stoneking for discussions and valuable comments; and Michael Lachmann for help with the ELS test. We thank the members of the Great Ape Genome Diversity Consortium for support throughout this work. This work was supported by funding from the Max Planck Society to K.P. and A.M.A.; by grants from the Ministerio de Economía y Competitividad in Spain (grant BFU2013-43726-P) and the Secretaria d'Universitats i Recerca de la Generalitat de Catalunya (grant GRC 2014 SGR 866) to J.B.; by a European Research Council Advanced Grant (233297) to S. Pääbo and a European Research Council Starting Grant (260372) to T.M.B.; and by a European Molecular Biology Organization Young Investigator Award and the Ministerio de Ciencia e Innovación in Spain (BFU2014-55090-P) to T.M.B.

References

Anderson RM, May RM. 1982. Coevolution of hosts and parasites. Parasitology 85(02):411–426.

Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen R, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet 6:e1001157.

Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol 26(12):2755–2764.

Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol 7(4):1122–1132.

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.

Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol 126:235–242.

Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol 182:4313–4320.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.

Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet 19:4705–4714.

Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol 11(1):171.

Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet 10:195–205.

Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet 84:21–34.

Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.

Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol 13:e1002112.

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.

Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci 110(14):5294–5300.

Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res 24:885–895.

ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet 5:e1003995.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258–D261.


Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res 22:1760–1774.

Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.

Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.

Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol 22:194–200.

Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev 29:15–21.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Hudson RR, Kreitman M, Aguade M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet 32:415–435.

Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol 126:1200–1202.

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 16:980–989.

Key FM, Fu Q, Romagne F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.

Key FM, Teixeira JC, de Filippo C, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2. Software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html.

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol 20(4):R208–R215.

Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.

Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2(40):D130–D135.

Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet 38:337–342.

Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 1(33):W741–W748.

Natural Selection in the Great Apes doi101093molbevmsw215 MBE

3283

Page 14: Natural Selection in the Great Apes - Digital CSICdigital.csic.es/bitstream/10261/151993/1/msw215.pdfclosest living relatives, the great apes, is therefore crucial for furthering our

Supplementary MaterialSupplementary figures S1ndashS22 tables S1ndashS107 and supplementary materials are available at Molecular Biology andEvolution online (httpwwwmbeoxfordjournalsorg)

AcknowledgmentsWe thank Matthias Ongyerth for assistance with data prepa-ration Michael Lachmann and Mark Stoneking for discussionsand valuable comments and Michael Lachmann for help withthe ELS test We thank the members of the Great Ape GenomeDiversity Consortium for support throughout this work Thiswork was supported by funding from the Max Planck Societyto KP and AMA by grants from the Ministerio de Economıay Competitividad in Spain (grant BFU2013-43726-P) and theSecretaria drsquoUniversitats i Recerca de la Generalitat deCatalunya (grant GRC 2014 SGR 866) to JB by an EuropeanResearch Council Advanced Grant (233297) to SPeuroaeuroabo andEuropean Research Council Starting Grant (260372) to TMBand by European Molecular Biology Organization YoungInvestigator Award and Ministerio de Ciencia e Innovacionin Spain (BFU2014-55090-P) to TMB

ReferencesAnderson RM May RM 1982 Coevolution of hosts and parasites

Parasitology 85(02)411ndash426Andres AM Dennis MY Kretzschmar WW Cannons JL Lee-Lin S-Q

Hurle B Schwartzberg PL Williamson SH Bustamante CD Nielsen Ret al 2010 Balancing selection maintains a form of ERAP2 thatundergoes nonsense-mediated decay and affects antigen presenta-tion PLoS Genet 6e1001157

Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, Clark AG. 2009. Targets of balancing selection in the human genome. Mol Biol Evol 26(12):2755–2764.

Bataillon T, Duan J, Hvilsom C, Jin X, Li Y, Skov L, Glemin S, Munch K, Jiang T, Qian Y, Hobolth A. 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol Evol 7(4):1122–1132.

Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. 2006. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441(7089):87–90.

Boasso A, Shearer GM. 2008. Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol 126:235–242.

Boasso A, Vaccari M, Fuchs D, Hardy AW, Tsai W-P, Tryniszewska E, Shearer GM, Franchini G. 2009. Combined effect of antiretroviral therapy and blockade of IDO in SIV-infected rhesus macaques. J Immunol 182:4313–4320.

Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, Civello D. 2005. Natural selection on protein-coding genes in the human genome. Nature 437(7062):1153–1157.

Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, Lo Caputo S, Mazzotta F, Piacentini L, Bresolin N, Clerici M, et al. 2010. Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Hum Mol Genet 19:4705–4714.

Cagliani R, Riva S, Pozzoli U, Fumagalli M, Comi GP, Bresolin N, Clerici M, Sironi M. 2011. Balancing selection is common in the extended MHC region but most alleles with opposite risk profile for autoimmune diseases are neutrally evolving. BMC Evol Biol 11(1):171.

Casals F, Sikora M, Laayouni H, Montanucci L, Muntasell A, Lazarus R, Calafell F, Awadalla P, Netea MG, Bertranpetit J. 2011. Genetic adaptation of the antibacterial human innate immunity network. BMC Evol Biol 11:202.

Charlesworth B. 2009. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet 10:195–205.

Chen PL, Avramopoulos D, Lasseter VK, McGrath JA, Fallin MD, Liang KY, Nestadt G, Feng N, Steel G, Cutting AS, Wolyniec P. 2009. Fine mapping on chromosome 10q22-q23 implicates Neuregulin 3 in schizophrenia. Am J Hum Genet 84:21–34.

Coop G. 2016. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv 1042598.

Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, Myers RM, Cavalli-Sforza LL, Feldman MW, Pritchard JK. 2009. The role of geography in human adaptation. PLoS Genet 5:e1000500.

Corbett-Detig RB, Hartl DL, Sackton TB. 2015. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol 13:e1002112.

Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. 2010. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26:1205–1210.

Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci 110(14):5294–5300.

Ellegren H, Galtier G. 2016. Determinants of genetic diversity. Nat Rev Genet 17:422–433.

Elyashiv E, Bullaughey K, Sattath S, Rinott Y, Przeworski M, Sella G. 2010. Shifts in the intensity of purifying selection: an analysis of genome-wide polymorphism data from two closely related yeast species. Genome Res 20:1558–1573.

Enard D, Messer PW, Petrov DA. 2014. Genome-wide signals of positive selection in human evolution. Genome Res 24:885–895.

ENCODE Project Consortium. 2011. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9(4):e1001046.

Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol 21:569–575.

Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol Biol Evol 26:2097–2108.

Faheem M, Naseer MI, Rasool M, Chaudhary AG, Kumosani TA, Ilyas AM, Pushparaj P, Ahmed F, Algahtani HA, Al-Qahtani MH, et al. 2015. Molecular genetics of human primary microcephaly: an overview. BMC Med Genomics 8(Suppl 1):S4.

Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.

Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramírez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, et al. 2008. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol 181:1315–1322.

Genin A, Desir J, Lambert N, Biervliet M, Van der Aa N, Pierquin G, Killian A, Tosi M, Urbina M, Lefort A, et al. 2012. Kinetochore KMN network gene CASC5 mutated in primary microcephaly. Hum Mol Genet 21:5306–5317.

Gokcumen O, Tischler V, Tica J, Zhu Q, Iskow RC, Lee E, Fritz MH, Langdon A, Stutz AM, Pavlidis P, Benes V. 2013. Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci 110(39):15764–15769.

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. 2013. Identifying recent adaptations in large-scale genomic data. Cell 152:703–713.

Halligan DL, Kousathanas A, Ness RW, Harr B, Eory L, Keane TM, Adams DJ, Keightley PD. 2013. Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents. PLoS Genet 5:e1003995.

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258–D261.

Natural Selection in the Great Apes. doi:10.1093/molbev/msw215. MBE

Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, et al. 2012. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res 22:1760–1774.

Havik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, Mühleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70:35–42.

Hedrick PW. 1999. Balancing selection and MHC. Genetica 104:207–214.

Heeney J, Jonker R, Koornstra W, Dubbes R, Niphuis H, Di Rienzo AM, Gougeon ML, Montagnier L. 1993. The resistance of HIV-infected chimpanzees to progression to AIDS correlates with absence of HIV-related T-cell dysfunction. J Med Primatol 22:194–200.

Hubisz MJ, Pollard KS. 2014. Exploring the genesis and functions of human accelerated regions sheds light on their role in human evolution. Curr Opin Genet Dev 29:15–21.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Hudson RR, Kreitman M, Aguade M. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

Hughes AL, Yeager M. 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet 32:415–435.

Irvine AD, McLean WHI. 2006. Breaking the (un)sound barrier: filaggrin is a major gene for atopic dermatitis. J Investig Dermatol 126:1200–1202.

Jensen JD, Bachtrog D. 2011. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin domain. Genome Biol Evol 3:687–701.

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res 32:D277–D280.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res 16:980–989.

Key FM, Fu Q, Romagne F, Lachmann M, Andrés AM. 2016. Human adaptation and population differentiation in the light of ancient genomes. Nat Commun 187.

Key FM, Teixeira JC, Filippo C de, Andrés AM. 2014. Advantageous diversity maintained by balancing selection in humans. Curr Opin Genet Dev 29:45–51.

Kimura M. 1979. The neutral theory of molecular evolution. Sci Am 241:98–126.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. 2002. A high-resolution recombination map of the human genome. Nat Genet 31:241–247.

Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, Foster S, Scully S, Welcher AA, Holers VM. 2006. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol 176:4419–4430.

Lee Y, Langley C, Begun D. 2014. Differential strengths of positive selection revealed by hitchhiking effects at small physical scales in Drosophila melanogaster. Mol Biol Evol 31(4):804–816.

Leffler EM, Bullaughey K, Matute DR, Meyer WK, Segurel L, Venkat A, Andolfatto P, Przeworski M. 2012. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol 10(9):e1001388.

Lewontin RC. 1974. The genetic basis of evolutionary change. New York: Columbia University Press.

Libioulle C, Louis E, Hansoul S, Sandor C, Farnir F, Franchimont D, Vermeire S, Dewit O, De Vos M, Dixon A, Demarche B. 2007. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 3(4):e58.

Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M. 2011. Comparative and demographic analysis of orang-utan genomes. Nature 469(7331):529–533.

McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

McGrew WC. 2004. The cultured chimpanzee: reflections on cultural primatology. Cambridge University Press.

McManus KF, Kelley JL, Song S, Veeramah KR, Woerner AE, Stevison LS, Ryder OA, Ape Genome Project G, Kidd JM, Wall JD, et al. 2015. Inference of gorilla demographic and selective history from whole-genome sequence data. Mol Biol Evol 32:600–612.

McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316(5830):1488–1491.

McVicker G, Gordon D, Davis C, Green P. 2009. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet 5(5):e1000471.

Metz R, Smith C, DuHadaway JB, Chandler P, Baban B, Merlo LMF, Pigott E, Keough MP, Rust S, Mellor AL, et al. 2014. IDO2 is critical for IDO1-mediated T-cell regulation and exerts a non-redundant function in inflammation. Int Immunol 26:357–367.

Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang S-P, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, et al. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.

Murray MF. 2010. Insights into therapy: tryptophan oxidation and HIV infection. Sci Transl Med 2:32ps23.

Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andrés AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A. 2009. Darwinian and demographic forces affecting human protein coding genes. Genome Res 19(5):838–849.

Oji V, Eckl KM, Aufenvenne K, Natebus M, Tarinski T, Ackermann K, Seller N, Metze D, Nurnberg G, Folster-Holst R, et al. 2010. Loss of corneodesmosin leads to severe skin barrier defect, pruritus, and atopy: unraveling the peeling skin disease. Am J Hum Genet 87(2):274–281.

Oksenberg N, Ahituv N. 2013. The role of AUTS2 in neurodevelopment and human evolution. Trends Genet 29:600–608.

Pagel M, Meade A. 2013. BayesTraits V2: software and manual. Reading: University of Reading. http://www.evolution.rdg.ac.uk/BayesTraitsV2Beta.html

Ponting CP, Hardison RC. 2011. What fraction of the human genome is functional? Genome Res 21(11):1769–1776.

Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, Veeramah KR, Woerner AE, O'Connor TD, Santpere G, et al. 2013. Great ape genetic diversity and population history. Nature 499:471–475.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol 20(4):R208–R215.

Prüfer K, Munch K, Hellmann I, Akagi K, Miller JR, Walenz B, Koren S, Sutton G, Kodira C, Winer R, et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature 486:527–531.

Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, et al. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505:43–49.

Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2(40):D130–D135.

Pybus M, Dall'Olio GM, Luisi P, Uzkudun M, Carreño-Torres A, Pavlidis P, Laayouni H, Bertranpetit J, Engelken J. 2014. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res 42(Database issue):D903–D909.

Pybus M, Luisi P, Dall'Olio G, Uzkudun M, Laayouni H, Bertranpetit J, Engelken J. 2015. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics 31(24):3946–3952.

Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. 2006. Positive natural selection in the human lineage. Science 312(5780):1614–1620.

Scally A, Dutheil J, Hillier L. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175.

Semendeferi K, Lu A, Schenker N, Damasio H. 2002. Humans and great apes share a large frontal cortex. Nat Neurosci 5:272–276.

Shi L, Li M, Lin Q, Qi X, Su B. 2013. Functional divergence of the brain-size regulating gene MCPH1 during primate evolution and the origin of humans. BMC Biol 11:62.

Smith FJD, Irvine AD, Terron-Kwiatkowski A, Sandilands A, Campbell LE, Zhao Y, et al. 2006. Loss-of-function mutations in the gene encoding filaggrin cause ichthyosis vulgaris. Nat Genet 38:337–342.

Sudhof TC. 2008. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455:903–911.

Sudmant PH, Huddleston J, Catacchio CR, Malig M, Hillier LW, Baker C, Mohajeri K, Kondova I, Bontrop RE, Persengiev S, Antonacci F. 2013. Evolution and diversity of copy number variation in the great ape lineage. Genome Res 23(9):1373–1382.

Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK. 2015. An integrated map of structural variation in 2504 human genomes. Nature 526(7571):75–81.

Tomasello M, Call J. 1997. Primate cognition. USA: Oxford University Press.

Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164.

Wang YQ, Su B. 2004. Molecular evolution of microcephalin, a gene determining human brain size. Hum Mol Genet 13:1131–1137.

Weißflog L, Scholz CJ, Jacob CP, Nguyen TT, Zamzow K, Groß-Lesch S, Renner TJ, Romanos M, Rujescu D, Walitza S, et al. 2013. KCNIP4 as a candidate gene for personality disorders and adult ADHD. Eur Neuropsychopharmacol 23:436–447.

Woods CG, Bond J, Enard W. 2005. Autosomal recessive primary microcephaly (MCPH): a review of clinical, molecular, and evolutionary findings. Am J Hum Genet 76:717–728.

Zhai W, Nielsen R, Slatkin M. 2009. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Mol Biol Evol 26(2):273–283.

Zhang B, Kirov S, Snoddy J. 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 1(33):W741–W748.

