Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females
G. David Poznik et al., Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females. Science 341, 562 (2013). DOI: 10.1126/science.1237619
Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females
G. David Poznik,1,2 Brenna M. Henn,3,4 Muh-Ching Yee,3 Elzbieta Sliwerska,5 Ghia M. Euskirchen,3 Alice A. Lin,6 Michael Snyder,3 Lluis Quintana-Murci,7,8 Jeffrey M. Kidd,3,5 Peter A. Underhill,3 Carlos D. Bustamante3*
The Y chromosome and the mitochondrial genome have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y-chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and the mitochondrial genome, we estimate the time to the most recent common ancestor (TMRCA) of the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome TMRCA to be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages.
The Y chromosome contains the longest stretch of nonrecombining DNA in the human genome and is therefore a powerful tool with which to study human history. Estimates of the time to the most recent common ancestor (TMRCA) of the Y chromosome have differed by a factor of about 2 from TMRCA estimates for the mitochondrial genome. Y-chromosome coalescence time has been estimated in the range of 50 to 115 thousand years (ky) (1–3), although larger values have been reported (4, 5), whereas estimates for mitochondrial DNA (mtDNA) range from 150 to 240 ky (3, 6, 7). However, the quality and quantity of data available for these two uniparental loci have differed substantially. Whereas the complete mitochondrial genome has been resequenced thousands of times (6, 8), fully sequenced diverse Y chromosomes have only recently become available. Previous estimates of the Y-chromosome TMRCA relied on short resequenced segments, rapidly mutating microsatellites, or single-nucleotide polymorphisms (SNPs) ascertained in a small panel of individuals and then genotyped in a global panel. These approaches likely underestimate genetic diversity and, consequently, TMRCA (9).
We sequenced the complete Y chromosomes of 69 males from seven globally diverse populations of the Human Genome Diversity Panel (HGDP) and two additional African populations: San (Bushmen) from Namibia, Mbuti Pygmies from the Democratic Republic of Congo, Baka Pygmies and Nzebi from Gabon, Mozabite Berbers from Algeria, Pashtuns (Pathan) from Pakistan, Cambodians, Yakut from Siberia, and Mayans from Mexico (fig. S1). Individuals were selected without regard to their Y-chromosome haplogroups.
The Y-chromosome reference sequence is 59.36 Mb, but this includes a 30-Mb stretch of constitutive heterochromatin on the q arm, a 3-Mb centromere, 2.65-Mb and 330-kb telomeric pseudoautosomal regions (PAR) that recombine with the X chromosome, and eight smaller gaps. We mapped reads to the remaining 22.98 Mb of assembled reference sequence, which consists of three sequence classes defined by their complexity and degree of homology to the X chromosome (10): X-degenerate, X-transposed, and ampliconic. Both the high degree of self-identity within the ampliconic tracts and the X-chromosome homology of the X-transposed region render portions of the Y chromosome ill suited for short-read sequencing. To address this, we constructed filters that reduced the data to 9.99 million sites (11)
1Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, CA, USA. 2Department of Statistics, Stanford University, Stanford, CA, USA. 3Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA. 4Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA. 5Department of Human Genetics and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. 6Department of Psychiatry, Stanford University, Stanford, CA, USA. 7Institut Pasteur, Unit of Human Evolutionary Genetics, 75015 Paris, France. 8Centre National de la Recherche Scientifique, URA3012, 75015 Paris, France.
*Corresponding author. E-mail: [email protected]
[Figure 1 graphic. Upper panel: Filtered Depth EWMA (0 to 500) and (MQ0 / Unfiltered Depth) EWMA (0.0 to 1.0) versus Position (Mb). Tracks: Depth Filter, MQ0 Ratio Filter, Exclusion Mask, Inclusion Mask. Scatter plot: Compatible Site, Incompatible Site. Lower panel: chromosome schematic from 0 Mb to 59.36 Mb showing X-degenerate, X-transposed, Ampliconic, Heterochromatic, Pseudoautosomal, and Other sequence classes.]
Fig. 1. Callability mask for the Y chromosome. Exponentially weighted moving averages of read depth (blue line) and the proportion of reads mapping ambiguously (MQ0 ratio; violet line) versus physical position. Regions with values outside the envelopes defined by the dashed lines (depth) or dotted lines (MQ0) were flagged (blue and violet boxes) and merged for exclusion (gray boxes). The complement (black boxes) defines the regions within which reliable genotype calls can be made. Below, a scatter plot indicates the positions of all observed SNVs. Those incompatible with the inferred phylogenetic tree (red) are uniformly distributed. The X-degenerate regions yield quality sequence data, ampliconic sequences tend to fail both filters, and mapping quality is poor in the X-transposed region.
2 AUGUST 2013 VOL 341 SCIENCE www.sciencemag.org562
REPORTS
(Fig. 1 and fig. S2). We then implemented a haploid model expectation-maximization algorithm to call genotypes (11).
We identified 11,640 single-nucleotide variants (SNVs) (fig. S3). A total of 2293 (19.7%) are present in dbSNP (v135), and we assigned haplogroups on the basis of the 390 (3.4%) present in the International Society of Genetic Genealogy (ISOGG) database (12) (fig. S4). At SNVs, median haploid coverage was 3.1x (interquartile range 2.6 to 3.8x) (table S1 and fig. S5), and sequence validation suggests a genotype calling error rate on the order of 0.1% (11).
Because mutations accumulate over time along a single lengthy haplotype (13), the male-specific region of the Y chromosome provides power for phylogenetic inference. We constructed a maximum likelihood tree from 11,640 SNVs using the Tamura-Nei nucleotide substitution model (Fig. 2) and, in agreement with (14), observe strong bootstrap support (500 replicates) for the major haplogroup branching points. The tree both recapitulates and adds resolution to the previously inferred Y-chromosome phylogeny (fig. S6), and it characterizes branch lengths free of ascertainment bias. We identify extraordinary depth within Africa, including lineages sampled from the San hunter-gatherers that coalesce just short of the root of the entire tree. This stands in contrast to a tree from autosomal SNP genotypes (15), wherein African branches were considerably shorter than others; genotyping arrays primarily rely on SNPs ascertained in European populations and therefore undersample diversity within Africa. Two regions of reduced branch length in our tree correspond to rapid expansions: the out-of-Africa event (downstream of F-M89) and the agriculture-catalyzed Bantu expansions (downstream of E-M2). Among the three hunter-gatherer populations, we find a relatively high number of B2 lineages. Within this haplogroup, six Baka B-M192 individuals form a distinct clade that does not correspond to extant definitions (11) (fig. S7). We estimate this previously uncharacterized structure to have arisen ~35 thousand years ago (kya).
We resolve the polytomy of the Y macrohaplogroup F (16) by determining the branching order of haplogroups G, H, and IJK (Fig. 2 and fig. S6). We identified a single variant (rs73614810, a C→T transition dubbed “M578”) for which haplogroup G retains the ancestral allele, whereas its brother clades (H and IJK) share the derived allele. Genotyping M578 in a diverse panel confirmed the finding (table S2). We thereby infer more recent common ancestry between hgH and hgIJK than between either and hgG. M578 defines an early diversification episode of the Y phylogeny in Eurasia (11).
[Figure 2 graphic. Phylogenetic tree with a horizontal scale from 0 to 1200. Leaves carry the most derived mutation and the population (for example, A-P28 San, B-M192 Baka, E-P277 Nzebi, Q-M3 Maya). Labeled internal branches include A-L419, BT-M42, CT-M168, F-M89, HIJK-M578, K-M9, NO-M214, P-M45, and E-M2/M180. Haplogroup clusters: A, B, E, and FT (Non-African).]
Fig. 2. Y-chromosome phylogeny inferred from genomic sequencing. This tree recapitulates the previously known topology of the Y-chromosome phylogeny; however, branch lengths are now free of ascertainment bias. Branches are drawn proportional to the number of derived SNVs. Internal branches are labeled with defining ISOGG variants inferred to have arisen on the branch. Leaves are colored by major haplogroup cluster and labeled with the most derived mutation observed and the population from which the individual was drawn. Previously uncharacterized structure within African hgB2 is indicated in orange. (Inset) Resolution of a polytomy was possible through the identification of a variant for which hgG retains the ancestral allele, whereas hgH and hgIJK share the derived allele.
To account for missing genotypes, we assigned each SNV to the root of the smallest subtree containing all carriers of one allele or the other and inferred that the allele specific to the subtree was derived (fig. S8). We used the chimpanzee Y-chromosome sequence to polarize 398 variants assigned to the deepest split, a task complicated by substantial structural divergence (11, 17).
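The assignment rule for missing genotypes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the child-to-parent tree encoding, the function names, and the toy genotypes are all hypothetical, and ties between the two alleles (as at the deepest split, where an outgroup is needed) are not handled.

```python
# Hypothetical sketch: place one SNV on a known tree when some calls are missing.
# The tree is encoded as a child -> parent map; leaves are sample names.

def ancestors(node, parent):
    """Path from a node up to the root, inclusive of both ends."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def leaves_under(node, parent, leaves):
    """Leaves whose path to the root passes through the given node."""
    return {leaf for leaf in leaves if node in ancestors(leaf, parent)}

def smallest_subtree_root(carriers, parent):
    """Root of the smallest subtree containing every carrier (their MRCA)."""
    paths = [ancestors(c, parent) for c in carriers]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking leaf-to-root, the first shared node is the most recent one.
    return next(n for n in paths[0] if n in common)

def place_snv(genotypes, parent):
    """genotypes maps leaf -> allele, with None for a missing call.

    Returns (subtree_root, derived_allele, imputed_genotypes)."""
    leaves = set(genotypes)
    alleles = sorted({a for a in genotypes.values() if a is not None})
    assert len(alleles) == 2, "sketch assumes a biallelic site"
    best = None
    for allele in alleles:
        carriers = [leaf for leaf, a in genotypes.items() if a == allele]
        root = smallest_subtree_root(carriers, parent)
        size = len(leaves_under(root, parent, leaves))
        if best is None or size < best[0]:
            best = (size, root, allele)
    _, root, derived = best
    ancestral = next(a for a in alleles if a != derived)
    inside = leaves_under(root, parent, leaves)
    imputed = dict(genotypes)
    for leaf, allele in genotypes.items():
        if allele is None:  # fill only missing calls; observed calls are kept
            imputed[leaf] = derived if leaf in inside else ancestral
    return root, derived, imputed

# Toy tree: ((s1, s2, s3)n1, s4, s5)n2 joined with s6 at the root.
toy_parent = {"s1": "n1", "s2": "n1", "s3": "n1", "n1": "n2",
              "s4": "n2", "s5": "n2", "n2": "root", "s6": "root"}
toy_calls = {"s1": "T", "s2": None, "s3": "T", "s4": "C", "s5": "C", "s6": "C"}
branch, derived_allele, filled = place_snv(toy_calls, toy_parent)
print(branch, derived_allele, filled["s2"])
```

In the toy case the carriers of T span only the subtree under n1, whereas the carriers of C span the whole tree, so T is treated as derived on the branch leading to n1 and the missing call for s2 is filled with T.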
We estimated the coalescence time of all Y chromosomes using both a molecular clock–based frequentist estimator and an empirical Bayes approach that uses a prior distribution of TMRCA from coalescent theory and conducts Markov chain simulation to estimate the likelihood of parameters given a set of DNA sequences (GENETREE) (11, 18) (Table 1). To directly compare the TMRCA of the Y chromosome to that of the mtDNA, we estimated their respective mutation rates by calibrating phylogeographic patterns from the initial peopling of the Americas, a recent human event with high-confidence archaeological dating.
Archaeological evidence indicates that humans first colonized the Americas ~15 kya via a rapid coastal migration that reached Monte Verde II in southern Chile by 14.6 kya (19). The two Native American Mayans represent Y-chromosome hgQ lineages, Q-M3 and Q-L54*(xM3), that likely diverged at about the same time as the initial peopling of the continents. Q is defined by the M242 mutation that arose in Asia. A descendent haplogroup, Q-L54, emerged in Siberia and is ancestral to Q-M3. Because the M3 mutation appears to be specific to the Americas (20), it likely occurred after the initial entry, and the prevalence of M3 in South America suggests that it emerged before the southward migratory wave. Consequently, the divergence between these two lineages provides an appropriate calibration point for the Y mutation rate. The large number of variants that have accumulated since divergence, 120 and 126, contrasts with the pedigree-based estimate of the Y-chromosome mutation rate, which is based on just 4 mutations (21). Using entry to the Americas as a calibration point, we estimate a mutation rate of 0.82 × 10^-9 per base pair (bp) per year [95% confidence interval (CI): 0.72 × 10^-9 to 0.92 × 10^-9/bp/year] (table S3). False negatives have minimal effect on this estimate due to the low probability, at 5.7x and 8.5x coverage, of observing fewer than two reads at a site (observed proportions: 3.1% and 0.6%) and due to the fact that the number of unobserved singletons possessed by one individual is offset by a similar number of Q doubletons unobserved in the same individual and thereby misclassified as singletons possessed by the other (11) (figs. S9 and S10). This calibration approach assumes approximate coincidence between the expansion throughout the Americas and the divergence of Q-M3 and Q-L54*(xM3), but we consider deviation from this assumption and identify a strict lower bound on the point of divergence using sequences from the 1000 Genomes Project (11). As a comparison point, we consider the out-of-Africa expansion of modern humans, which dates to approximately 50 kya (22) and yields a similar mutation rate of 0.79 × 10^-9/bp/year.
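The order of magnitude of the calibrated rate can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: it assumes the rate is the mean number of variants on the two Q lineages divided by the product of the 9.99 million filtered sites and a 15,000-year divergence time, whereas the estimator and the exact number of callable sites used in the supplementary materials may differ. The function name is hypothetical.

```python
# Rough check of the Y-chromosome mutation rate from the Americas calibration.
# All inputs are quoted from the text; the simple averaging is an assumption.

callable_sites = 9.99e6        # sites retained after filtering
divergence_years = 15_000      # approximate entry into the Americas
derived_counts = (120, 126)    # variants accumulated on the two Q lineages

def mutation_rate(counts, sites, years):
    """Mutations per base pair per year, averaged over the lineages."""
    mean_count = sum(counts) / len(counts)
    return mean_count / (sites * years)

rate = mutation_rate(derived_counts, callable_sites, divergence_years)
print(f"{rate:.2e} per bp per year")
```

Under these assumptions the result is close to the reported point estimate of 0.82 × 10^-9 per bp per year.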
We constructed an analogous pipeline for high coverage (>250x) mtDNA sequences from the 69 male samples and an additional 24 females from the seven HGDP populations (11) (fig. S11). As in the Y-chromosome analysis, we calibrated the mtDNA mutation rate using divergence within the Americas. We selected the pan-American hgA2, one of several initial founding haplogroups among Native Americans. The star-shaped phylogeny of hgA2 subclades suggests that its divergence was coincident with the rapid dispersal upon the initial colonization of the continents (23). Calibration on 108 previously analyzed hgA2 sequences (11) (fig. S12) yields a point estimate equivalent to that from our seven Mayan mtDNAs, but within a narrower confidence interval. From this within-human calibration, we estimate a mutation rate of 2.3 × 10^-8/bp/year (95% CI: 2.0 × 10^-8 to 2.5 × 10^-8/bp/year), higher than that from human-chimpanzee divergence but similar to other estimates using within-human calibration points (24, 25).
The global TMRCA estimate for any locus constitutes an upper bound for the time of human population divergence under models without gene flow. We estimate the Y-chromosome TMRCA to be 138 ky (120 to 156 ky) and the mtDNA TMRCA to be 124 ky (99 to 148 ky) (Table 1) (11). Our mtDNA estimate is more recent than many previous studies, the majority of which used mutation rates extrapolated from between-species divergence. However, mtDNA mutation rates are subject to a time-dependent decline, with pedigree-based estimates on the faster end of the spectrum and species-based estimates on the slower. Because of this time dependency and the need to calibrate the Y and mtDNA in a comparable manner, it is more appropriate here to use within-human clade estimates of the mutation rate.
Rather than assume the mutation rate to be a known constant, we explicitly account for the uncertainty in its estimation by modeling each TMRCA as the ratio of two random variables. We estimate the ratio of the mtDNA TMRCA to that of the Y chromosome to be 0.90 (95% CI: 0.68 to 1.11) (fig. S13). If, as argued above, the divergence of the Y-chromosome Q lineages occurred at approximately the same time as that of the mtDNA A2 lineages, then the TMRCA ratio is invariant to the specific calibration time used. Regardless, the conclusion of parity is robust to possible discrepancy between the divergence times within the Americas (11). Using comparable calibration approaches, the Y and
Table 1. TMRCA and Ne estimates for the Y chromosome and mtDNA. Pop., population.

                  Y chromosome                          mtDNA
Method            Pop.   n    TMRCA*          Ne        Pop.    n    TMRCA*          Ne
Molecular clock   All    69   139 (120–156)   4500†     All     93   124 (99–148)    9500†
GENETREE‡         San    6    128 (112–146)   3800      Nzebi   18   105 (91–119)    11,500
                  Baka   11   122 (106–137)   1800      Mbuti   6    121 (100–143)   3700

*Employs mutation rate estimated from within-human calibration point. Times measured in ky.
†Uses Watterson’s estimator, θ̂w.
‡Each coalescent analysis restricted to a single population spanning the ancestral root (11).
Fig. 3. Similarity of TMRCA does not imply equivalent Ne of males and females. The TMRCA for a given locus is drawn from a predata (i.e., prior) distribution that is a function of Ne, generation time, sample size, and demographic history. Consider the distribution of possible TMRCAs for a set of 100 uniparental chromosomes. Although the Mbuti mtDNA Ne is twice as large as that of the Baka Y chromosome, the corresponding predata TMRCA distributions overlap considerably.
[Figure 3 graphic: Probability Density (0.000 to 0.010) versus Time (ky), 0 to 800.]
mtDNA coalescence times are not significantly different. This conclusion would hold whether or not an alternative approach would yield more definitive TMRCA estimates.
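The reported interval for the TMRCA ratio can be approximated with a small Monte Carlo sketch. This is a deliberate simplification and not the procedure in the supplementary materials: it treats the two TMRCA estimates as independent normal variables whose standard deviations are backed out of the reported 95% intervals, and it ignores the shared calibration time. The function names and the number of draws are arbitrary choices.

```python
import random
import statistics

def normal_from_ci(point, lower, upper):
    """Mean and standard deviation implied by a roughly symmetric 95% interval."""
    return point, (upper - lower) / (2 * 1.96)

def simulate_ratio(n_draws=200_000, seed=1):
    """Median and central 95% interval of the simulated mtDNA/Y TMRCA ratio."""
    rng = random.Random(seed)
    mt_mean, mt_sd = normal_from_ci(124, 99, 148)   # mtDNA TMRCA in ky
    y_mean, y_sd = normal_from_ci(138, 120, 156)    # Y-chromosome TMRCA in ky
    ratios = sorted(
        rng.gauss(mt_mean, mt_sd) / rng.gauss(y_mean, y_sd)
        for _ in range(n_draws)
    )
    lower = ratios[int(0.025 * n_draws)]
    upper = ratios[int(0.975 * n_draws)]
    return statistics.median(ratios), lower, upper

median, lower, upper = simulate_ratio()
print(f"ratio ~ {median:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

Even under these crude assumptions the simulated interval comfortably spans 1, which is the sense in which the two coalescence times are not significantly different.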
Our observation that the TMRCA of the Y chromosome is similar to that of the mtDNA does not imply that the effective population sizes (Ne) of males and females are similar. In fact, we observe a larger Ne in females than in males (Table 1). Although, due to its larger Ne, the distribution from which the mitochondrial TMRCA has been drawn is right-shifted with respect to that of the Y-chromosome TMRCA, the two distributions have large variances and overlap (Fig. 3).
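The overlap of the two predata distributions can be illustrated with a standard coalescent simulation. The sketch assumes a constant-size haploid population, in which the waiting time while k lineages remain is exponential with rate k(k − 1)/(2Ne) per generation, and a 25-year generation time for both loci. Both assumptions are ours; the text notes that the true predata distribution also depends on demographic history. The Ne values are taken from Table 1 and the sample of 100 chromosomes from Fig. 3.

```python
import random

def sample_tmrca(n_lineages, ne, generation_years, rng):
    """One TMRCA draw in years under a constant-size haploid coalescent."""
    generations = 0.0
    for k in range(n_lineages, 1, -1):
        pair_rate = k * (k - 1) / (2.0 * ne)   # coalescence rate per generation
        generations += rng.expovariate(pair_rate)
    return generations * generation_years

def tmrca_draws(ne, seed, n_lineages=100, generation_years=25, n_draws=5_000):
    """Sorted TMRCA draws for a sample of n_lineages uniparental chromosomes."""
    rng = random.Random(seed)
    return sorted(sample_tmrca(n_lineages, ne, generation_years, rng)
                  for _ in range(n_draws))

def quantile(sorted_values, q):
    """Empirical quantile of an already sorted list."""
    return sorted_values[int(q * (len(sorted_values) - 1))]

baka_y = tmrca_draws(ne=1800, seed=7)     # Baka Y chromosome, Table 1
mbuti_mt = tmrca_draws(ne=3700, seed=8)   # Mbuti mtDNA, Table 1

for label, draws in (("Baka Y", baka_y), ("Mbuti mtDNA", mbuti_mt)):
    summary = [round(quantile(draws, q) / 1000) for q in (0.025, 0.5, 0.975)]
    print(label, summary, "ky (2.5%, 50%, 97.5%)")
```

Under this model the expected TMRCA is 2Ne(1 − 1/n) generations, so doubling Ne doubles the mean, yet the large variance leaves the central ranges of the two distributions overlapping.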
Dogma has held that the common ancestor of human patrilineal lineages, popularly referred to as the Y-chromosome “Adam,” lived considerably more recently than the common ancestor of female lineages, the so-called mitochondrial “Eve.” However, we conclude that the mitochondrial coalescence time is not substantially greater than that of the Y chromosome. Indeed, due to our moderate-coverage sequencing and the existence of additional rare divergent haplogroups, our analysis may yet underestimate the true Y-chromosome TMRCA.
References and Notes
1. J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun, M. W. Feldman, Mol. Biol. Evol. 16, 1791–1798 (1999).
2. R. Thomson, J. K. Pritchard, P. Shen, P. J. Oefner, M. W. Feldman, Proc. Natl. Acad. Sci. U.S.A. 97, 7360–7365 (2000).
3. H. Tang, D. O. Siegmund, P. Shen, P. J. Oefner, M. W. Feldman, Genetics 161, 447–459 (2002).
4. M. F. Hammer, Nature 378, 376–378 (1995).
5. F. Cruciani et al., Am. J. Hum. Genet. 88, 814–818 (2011).
6. M. Ingman, H. Kaessmann, S. Pääbo, U. Gyllensten, Nature 408, 708–713 (2000).
7. R. L. Cann, M. Stoneking, A. C. Wilson, Nature 325, 31–36 (1987).
8. P. A. Underhill, T. Kivisild, Annu. Rev. Genet. 41, 539–564 (2007).
9. M. A. Jobling, C. Tyler-Smith, Nat. Rev. Genet. 4, 598–612 (2003).
10. H. Skaletsky et al., Nature 423, 825–837 (2003).
11. Materials and methods are available as supplementary materials on Science Online.
12. ISOGG, International Society of Genetic Genealogy (2013); available at www.isogg.org/.
13. P. A. Underhill et al., Ann. Hum. Genet. 65, 43–62 (2001).
14. W. Wei et al., Genome Res. 23, 388–395 (2013).
15. J. Z. Li et al., Science 319, 1100–1104 (2008).
16. T. M. Karafet et al., Genome Res. 18, 830–838 (2008).
17. J. F. Hughes et al., Nature 463, 536–539 (2010).
18. R. C. Griffiths, S. Tavaré, Philos. Trans. R. Soc. London B Biol. Sci. 344, 403–410 (1994).
19. T. Goebel, M. R. Waters, D. H. O’Rourke, Science 319, 1497–1502 (2008).
20. M. C. Dulik et al., Am. J. Hum. Genet. 90, 229–246 (2012).
21. Y. Xue et al.; Asan, Curr. Biol. 19, 1453–1457 (2009).
22. R. G. Klein, Evol. Anthropol. 17, 267–281 (2008).
23. S. Kumar et al., BMC Evol. Biol. 11, 293 (2011).
24. S. Y. W. Ho, M. J. Phillips, A. Cooper, A. J. Drummond, Mol. Biol. Evol. 22, 1561–1568 (2005).
25. B. M. Henn, C. R. Gignoux, M. W. Feldman, J. L. Mountain, Mol. Biol. Evol. 26, 217–230 (2009).
Acknowledgments: We thank O. Cornejo, S. Gravel, D. Siegmund, and E. Tsang for helpful discussions; M. Sikora and H. Costa for mapping reads from Gabonese samples; and H. Cann for assistance with HGDP samples. This work was supported by National Library of Medicine training grant LM-07033 and NSF graduate research fellowship DGE-1147470 (G.D.P.); NIH grant 3R01HG003229 (B.M.H. and C.D.B.); NIH grant DP5OD009154 (J.M.K. and E.S.); and Institut Pasteur, a CNRS Maladies Infectieuses Émergentes Grant, and a Foundation Simone et Cino del Duca Research Grant (L.Q.M.). P.A.U. consulted for, P.A.U. and B.M.H. have stock in, and C.D.B. is on the advisory board of a project at 23andMe. C.D.B. is on the scientific advisory boards of Personalis, Inc.; InVitae (formerly Locus Development, Inc.); and Ancestry.com. M.S. is a scientific advisory member and founder of Personalis, a scientific advisory member for Genapsys Former, and a consultant for Illumina and Beckman Coulter Society for American Medical Pathology. B.M.H. formerly had a paid consulting relationship with Ancestry.com. Variants have been deposited to dbSNP (ss825679106–825690384). Individual-level genetic data are available, through a data access agreement to respect the privacy of the participants for transfer of genetic data, by contacting C.D.B.
Supplementary Materials
www.sciencemag.org/cgi/content/full/341/6145/562/DC1
Materials and Methods
Supplementary Text
Figs. S1 to S13
Tables S1 to S3
Data File S1
References (26–51)

11 March 2013; accepted 25 June 2013
10.1126/science.1237619
Low-Pass DNA Sequencing of 1200 Sardinians Reconstructs European Y-Chromosome Phylogeny
Paolo Francalacci,1* Laura Morelli,1† Andrea Angius,2,3 Riccardo Berutti,3,4 Frederic Reinier,3 Rossano Atzeni,3 Rosella Pilu,2 Fabio Busonero,2,5 Andrea Maschio,2,5 Ilenia Zara,3 Daria Sanna,1 Antonella Useli,1 Maria Francesca Urru,3 Marco Marcelli,3 Roberto Cusano,3 Manuela Oppo,3 Magdalena Zoledziewska,2,4 Maristella Pitzalis,2,4 Francesca Deidda,2,4 Eleonora Porcu,2,4,5 Fausto Poddie,4 Hyun Min Kang,5 Robert Lyons,6 Brendan Tarrier,6 Jennifer Bragg Gresham,6 Bingshan Li,7 Sergio Tofanelli,8 Santos Alonso,9 Mariano Dei,2 Sandra Lai,2 Antonella Mulas,2 Michael B. Whalen,2 Sergio Uzzau,4,10 Chris Jones,3 David Schlessinger,11 Gonçalo R. Abecasis,5 Serena Sanna,2 Carlo Sidore,2,4,5 Francesco Cucca2,4*
Genetic variation within the male-specific portion of the Y chromosome (MSY) can clarify the origins of contemporary populations, but previous studies were hampered by partial genetic information. Population sequencing of 1204 Sardinian males identified 11,763 MSY single-nucleotide polymorphisms, 6751 of which have not previously been observed. We constructed a MSY phylogenetic tree containing all main haplogroups found in Europe, along with many Sardinian-specific lineage clusters within each haplogroup. The tree was calibrated with archaeological data from the initial expansion of the Sardinian population ~7700 years ago. The ages of nodes highlight different genetic strata in Sardinia and reveal the presumptive timing of coalescence with other human populations. We calculate a putative age for coalescence of ~180,000 to 200,000 years ago, which is consistent with previous mitochondrial DNA–based estimates.
New sequencing technologies have provided genomic data sets that can reconstruct past events in human evolution more accurately (1). Sequencing data from the male-specific portion of the Y chromosome (MSY) (2), because of its lack of recombination and low mutation, reversion, and recurrence rates, can be particularly informative for these evolutionary analyses (3, 4). Recently, high-coverage Y chromosome sequencing data from 36 males from different worldwide populations (5) assessed 6662 phylogenetically informative variants and estimated the timing of past events, including a putative coalescence time for modern humans of ~101,000 to 115,000 years ago.
MSY sequencing data reported to date still represent a relatively small number of individuals from a few populations. Furthermore, dating estimates are also affected by the calibration of the
1Dipartimento di Scienze della Natura e del Territorio, Università di Sassari, 07100 Sassari, Italy. 2Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy. 3Center for Advanced Studies, Research and Development in Sardinia (CRS4), Pula, Italy. 4Dipartimento di Scienze Biomediche, Università di Sassari, 07100 Sassari, Italy. 5Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA. 6DNA Sequencing Core, University of Michigan, Ann Arbor, MI 48109, USA. 7Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA. 8Dipartimento di Biologia, Università di Pisa, 56126 Pisa, Italy. 9Departamento de Genética, Antropología Física y Fisiología Animal, Universidad del País Vasco/Euskal Herriko Unibertsitatea, 48080 Bilbao, Spain. 10Porto Conte Ricerche, Località Tramariglio, Alghero, 07041 Sassari, Italy. 11Laboratory of Genetics, National Institute on Aging, Baltimore, MD 21224, USA.
*Corresponding author. E-mail: [email protected] (P.F.); [email protected] (F.C.)
†Laura Morelli prematurely passed away on 20 February 2013. This work is dedicated to her memory.
www.sciencemag.org/cgi/content/341/6145/562/DC1
Supplementary Materials for
Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females
G. David Poznik, Brenna M. Henn, Muh-Ching Yee, Elzbieta Sliwerska, Ghia M. Euskirchen, Alice A. Lin, Michael Snyder, Lluis Quintana-Murci,
Jeffrey M. Kidd, Peter A. Underhill, Carlos D. Bustamante*
*Corresponding author. E-mail: [email protected]
Published 2 August 2013, Science 341, 562 (2013)
DOI: 10.1126/science.1237619
This PDF file includes:
Materials and Methods
Supplementary Text
Figs. S1 to S13
Tables S1 to S3
References
Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/cgi/content/full/341/6145/562/DC1)
Data File S1. Sample, phylogeny, and variant data (zipped archive).
Data File S2. Y chromosome genotype calls. To protect participant privacy, this zipped archive is available through a data access agreement (DAA) for transfer of genetic data by contacting C.D.B.
Data File S3. Y chromosome mapped sequencing reads. This BAM file is also available via the DAA described above. Mapping, quality score recalibration, and indel realignment are described in Materials and Methods.
Table of Contents

Materials and Methods
  Sequencing
  Genotypes
  Validation
  Phylogenetic Inference
  mtDNA Analysis
  Frequentist Estimation of TMRCA
  Empirical Bayesian Estimation of TMRCA and Ne: GENETREE
  Predata Distribution of TMRCA

Supplementary Text
  Novel Y Chromosome Phylogenetic Structure
  Imputation
  Calibration and Mutation Rate Estimation
  Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation
  Calibration Time
  Existence of Rare Yet More Basal Lineages
  Effective Population Size
  Additional Acknowledgements

Supplementary Figures
  Fig. S1. Map of populations.
  Fig. S2. Sequencing read mapping on Xq21.
  Fig. S3. Quality control and genotype calling on the Y chromosome.
  Fig. S4. Cross-tabulation of populations and Y haplogroups.
  Fig. S5. Call rate and mean sequencing coverage on the Y chromosome.
  Fig. S6. Y chromosome phylogenetic backbone.
  Fig. S7. Novel structure in Y hgB2.
  Fig. S8. Phylogeny-aware imputation.
  Fig. S9. Y chromosome hgQ clade with Phase 1 1000 Genomes samples included.
  Fig. S10. Sequencing coverage for Mayan HGDP00856 at singleton sites.
  Fig. S11. mtDNA phylogeny.
  Fig. S12. mtDNA calibration tree.
  Fig. S13. Comparing the Y chromosome TMRCA to that of mtDNA.

Supplementary Tables
  Table S1. Y chromosome summary of samples.
  Table S2. M578 genotyping results.
  Table S3. Mutation rate point estimates.

Supplementary Data
  Data File S1. Sample, phylogeny, and variant data.
  Data File S2. Y chromosome genotype calls.
  Data File S3. Y chromosome mapped sequencing reads.

FTP Addresses and Accession Numbers for External Data
  Y Chromosome hgQ Sequences from the 1000 Genomes Project
  Complete mtDNA hgA2 Sequences: GenBank Accession Numbers

References and Notes
Materials and Methods
Sequencing
We prepared genomic libraries (26) from cell lines (HGDP) and blood (Gabonese), then sequenced the libraries on Illumina HiSeq 2000 machines at the Stanford Center for Genomics and Personalized Medicine. We used BWA (27) to map paired 101 bp reads to the GRCh37 human reference, removed PCR duplicates with Picard (28), and then used the Genome Analysis Tool Kit (GATK) (29, 30) to recalibrate quality scores, perform local realignment around candidate indels, and compute genotype likelihoods.
Genotypes
Callability Mask
To learn directly from the read data the boundaries of the regions within which short-read sequencing could yield reliable variant calls, we calculated average filtered read depth across all samples in contiguous 1 kb windows and computed an exponentially-weighted moving average (EWMA) of these values (Fig. 1). Regions for which the EWMA deviated from a narrow envelope were identified as problematic. Those of depressed depth corresponded to ampliconic sequences, within which reads do not map uniquely and were thus filtered out. Regions of inflated depth corresponded to heterochromatin, where naïve application of standard genotype calling methods would give the impression of abundant heterozygosity due to the pileup of highly similar reads around the borders of unassembled regions. After constructing the depth-based filter, we repeated this procedure for the MQ0 ratio, the proportion of unfiltered reads with fully ambiguous mapping. Although the X-transposed region showed no deviation in the depth-based mask, it failed the MQ0-ratio-based mask. In females we found depressed read depth in the homologous region of the X chromosome (Fig. S2); we hypothesize that in males, each of whom possesses one X and one Y, there is an equal exchange of mismapped reads between the two chromosomes. The depth and MQ0 masks were merged and smoothed, leaving 10.45 Mb of sequence for downstream quality control.
Site-Level Quality Control
With the regional mask in hand, we defined a series of site-level quality control filters (Fig. S3A). Of the 22,974,737 mapped coordinates, 12,532,580 fell within the bounds of the regional exclusion mask. A further 129,411 were excluded due to an MQ0 ratio greater than or equal to 0.10, and 170,144 were excluded because more than 20 samples had missing genotypes, either due to an absence of sequencing reads or to a heterozygous maximum likelihood genotype (Fig. S3B). The remaining polymorphic sites had a median depth (across all samples) of 265, and we filtered out all sites whose depth was outside three median absolute deviations of this value, thus excluding 12,425 with depth above 371 and 141,512 below 159 (Fig. S3C). Finally, we culled 547 sites with a heterozygous maximum likelihood genotype in more than seven samples (Fig. S3D). This left 9,988,118 callable sites. Of 432 ISOGG SNPs with observed variation in our data, 393 passed the regional and mapping quality filters, and of these, just one failed the missingness filter and a further two the depth filter.
Genotype Calling
To call genotypes, we implemented a haploid model EM algorithm that treated allele frequency as the latent variable and used the homozygous state genotype likelihoods calculated by GATK. Genotypes with a heterozygous maximum likelihood state were classified as missing because calls in such cases were found to be disproportionately incompatible with the inferred phylogeny.
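For illustration, the following is a minimal sketch of a haploid EM of this general kind. It is not the implementation used in the study: the function names `haploid_em` and `call_genotypes`, the starting frequency, the convergence tolerance, the 0.5 calling threshold, and the toy likelihoods are all assumptions. Each sample contributes a pair of likelihoods for the two homozygous states; the E-step computes each sample's posterior probability of carrying the alternate allele given the current allele frequency, and the M-step resets the frequency to the mean posterior.

```python
def haploid_em(likelihoods, freq=0.5, tol=1e-8, max_iter=200):
    """Estimate the alternate-allele frequency at one site under a haploid model.

    likelihoods: list of (L_ref, L_alt) pairs, one per sample. This input format
                 is hypothetical; each pair must contain at least one positive value.
    Returns (freq, posteriors), where posteriors[i] approximates
    P(sample i carries the alternate allele).
    """
    eps = 1e-12  # keep freq strictly inside (0, 1) to avoid division by zero
    posteriors = []
    for _ in range(max_iter):
        freq = min(max(freq, eps), 1.0 - eps)
        # E-step: posterior probability of the alternate allele for each sample.
        posteriors = [
            freq * l_alt / (freq * l_alt + (1.0 - freq) * l_ref)
            for l_ref, l_alt in likelihoods
        ]
        # M-step: the new frequency is the mean posterior.
        new_freq = sum(posteriors) / len(posteriors)
        converged = abs(new_freq - freq) < tol
        freq = new_freq
        if converged:
            break
    return freq, posteriors


def call_genotypes(posteriors):
    """Call the more probable haploid state for each sample (assumed threshold 0.5)."""
    return ["alt" if p > 0.5 else "ref" for p in posteriors]


# Toy site with made-up likelihoods: three samples favor ref, one favors alt.
toy = [(0.99, 0.01), (0.99, 0.01), (0.99, 0.01), (0.01, 0.99)]
freq, post = haploid_em(toy)
calls = call_genotypes(post)
```

A sample with no reads would enter with equal likelihoods for both states, so its posterior simply equals the current allele frequency.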
Validation
The false positive rate is kept low primarily by the fact that GATK generally requires at least two reads of support to identify a site as variable. In addition, we exclude sites incompatible with the phylogeny. Though this filter discards some genuine homoplasic variants, the class is enriched for false positives, and we have chosen to err on the side of conservatism. We consider three means of validation.
Sanger Sequencing
We validated Y chromosome genotypes for the 29 male HGDP samples at 46 sites using a combination of targeted PCR and Sanger sequencing (3 sites), and exome capture followed by Illumina sequencing (43 sites). Validation failed to yield data for two genotypes, and we compared the remaining 1,245 genotypes to the main data set to find a concordance rate of 99.92%. Just one genotype was discordant (M150, hg19 position 21869519, in HGDP00462). The genotype had zero sequencing reads of support, and the individual had been imputed to carry the reference allele whereas the validation data indicated that this sample actually carries the non-reference allele. Only one other sample, the nearest neighbor to HGDP00462, also carried the non-reference allele, and this illustrates the fact that it is impossible to properly impute missing genotypes for sites otherwise identified as singletons (Supplementary Text, “Imputation” section).
Minimally Diverged Samples
We also consider private variation among minimally diverged individuals to argue that sequencing errors are minimized in our study. Specifically, we observe a cluster of five Baka hgB2 samples with just a handful of singletons per lineage. This group approximates a replication set and thus gives tight upper bounds on the false positive variant rate.
Haplogroup Assignments
All HGDP haplogroup assignments were consistent with prior ISOGG designations.
Phylogenetic Inference
We used MEGA5 (31) to construct maximum likelihood phylogenetic trees.
mtDNA Analysis
mtDNA Pipeline
To call mitochondrial haplogroups, we converted sequences from the GRCh37 to the rCRS coordinate system and imported them into HaploGrep (32), which draws on the Phylotree database (33). We explicitly utilized data presented in Table 1 of Behar et al. (34) to polarize alleles for variants assigned to the most ancient split, that between hgL0 and the rest of the tree (Fig. S11). Whereas the mutation rate on the Y chromosome is sufficiently low that we could regard base substitutions as unique events and simply discard sites that were incompatible with the phylogeny, excluding sites would have been inappropriate for the mitochondrial genome, in which a much higher mutation rate has led to considerable homoplasy. To account for this, we split sites with multiple substitutions into pseudo-sites, each of which constitutes a unique event. We discarded a few mutational hotspot sites with evidence for more than four unique substitution events.
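The pseudo-site bookkeeping can be sketched as follows. This is only an illustration under assumed data structures: the dictionary layout, the branch labels, the site positions, and the function name `split_into_pseudo_sites` are hypothetical and do not come from the study's pipeline.

```python
def split_into_pseudo_sites(events_by_site, max_events=4):
    """Split recurrently mutated sites into one pseudo-site per substitution event.

    events_by_site: dict mapping a site position to the list of tree branches on
                    which a substitution was inferred (hypothetical layout).
    Sites with more than `max_events` events are treated as hotspots and dropped.
    Returns (pseudo_sites, discarded_positions).
    """
    pseudo_sites = []
    discarded = []
    for position in sorted(events_by_site):
        branches = events_by_site[position]
        if len(branches) > max_events:
            discarded.append(position)
            continue
        for k, branch in enumerate(branches, start=1):
            # Each (position, k) pair now represents a single, unique event.
            pseudo_sites.append((position, k, branch))
    return pseudo_sites, discarded


# Made-up example: one ordinary site, one homoplasic site, one hotspot.
toy_events = {
    100: ["b3"],
    200: ["b1", "b7"],
    300: ["b1", "b2", "b4", "b5", "b9"],
}
pseudo, dropped = split_into_pseudo_sites(toy_events)
```

After this step every retained pseudo-site maps to exactly one branch, so branch lengths can be tallied as under the infinite sites model.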
Calibration Based on mtDNA hgA2
Since there are far fewer segregating sites in the mitochondrial genome, and we only had seven hgA2 lineages, we used 108 publicly available hgA2 Native American sequences to calibrate. Kumar et al. (23) list 568 accession numbers for mitochondrial genomes, 134 of which belong to hgA2 and are of American descent. We downloaded the subset of 108 entries that included the full mtDNA sequence and, along with the GRCh37 reference sequence, conducted a multiple alignment using MUSCLE (35). We then called haplogroups, built a tree (Fig. S12), assigned variants to branches, and resolved homoplasies as described above.
Frequentist Estimation of TMRCA
The Molecular Clock
Under the infinite sites model, mutations accumulate in a Poisson process of rate $\mu_l$, the locus-wide mutation rate. To estimate TMRCA, molecular clock approaches first estimate the mean number of derived mutations per lineage and then divide by an estimate of the mutation rate. For both the Y chromosome and the mtDNA, we estimate TMRCA with:
$$\hat{T} = \frac{\bar{D}}{\hat{\mu}_{ly}},$$
where $\bar{D}$ is the sample average of $\{D_i\}$, the inferred number of mutations accumulated by each lineage since the global MRCA:
$$\bar{D} = \frac{1}{n} \sum_{i=1}^{n} D_i.$$
We estimated the $\{D_i\}$ using a maximum likelihood phylogeny (Fig. 2), and we estimate the yearly mutation rate, $\mu_{ly}$, as:
$$\hat{\mu}_{ly} = \frac{\bar{C}}{t},$$
where $t$ is the known TMRCA of the calibration subclade and $\bar{C}$ is the sample average of $\{C_i\}$, the number of derived mutations acquired by each lineage since the common ancestor of the subtree:
$$\bar{C} = \frac{1}{n_c} \sum_{i=1}^{n_c} C_i.$$
Here $n_c$ is the number of individuals within the calibration subclade. $\hat{T}$ is therefore a scaled ratio of two random variables:
$$\hat{T} = t \, \frac{\bar{D}}{\bar{C}}.$$
TMRCA Confidence Intervals
From the frequentist perspective, we consider $T$ a fixed but unknown constant, and we are interested in the sampling variance of our estimator conditional on its true value. Since the calibration subtree is a small fraction of the total tree, $\bar{D}$ and $\bar{C}$ are approximately uncorrelated. This fact simplifies the expression for the standard deviation of a ratio of random variables, which is obtained using the δ method (36):
$$\sigma_{\hat{T}|T} \approx \frac{t}{\bar{C}} \sqrt{\left( \frac{\bar{D}}{\bar{C}} \, \sigma_{\bar{C}} \right)^2 + \sigma^2_{\bar{D}|T}}.$$
Since both $\bar{D}$ and $\bar{C}$ are sums of Poisson random variables with a large number of total events, each is well approximated by the normal distribution. Consequently, their ratio is also approximately normally distributed (37). Therefore, if we are able to compute $\sigma_{\bar{D}|T}$ and $\sigma_{\bar{C}}$, we can construct a confidence interval for $T$. We first consider $\sigma_{\bar{D}|T}$. The $\{D_i\}$ are identically Poisson distributed, but they are not independent due to the shared internal branches (3). Thus,
$$\sigma^2_{\bar{D}|T} = \mathrm{Var}[\bar{D}|T] = \frac{1}{n^2} \left[ \sum_i \mathrm{Var}[D_i|T] + 2 \sum_i \sum_{j>i} \mathrm{Cov}[D_i, D_j|T] \right].$$
Since each $D_i$ is a Poisson random variable, its variance is equal to its mean. Now consider samples $i$ and $j$. The numbers of mutations that have accumulated in each since
their MRCA are independent. However, they share all mutations possessed by their MRCA. Thus,
$$\mathrm{Cov}[D_i, D_j|T] = D_{ij},$$
where $D_{ij}$ is the number of derived variants possessed by the common ancestor of $i$ and $j$. Let $I$ denote the set of internal branches, and let $b_s$ and $b_l$ be the number of descendants and the length of a branch, $b$, respectively. Each internal branch will be shared by $b_s$ choose 2 pairs of individuals. Thus,
$$2 \sum_i \sum_{j>i} \mathrm{Cov}[D_i, D_j|T] = 2 \sum_{b \in I} \binom{b_s}{2} b_l = \sum_{b \in I} b_s (b_s - 1) b_l,$$
which gives:
$$\sigma_{\bar{D}|T} = \frac{1}{n} \sqrt{\sum_i D_i + \sum_{b \in I} b_s (b_s - 1) b_l}.$$
An identical argument applies to $\sigma_{\bar{C}}$ within the calibration subtree. We, therefore, construct a 95% confidence interval for TMRCA as:
$$T = \hat{T} \pm z_{0.025} \cdot \sigma_{\hat{T}|T}$$
$$T = t \left[ \frac{\bar{D}}{\bar{C}} \pm z_{0.025} \cdot \frac{1}{\bar{C}} \sqrt{\left( \frac{\bar{D}}{\bar{C}} \, \sigma_{\bar{C}} \right)^2 + \frac{1}{n^2} \left( \sum_i D_i + \sum_{b \in I} b_s (b_s - 1) b_l \right)} \right].$$
The bias of the point estimator is minimal (36).
Precision of TMRCA Estimation
The standard error for the mean estimate of a Poisson random variable with mean $\mu_l T$ is $\sqrt{\mu_l T / n}$, so the coefficient of variation (the ratio of the standard error to the mean) declines in inverse proportion to $\sqrt{n \mu_l T}$. On the Y chromosome, $T$ is large and, because the non-recombining locus is so long, $\mu_l$ is quite large as well. Consequently, the standard error for estimating the mean branch length is relatively small, and the greater source of uncertainty lies in estimating the mutation rate, where the time intervals over which mutations have accumulated are shorter, and the number of lineages is smaller. However, $\mu_l$ is sufficiently large that we could derive a narrow confidence interval based solely on the two hgQ lineages we had sequenced. In contrast, for the mtDNA, the uncertainty due to $\sigma_{\bar{D}|T}$ exceeds that due to $\sigma_{\bar{C}}$.
An Alternative Frequentist Estimator
An alternative frequentist estimator defines $\bar{D}$ as half the average mutational distance $d_{ij}$ between pairs of individuals that span the ancestral root (3):
$$\bar{D} = \frac{1}{2 |L| |R|} \sum_{i \in L} \sum_{j \in R} d_{ij}.$$
Here, $L$ and $R$ represent sets of individuals on the left and right side of the root. This estimator is less well-suited to our data set. We have four Y hgA individuals on the left side of the tree and 65 individuals on the right side. This partition-based approach effectively upweights information from the hgA samples, since all distances are measured with respect to a member of this clade. However, we have lower effective coverage on the internal branches of hgA than elsewhere in the tree. This is due to both the lower number of samples and the fact that hgA lineages are highly diverged. Consequently, these are exactly the samples for which false negatives are of greatest potential impact. For the sake of comparison, the TMRCA point estimates from this approach are 134 ky and 118 ky for the Y chromosome and mtDNA, respectively.
Estimating the Ratio of mtDNA TMRCA to Y TMRCA
To compare the TMRCA of the Y chromosome to that of the mtDNA, we estimate the ratio:
$$\gamma = \frac{T_m}{T_y} = \frac{t_m M}{t_y Y} = \tau R,$$
where we define $M$ and $Y$ as the fixed but unknown unscaled TMRCA of the mtDNA and Y respectively, and $R$ as the ratio $M / Y$. The quantity $\tau = t_m / t_y$ is the ratio of coalescence times of the Native American lineages, mtDNA hgA2 and Y chromosome hgQ. Our estimator of $\gamma$ is:
$$\hat{\gamma} = \tau \hat{R} = \tau \, \frac{\hat{M}}{\hat{Y}},$$
where
$$\hat{M} = \bar{D}_m / \bar{C}_m, \qquad \hat{Y} = \bar{D}_y / \bar{C}_y, \qquad \hat{R} = \hat{M} / \hat{Y}.$$
The standard error is:
$$\sigma_{\hat{\gamma}|\gamma} = \tau \, \sigma_{\hat{R}|M,Y}.$$
Since $\hat{R}$ is the ratio of two random variables, its standard error is:
$$\sigma_{\hat{R}|M,Y} \approx \frac{1}{E[\hat{Y}|Y]} \sqrt{\left( \frac{E[\hat{M}|M]}{E[\hat{Y}|Y]} \, \sigma_{\hat{Y}|Y} \right)^2 + \sigma^2_{\hat{M}|M} - 2 \rho \, \sigma_{\hat{M}|M} \, \sigma_{\hat{Y}|Y} \, \frac{E[\hat{M}|M]}{E[\hat{Y}|Y]}},$$
where $\rho = \mathrm{Corr}[\hat{M}|M, \hat{Y}|Y]$. We cannot disregard the correlation term in this case. If the TMRCA of male and female lineages are correlated, their estimates will be as well, though the correlation of the estimates would necessarily be less than that of the true values due to the uncertainty in both variables. Confidence bands for $\gamma$ are defined by:
$$\gamma = \tau \left[ \frac{\hat{M}}{\hat{Y}} \pm z_{0.025} \cdot \sigma_{\hat{R}|M,Y} \right].$$
To assume zero correlation would be conservative, as positive correlation reduces the variance. We consider representative values of ρ for the sake of comparison (Fig. S13). Again, the bias of the point estimator is minimal (36).
Empirical Bayesian Estimation of TMRCA and Ne: GENETREE
As distributed, GENETREE can handle only 99 sites per run, but we modified the source code to enable runs of several thousand SNPs. First, we perform a grid search to obtain a maximum likelihood estimate for the scaled mutation rate, θ = 2Neµlg, where µlg is the locus-wide per generation mutation rate. We then simulate the posterior distribution of TMRCA, conditional on this estimate. We restricted each analysis to a single population so that the assumption of exchangeability of lineages (38) would hold. As the TMRCA is determined by the deepest coalescence in a sample, we exclusively analyzed populations that sample from both sides of the tree (Fig. 2): the San and Baka for the Y chromosome and the Mbuti and Nzebi for the mitochondrial genome. Results from the Baka and Mbuti Pygmy populations are the most directly comparable (Table 1).
We excluded several lineages from the GENETREE analyses. In the Baka, we excluded three samples possessing high levels of autosomal identity by descent with another individual, as inferred with Illumina Omni SNP arrays. We also excluded six Baka hgE samples, as these likely represent West African agriculturalist lineages that introgressed into the Baka a few thousand years ago (39) in violation of the exchangeability assumption of coalescent theory. In the mitochondrial analysis we removed two Nzebi and one Mbuti because GENETREE does not allow for identical lineages. Point estimates for the Baka Y chromosomes reflect averages of multiple coalescent runs. Each run subsampled 1500 (of 2927) segregating sites to overcome computational limitations for the full dataset. Estimates for the Mbuti mtDNAs reflect averages of multiple coalescent runs, each with a different random seed, as these runs were more variable due to a smaller Poisson mean (nµl).
Coalescent theory measures time in units of Ne generations. To convert to years, we use the maximum likelihood estimate of θ, the gender-specific generation time (g; Table S3), and the Native American calibration estimate for µly, the locus-wide per year mutation rate:
$$N_e = \frac{\theta}{2 \mu_{lg}} = \frac{\theta}{2 g \mu_{ly}}$$
$$T_{\mathrm{MRCA}} = T_c N_e g = \frac{T_c \, \theta}{2 \mu_{ly}}$$
Here $T_c$ denotes the TMRCA in coalescent units of Ne generations.
GENETREE is suboptimal for our data set. Due to the exchangeability assumption and computational limitations, each analysis draws information from just a subset of the data. Because the full sequence data is highly informative about the underlying gene genealogy, very few random trees are compatible with it. This makes GENETREE a highly inefficient approach to estimating population genetic parameters. Thus, we emphasize the point estimates and confidence intervals derived from the frequentist approach.
Predata Distribution of TMRCA
For a constant population size, the TMRCA of a locus, measured in Ne generations, is given by:
$$T_{\mathrm{MRCA}} = \sum_{i=2}^{n} T_i,$$
where $T_i$ is the time during which $i$ ancestral lineages of the sample existed. Coalescent theory (38) models $T_i$ as an exponential random variable with parameter:
$$\lambda_i = \binom{i}{2}.$$
To obtain the distributions presented in Fig. 3, we simulated five million draws of TMRCA for n = 100 lineages and scaled each value by a factor of Ne·g to convert to years.
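A simulation of this kind can be sketched in a few lines. The sketch below is illustrative only: the function names, the seed, and the reduced number of draws are assumptions, and Ne and g are set to 1 so that results stay in coalescent units rather than asserting any particular parameter values.

```python
import random
from math import comb


def expected_tmrca(n):
    """Mean TMRCA in coalescent units: the sum of 1 / C(i, 2), equal to 2 * (1 - 1/n)."""
    return sum(1.0 / comb(i, 2) for i in range(2, n + 1))


def simulate_tmrca_years(n, ne, gen_time, draws, seed=0):
    """Draw TMRCA values for n lineages and scale them by ne * gen_time.

    Each draw is the sum of independent exponential waiting times T_i with
    rate C(i, 2), for i = 2, ..., n.
    """
    rng = random.Random(seed)
    rates = [comb(i, 2) for i in range(2, n + 1)]
    scale = ne * gen_time
    return [
        scale * sum(rng.expovariate(rate) for rate in rates)
        for _ in range(draws)
    ]


# ne = gen_time = 1 keeps the draws in coalescent units (placeholder values).
samples = simulate_tmrca_years(n=100, ne=1.0, gen_time=1.0, draws=10_000, seed=1)
mean_units = sum(samples) / len(samples)
```

For n = 100 the expected TMRCA is 2(1 - 1/100) = 1.98 coalescent units, which gives a quick sanity check on the simulated mean before scaling to years.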
Supplementary Text
Novel Y Chromosome Phylogenetic Structure
Haplogroup B2
Within hgB2, we identify one clade and three additional lineages that represent previously uncharacterized structure (Figs. 2, S7). Each lineage represents an ancient divergence within the Y chromosome phylogeny and carries no known differentiating mutations downstream of M192 and Page72, which define hgB2b1. First, in the main text we describe a subclade of B2b1a that encompasses six Baka individuals. Previously, B2b1a2 was associated with the P70 variant, but because these six Baka individuals carry the ancestral allele for P70, we propose reassociating P70 with a new label, “B2b1a2a,” and labeling the new clade “B2b1a2b.” Second, B2b1b was previously associated with P6, but we have identified a Mbuti individual carrying the ancestral allele for this variant. Thus, we propose associating P6 with a new label, “B2b1b1,” and designating the new lineage “B2b1b2.” Finally, we identify two new lineages within B2b1a1. The individuals representing both of these lineages carry the ancestral T allele for the M169 variant that defines B2b1a1a, the only extant sublineage of B2b1a1 not represented.
Haplogroup F
Table S2 presents genotyping results for the M578 variant in a separate panel of individuals. The results confirm the (G, H, IJK) → (G, (H, IJK)) polytomy resolution. The demographic fates of hgG and hgHIJK were geographically asymmetric, with the spread zone of hgG (40) considerably more restricted than that of hgHIJK (Fig. S6). The latter now spans all continents, including Africa due to the back migration of some haplogroups (41).
Imputation
We used our phylogeny-aware algorithm (Fig. S8) to impute approximately 5.3 missing genotypes per Y chromosome variant site and a median of 826 per individual.
Imputation Limitations
It is not possible to impute singletons: when the carrier of a unique allele has zero reads of support, there is no evidence for variation at the site. Doubletons pose a similar problem. Let A and B be nearest neighbors in the phylogeny. Consider the case where, at a given site, A possesses an allele not observed in any other sample, and B has zero reads. It is impossible to distinguish whether the site is an A singleton or an A/B doubleton. However, conditional on one sample missing data at a particular site, our imputation strategy correctly imputes two thirds of tripletons; it fails only in the case where the lineage of the missing sample is the last to coalesce. For four lineages, there are 18 possible trees. Of these, twelve consist of stepwise coalescence, and the lineage with missing data is the most diverged in just three. Thus, we correctly impute five-sixths of quadrupletons.
Polarizing Variants on the Branch Spanning the Ancestral Root
Our method to infer the ancestral state at a given site was inapplicable to the 398 variants assigned to the most ancient (basal) split, as no outgroup for these branches was present within the data set. For these, we first conducted a LiftOver (42) to map GRCh37 coordinates to those of the chimpanzee reference (PanTro3). Due to the abundance of large-scale inversions between the two chromosomes (17), it was necessary to BLAT (43) 101 bp chunks of DNA surrounding each human variant to infer relative orientation. Ancestral states were thereby inferred for 322 variants, and those of the remaining 76, for which the corresponding chimpanzee allele could not be inferred, were randomly assigned in the corresponding proportion.
Homoplasy and the Infinite Sites Model
We deemed a SNV consistent with the tree when we observed no ancestral alleles in the subtree rooted at the branch to which the SNV was assigned. Most variants (11,279) were consistent with the tree, and we imputed missing genotypes for those that were. Sites incompatible with the phylogeny were uniformly distributed across the callable regions (Fig. 1) and were excluded from downstream analyses. Just 199 (of 361) incompatibilities were supported by more than one sequencing read. This lack of homoplasy on the Y chromosome justifies usage of the infinite sites model.
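The tripleton and quadrupleton fractions quoted under Imputation Limitations can be checked by brute force. The sketch below enumerates every labeled coalescent history for a small sample and counts those in which the lineage with missing data is the last to coalesce, which is the one case where imputation fails. The function names are hypothetical, and the assumption that all labeled histories are equally likely is the standard neutral coalescent one rather than something restated in the text.

```python
from fractions import Fraction
from itertools import combinations


def labeled_histories(lineages):
    """Yield every labeled history as an ordered list of merges.

    lineages: list of disjoint frozensets, each holding the tips of one lineage.
    Each merge is recorded as the pair of lineages that coalesced.
    """
    if len(lineages) == 1:
        yield []
        return
    for a, b in combinations(lineages, 2):
        rest = [x for x in lineages if x not in (a, b)]
        for tail in labeled_histories(rest + [a | b]):
            yield [(a, b)] + tail


def imputable_fraction(n, missing=0):
    """Return (total histories, failures, fraction imputable) for n lineages."""
    tips = [frozenset([i]) for i in range(n)]
    lone = frozenset([missing])
    total = failures = 0
    for history in labeled_histories(tips):
        total += 1
        # Failure: the final coalescence joins the missing tip, still alone,
        # to the clade containing every other lineage.
        if lone in history[-1]:
            failures += 1
    return total, failures, Fraction(total - failures, total)


tripletons = imputable_fraction(3)
quadrupletons = imputable_fraction(4)
```

For four lineages the enumeration yields 18 histories, 3 of which leave the missing lineage as the outgroup, giving the five-sixths stated above; for three lineages it yields 3 histories with 1 failure, giving two thirds.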
Calibration and Mutation Rate Estimation
Mutation rate estimates are typically based on family pedigrees (14) or species phylogenies, such as the human-chimpanzee divergence (2, 3). However, just one pedigree-based rate is available for the Y chromosome, and, though the mutation process is highly stochastic, this rate is based on a single pedigree. Furthermore, precise alignment between the human Y chromosome and that of the chimpanzee is difficult due to extreme structural divergence. Finally, if the Y is subject to a time-dependent mutation rate, as is mtDNA (24, 25), then neither estimation approach is ideal for dating human population events.
Instead, we estimate mutation rates using a within-human calibration point, the initial migration into and expansion throughout the Americas. Well-dated archaeological sites include Paisley Cave in Oregon, which dates to 14.3 kya (19); Buttermilk Creek in Central Texas, at 13.2–15.5 kya (44); and Monte Verde II in Southern Chile, 14.6 kya (45). To date the expansion of genetic lineages unique to the Americas, we follow Goebel et al., who state that the most parsimonious estimate is that “humans colonized the Americas around 15 kya” (19). We show that a lack of parity between the expansion event and the divergence of lineages used for calibration would have minimal effect on the difference between the TMRCA of the Y and mtDNA if the divergences are within a few thousand years of one another (Fig. S13, Materials and Methods).
For reference and comparison, Table S3 summarizes mutation rate point estimates on four scales. The Y chromosome mutation rates are similar to previous autosomal phylogenetic-based mutation rates and extended pedigree-based rates, but they are almost two-fold higher than autosomal mutation rates based on trios (46).
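To make concrete how a calibration time and calibration-subtree branch lengths feed the frequentist TMRCA estimator of Materials and Methods, the following sketch computes the point estimate and its delta-method 95% confidence interval. Everything numeric here is a made-up toy: a four-tip tree whose two-tip calibration clade has an assumed age of 1,000 years. The function names are hypothetical, and the toy tree is far too small for the approximate independence of the two averages to hold, so the example shows only the arithmetic.

```python
from math import sqrt

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def mean_and_sd(tip_counts, internal_branches):
    """Mean mutations per lineage and the standard deviation of that mean.

    tip_counts: mutations accumulated by each lineage since the (sub)tree root.
    internal_branches: (number of descendant tips, branch length) for each
                       internal branch below the root.
    Uses Var[mean] = (sum of tip counts + sum of b_s * (b_s - 1) * b_l) / n^2.
    """
    n = len(tip_counts)
    shared = sum(bs * (bs - 1) * bl for bs, bl in internal_branches)
    return sum(tip_counts) / n, sqrt(sum(tip_counts) + shared) / n


def tmrca_interval(d_tips, d_internal, c_tips, c_internal, t_cal):
    """Return (estimate, lower, upper) for TMRCA, with t_cal the calibration age."""
    d_bar, sd_d = mean_and_sd(d_tips, d_internal)
    c_bar, sd_c = mean_and_sd(c_tips, c_internal)
    t_hat = t_cal * d_bar / c_bar
    sd_t = (t_cal / c_bar) * sqrt((d_bar / c_bar * sd_c) ** 2 + sd_d ** 2)
    return t_hat, t_hat - Z_975 * sd_t, t_hat + Z_975 * sd_t


# Toy tree ((A:10, B:12):30, (C:11, D:9):28); clade (C, D) is the calibration clade.
d_tips = [40, 42, 39, 37]        # tip-to-root mutation counts for A, B, C, D
d_internal = [(2, 30), (2, 28)]  # two internal branches, two descendants each
c_tips = [11, 9]                 # counts for C and D since their common ancestor
t_hat, lower, upper = tmrca_interval(d_tips, d_internal, c_tips, [], t_cal=1000.0)
```

With these toy inputs the mean tip-to-root height is 39.5 and the mean calibration branch length is 10, so the point estimate is 1000 × 39.5 / 10 = 3,950 years; the wide interval reflects how few mutations the toy calibration clade carries.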
Impact of Sequencing Error and Sequence Coverage on TMRCA Estimation
We developed a method to estimate the variance in estimated TMRCA that is due to the stochastic nature of the mutation process (Materials and Methods, “Frequentist Estimation of TMRCA” section). Here we discuss the potential impact of bias due to sequencing error and modest sequencing coverage. We have estimated TMRCA by calculating the ratio of two quantities, divergence and the mutation rate, each of which depends on experimental measurements. The numerator is the average tip-to-root height of the tree, and we estimate the denominator as the ratio of average branch length within the calibration subtree to the calibration time. Data for each of the three measurements are imperfect. In this section, we consider potential biases in the first two, and we consider calibration time in the next section.
Tip-to-Root Height
We measure tip-to-root height as the total number of SNVs assigned to all branches separating an individual from the common ancestor of all individuals. This sum includes the singletons of the terminal branch and the shared variants on the internal branches. Two factors act in opposition to stretch and shrink an observed branch length with respect to its true value: sequencing error and the total sequencing coverage of the branch, which itself is influenced both by sequencing coverage of individuals and by sampling density of the clade rooted at the branch. The primary effect of sequencing error is to stretch terminal branches, as it is unlikely that random sequencing errors will cluster phylogenetically. We have demonstrated that genotype error is minimal (Materials and Methods, “Validation” section). Consequently, branch lengths are not significantly inflated by sequencing error.
Though modest sequencing coverage translates to unobserved variants near the tips of the tree, thereby shortening observed heights, the internal branches of the tree, which constitute the overwhelming majority of any tip-to-root path, have quite high coverage due to the superposition of sequencing from all descending lineages. Thus, most observed internal branch lengths cannot differ significantly from their true lengths. Fortunately, the most divergent sample with the longest terminal branch, the San individual in the hgA-M51 clade, had higher than average sequencing coverage (6.15×) and, consequently, call rate (0.985). We observed 1012 private variants in this individual, and we estimate approximately 22 false negatives: unobserved variants with either a no-call genotype or just one sequencing read, an event insufficient to identify a site as variable. This worst-case scenario is less than 2% of the average tip-to-root height. We likely have very few false negatives in other individuals, even among those of lower coverage, since the lower coverage samples are clustered in the densely sampled portions of the tree, such as in hgE and portions of hgB, and the imputation strategy we have implemented enables these lineages to receive credit for variation that is detected in neighbors and that they can be inferred to possess. Finally, the maximum observed tip-to-root height (1188) could be considered a conservative upper bound on the true mean, and it differs from the observed mean by just 5%.
Branch Lengths in the Calibration Subtree
We now consider how sequencing coverage affects branch lengths in the Y chromosome hgQ subtree used to estimate the mutation rate. We sequenced Mayan HGDP00856, a representative of hgQ-M3, to 5.7× coverage and Mayan HGDP00877, whose haplogroup is labeled hgQ-L54*(xM3) because it carries the L54 mutation but is ancestral at the M3 SNP, to an average depth of 8.5×. Had we sequenced the two Mayan lineages to lower coverage, we would have artificially boosted TMRCA estimates by underestimating the mutation rate. However, haploid coverage for the Mayan samples is high enough that false negatives have little impact on our calibration. The rate of false negatives is dominated by sites in the terminal branches of the tree with either zero or one sequencing read for a sample. When an individual has zero or one read at a shared SNP, we can usually impute its genotype, but it is not possible to impute singletons or to distinguish a singleton from a doubleton in the presence of missing data (Supplementary Text, “Imputation” section). Although missing singletons and misclassified doubletons have little impact on total branch length from the tips to the root of the entire tree, they are quite important for calibration because singletons constitute a significant portion of branch length within the calibration subtree. In our study, the shared hgQ branch is of approximately the same length as the Q-M3 and Q-L54*(xM3) terminal branches. Consequently, no-call genotypes at singleton sites, which lead to missing singletons, are counterbalanced by no-call genotypes in the shared hgQ branch, which lead to doubletons misclassified as singletons.
This relies on the fact that at 5.7× and 8.5× coverage, the no-call rates on the doubleton and singleton branches are comparable. In general, a no-call due to the presence of just a single sequencing read is less likely to occur on the doubleton branch than on the singleton branch, but of the 9,988,118 callable sites only 194,966 (2.0%) and 23,989 (0.2%) are covered by just one read in HGDP00856 and HGDP00877, respectively.

To empirically estimate the false negative rate within the hgQ subtree used for calibration, we incorporated data from the 1000 Genomes Project (47). We downloaded genotype calls (VCF files) for 525 males from Phase 1, called haplogroups, and identified eleven individuals belonging to hgQ¹. We then downloaded aligned sequence data (BAM files) for these samples, converted from the GRCh37 to hg19 reference, and applied our pipeline to the combined set of 80 individuals (Fig. S9). In the combined analysis, the branch shared by all hgQ lineages grew from 136 to 146 SNPs². One SNP had not been called in either HGDP sample (hg19 position 15825218), and nine SNPs were no-calls in HGDP00856: three due to the absence of reads, and six due to one erroneous read (of 4–10). With perfect data, these nine SNPs would have been classified as doubletons, but they were instead misclassified as HGDP00877 singletons. Thus, for HGDP00856, we can estimate the no-call rate within the hgQ subtree, β0 ≈ 6.8% (10 / 146). Partly because the coverage is higher, we observed no doubletons misclassified as singletons due to missingness in HGDP00877³. Thus, for HGDP00877, β0 ≈ 0.7% (1 / 146).

Whereas on the shared doubleton branch the no-call rate should sufficiently inform the type 2 error rate (βd ≈ β0), the no-call rate does not provide complete information for the terminal branches since GATK, prudently, will most often not designate a site as variable if there is just one sequencing read with the alternative allele in the entire sample. Thus, to fully model the singleton type 2 error rate, βs, we must also consider the probability of observing just one read, β1, since when this occurs at a singleton site, a false negative will most often result. To do so, we computed the sequencing read depth distribution over all ten million callable sites for each sample. Scaling this empirical probability mass function by the number of singletons observed in the individual and censoring to discard the zero-read and one-read bins, we observe that when coverage exceeds 4×, the expected read-depth distribution among singletons closely mirrors the observed distribution (Fig. S10). This suggests that there are few false negatives at sites for which at least two sequencing reads are observed. Thus, βs ≈ β0 + β1. When a branch with false negative rate β has true length L and observed length Y, the number of unobserved variants, X, is given by:

$$X = \beta L = \frac{\beta}{1 - \beta}\, Y,$$

where the second equality follows from L = Y + X.

On the HGDP00856 singleton branch, we have Y = 126 and, from the empirical read-depth distribution, β1 = 2.0%. Thus, βs ≈ β0 + β1 = 6.8% + 2.0% = 8.8%, which gives X ≈ 12.2 missing singletons. This is likely an overestimate because the no-call rate across all variable sites, 2.2% (Table S1), is lower than the empirical rate within the subtree, 6.8%. The branch shared by all hgQ-M3 lineages (branch 18 in Fig. S9) affords an opportunity to empirically check the singleton false negative rate for HGDP00856, since this individual should possess each of these variants. We had correctly called 16 of 17 in our main analysis. This suggests a singleton false negative rate for this sample of 1/17 = 5.9%⁴, but the variance for this particular estimate is quite high since it is based on just 17 sites, so to be conservative, we use the value of 8.8% estimated above.

For HGDP00877, we have Y = 120 and β1 = 0.2%, which give βs ≈ 0.7% + 0.2% = 0.9%, and X ≈ 1.1 missing singletons. This prediction cannot be tested empirically with these data because the lineage is an outgroup to the two hgQ-L54*(xM3) sequences from the 1000 Genomes Project. As discussed above, there were nine doubletons previously classified as HGDP00877 singletons, so accounting for type 2 errors reduces this branch length by 7.9 (9 – 1.1).

Putting these two together, we compute the average branch length since MRCA of the two samples as 125 SNPs, which differs from the observed value of 123 by 1.6%. Thus, one might wish to scale our Y chromosome TMRCA estimates by a factor of 123 / 125 = 0.984. However, the effect of false negatives would be offset by false positives, should one or two exist, so we choose not to. False negatives are not an issue for mitochondria, where all sequences are complete.

¹ A twelfth, NA19753, was sequenced using SOLiD. We did not include this sequence in our analysis since it is likely to have different error and mapping properties than those generated by Illumina technology.
² The exact length is 149, but the difference includes two SNPs that were on the borderline of the depth-based filter in the main study and a net of one SNP discarded due to homoplasy: two in the main study and one in the combined analysis.
³ It is possible that one such SNP exists and is missing in all three hgQ-L54*(xM3) sequences, but this is a low probability event.
⁴ The lone false negative occurred at hg19 position 22613361. Prior to imputation, we do make the correct call in the combined analysis, because one read was present, and it carried the derived A allele.
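The branch-length correction above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function name `missing_variants` and the variable names are illustrative, and the inputs are the rounded values quoted in the text, so the results agree with the reported figures only to that precision.

```python
def missing_variants(observed: float, beta: float) -> float:
    """Expected number of unobserved variants X on a branch.

    With true length L = Y + X and X = beta * L, solving for X
    gives X = beta / (1 - beta) * Y.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return beta / (1.0 - beta) * observed


# Rounded values quoted in the text: beta_s = beta_0 + beta_1.
y_856, beta_856 = 126, 0.068 + 0.020   # HGDP00856
y_877, beta_877 = 120, 0.007 + 0.002   # HGDP00877

x_856 = missing_variants(y_856, beta_856)   # about 12.2 missing singletons
x_877 = missing_variants(y_877, beta_877)   # about 1.1 missing singletons

# Nine doubletons were misclassified as HGDP00877 singletons, so that
# branch shrinks by (9 - x_877) once type 2 errors are accounted for.
corrected_856 = y_856 + x_856
corrected_877 = y_877 - (9 - x_877)

corrected_mean = (corrected_856 + corrected_877) / 2   # about 125 SNPs
observed_mean = (y_856 + y_877) / 2                    # 123 SNPs
scale = observed_mean / corrected_mean                 # about 0.98

print(f"X(HGDP00856) = {x_856:.1f}, X(HGDP00877) = {x_877:.1f}")
print(f"corrected mean = {corrected_mean:.1f}, scale = {scale:.3f}")
```

Because the unrounded corrected mean is slightly above 125, the scale factor computed this way comes out marginally below the 0.984 obtained from the rounded ratio 123 / 125.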
Calibration Time
In light of the above, the largest potential source of bias is the calibration time: the dating of the arrival of humans into the Americas and the approximation of synchronicity of this arrival with phylogenetic divergences.

Timing of Expansion into the Americas
Archaeological dates for the time of first arrival in the Americas range from 14.3 to 16.5 ky. Goebel et al. (19) conclude that the most parsimonious estimate is that “humans colonized the Americas around 15 kya,” so we elect 15 ky as a reasonable figure for both the maternal and paternal loci. If the true divergence time of American lineages were 14.3 ky, one must scale down the TMRCA ranges we report by about 5%. Likewise, for 16.5 ky, an increase of 10% would be requisite. However, the specific number used will have no effect on the relative TMRCA estimates for the two loci, provided the divergences of the two loci were contemporaneous. We consider the case of unequal split times in Fig. S13 (Materials and Methods, “Estimating the Ratio of mtDNA TMRCA to Y TMRCA” subsection).

Y Chromosome Calibration Point
With 108 sampled lineages, the point of rapid expansion within the Americas among mtDNA hgA2 lineages is clear. However, the corresponding point within Y hgQ is less so. Though we have argued that M3 most likely occurred shortly subsequent to initial entry to the Americas, it remains possible that hgQ-M3 and hgQ-L54*(xM3) diverged within Siberia or Beringia. When we include lower coverage 1000 Genomes hgQ lineages, we observe a star-like diversification among the Q-M3 derived lineages (Fig. S9, below branch #18). It is possible that some subset of the 17 M3-equivalent mutations accumulated prior to entry, for example within Beringia, as has been proposed for mtDNA founding lineages (48). However, 12 of the 13 sequenced individuals are from Mexico, and this sampling bias could obscure a more upstream initiation of the expansion. For example, it is possible that hgQ-M3 lineages within Greenland do not share all 17 of these mutations. Because just three sequences represent hgQ-L54*(xM3), the phylogenetic structure of this subhaplogroup remains largely unknown, but the root of the sampled hgQ-M3 lineages can be used to calculate a strict lower bound on the mutation rate, as entry to the Americas certainly happened no later than this point.
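The sensitivity to the calibration date quoted above can be checked directly. This sketch assumes, as the quoted percentages imply, that the TMRCA estimates scale linearly with the calibration time. The names `REFERENCE_KY` and `tmrca_scale` are illustrative and do not come from the original analysis.

```python
REFERENCE_KY = 15.0  # calibration time adopted in the text (ky)


def tmrca_scale(alternative_ky: float, reference_ky: float = REFERENCE_KY) -> float:
    """Factor by which reported TMRCAs change under a different entry date.

    Assumes TMRCA is proportional to the calibration time.
    """
    return alternative_ky / reference_ky


# Bounds of the archaeological range quoted in the text.
for ky in (14.3, 16.5):
    factor = tmrca_scale(ky)
    print(f"{ky} ky: scale TMRCA by {factor:.3f} ({(factor - 1) * 100:+.1f}%)")
```

Under this assumption the lower bound of 14.3 ky corresponds to a reduction of a little under 5%, and the upper bound of 16.5 ky to an increase of 10%, matching the figures in the text.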
The 1000 Genomes lineages are inappropriate to calibrate upon due to lower sequencing coverage (average = 2.9×; Supplementary Text, “Branch Lengths in the Calibration Subtree” subsection), so we are left with a single lineage from our sample, HGDP00856, for this lower bound calculation. Accounting for false negatives had little effect when two samples were used for calibration, as the degree to which the hgQ-M3 branch grew was offset by a corresponding shrinkage of the hgQ-L54*(xM3) branch due to the hgQ doubletons that were unobserved in HGDP00856 and thereby misclassified as HGDP00877 singletons. However, it is important to correct for type 2 errors when considering this lineage alone. In the main analysis, the observed length of the M3 lineage was 126 mutations. This breaks down to 16 observed M3-equivalent SNPs and 110 post-M3 SNPs. Using a singleton false negative rate of 8.8%, this translates to approximately 10.6 (0.088 × 110 / (1 – 0.088)) unobserved post-M3 SNPs, which gives a calibration length of 120.6 SNPs. This differs from the calibration used in the main text by 1.9%.
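The lower-bound calibration length can be reproduced as follows. This is a sketch using the rounded numbers in the text. The value 123 for the main-text calibration length is an assumption taken from the observed average branch length quoted earlier in this section, and all variable names are illustrative.

```python
# Observed breakdown of the M3 lineage in HGDP00856 (from the text).
observed_m3_equivalent = 16   # SNPs on the branch shared by hgQ-M3 lineages
observed_post_m3 = 110        # SNPs accumulated after M3
beta_s = 0.088                # singleton false negative rate

assert observed_m3_equivalent + observed_post_m3 == 126

# Unobserved post-M3 SNPs: X = beta / (1 - beta) * Y
unobserved_post_m3 = beta_s * observed_post_m3 / (1 - beta_s)   # about 10.6

# Only the post-M3 portion is used for the lower-bound calibration.
calibration_length = observed_post_m3 + unobserved_post_m3      # about 120.6

# Assumed main-text calibration length (observed average of 126 and 120).
main_text_length = 123
relative_difference = 1 - calibration_length / main_text_length  # about 0.019

print(f"unobserved post-M3 SNPs: {unobserved_post_m3:.1f}")
print(f"calibration length: {calibration_length:.1f}")
print(f"difference from main text: {relative_difference:.1%}")
```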
Existence of Rare Yet More Basal Lineages
We emphasize that the estimates we derive refer to the coalescence times within our sample. For the mitochondrial genome, we have likely sampled the most divergent branches in the tree (34). However, for the Y chromosome, our estimate of the TMRCA reaches only as far back as the A1b clade. Inclusion of samples from hgA1a or the newly discovered hgA0 (5) or hgA00 (49) would push the date further back. However, these haplogroups are very rare, and it is difficult to assess whether correspondingly divergent but singular mitochondrial genomes may also await discovery.

Effective Population Size
The Ne differences we observe between males and females are most likely due to a greater variance in reproductive success among males, a phenomenon influenced by cultural and demographic factors, such as the practice of polygyny (50). Both purifying and positive selection could also act to reduce the Ne along the linked regions of the Y chromosome. However, both forms of selection may have also acted on the mitochondrial genome. Additional information would be necessary before one could invoke natural selection as the primary cause of reduced male Ne, and the hypothesis is neither necessary nor sufficient.

Additional Acknowledgements
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1147470. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Fig. S1. Map of populations. We sampled Y chromosomes and mtDNAs from nine populations including Baka Pygmies from Gabon, Cambodians, Maya from Mexico’s Yucatán Peninsula, Mbuti Pygmies from the Democratic Republic of Congo, Mozabite Berbers from Algeria, Nzebi from Gabon, Pashtuns (Pathan) from Pakistan’s North-West Frontier Province, San from Namibia, and Yakut from Siberia.
Fig. S2. Sequencing read mapping on Xq21. Total read depth and the depth of MQ0 reads are plotted for 24 HGDP females. Mean values in contiguous 5 kb windows are shown along chrXq21. Dashed gray lines indicate the region that corresponds to the “X-transposed” segment of the Y chromosome.
[Fig. S2 plot. x-axis: chrX Position (Mb), 85 to 96. y-axis: Depth in HGDP Females, 0 to 300. Series: Filtered Depth, MQ0 Depth. Annotation: Homologue of X-transposed Region.]
Fig. S3. Quality control and genotype calling on the Y chromosome. (A) Pipeline from Illumina sequencing to filtered variable sites. (B) Missingness distribution subsequent to imposition of regional mask (Fig. 1) and mapping quality filter. (C) Filtered depth versus physical position. Sites above or below 3 MADs of the median were filtered out. (D) Depth distributions with tranches defined by the number of samples with a heterozygous maximum likelihood genotype. Evidence for multiple alleles in 0–7 samples is likely due to random sequencing error, but sites with more than seven “het” samples exhibit inflated depth. We infer this to result from mismapping and filter these sites out. Filters in (B–D) were tuned on variable sites only. Numbers in (A) are chromosome-wide, without regard to variability.
[Fig. S3 plots. Panel "Missingness Distribution": x-axis, Samples with Missing Data (0 to 35); y-axis, Sites (0 to 1500). Panel "Depth Stratified by Maximum Likelihood Het Counts": x-axis, Depth (100 to 600); y-axis, Density of Sites (0.00 to 0.35); legend, by number of "het" samples: 0–2 (n=11231), 3–5 (n=243), 6–7 (n=138), 8–10 (n=89), 11–12 (n=36), 13–34 (n=32).]
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●●
●
●
●●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●●
●
●●
●
●●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
[Figure residue. Recoverable labels: panel titled "Depth vs. Physical Position" (x-axis: Position (Mb), 0 to 60; y-axis: Depth, 1 to 30); panels A to D.]

Pipeline summary (panel text):
Sequencing: Illumina HiSeq 2000
Mapping: Mapping to GRCh37: BWA; PCR duplicate removal: Picard; quality score recalibration, local realignment, genotype likelihood computation: GATK
Calling: Haploid-model Expectation-Maximization algorithm
Filtration (number of sites):
  Start                                22,974,737
  Regional mask                       −12,532,580
  Mapping quality (MQ0/DP ≥ 0.10)        −129,411
  No-calls (> 20)                        −170,144
  Depth (> 371 or < 159)                 −153,937
  MaxLik het count (> 7)                     −547
  Callable                              9,988,118
11,640 variant sites
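The filtration counts above can be reconciled with a few lines of arithmetic. The sketch below assumes each filter removes its sites from whatever survived the previous step, in the order listed. The function and variable names are illustrative only and do not come from the original pipeline.

```python
# Site counts transcribed from the filtration summary above.
START_SITES = 22_974_737
FILTERS = [
    ("regional mask", 12_532_580),
    ("mapping quality (MQ0/DP >= 0.10)", 129_411),
    ("no-calls (> 20)", 170_144),
    ("depth (> 371 or < 159)", 153_937),
    ("maximum-likelihood het count (> 7)", 547),
]


def apply_filters(start, filters):
    """Return (filter name, sites remaining) after each successive filter."""
    remaining = start
    trail = []
    for name, removed in filters:
        remaining -= removed
        trail.append((name, remaining))
    return trail


trail = apply_filters(START_SITES, FILTERS)
callable_sites = trail[-1][1]

for name, remaining in trail:
    print(f"after {name}: {remaining:,}")
print(f"callable fraction of starting sites: {callable_sites / START_SITES:.3f}")
```

The final count agrees with the callable total reported above and with the chrY locus length (9.988 × 10^6 bp) listed among the Table S3 parameters.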
Fig. S4. Cross-tabulation of populations and Y haplogroups. We sampled predominantly from Africa and observed 23 autochthonous lineages along with 32 representatives of the Bantu hgE. In addition, we sampled 14 Eurasian and Native American lineages.

                  African                Non-African
                  Autochthonous  Bantu
Population    n     A     B        E      G    H    L    N    O    Q    R
Total        69     4    19       32      2    1    1    5    2    2    1
San           6     3     3
Baka         20     1    13        6
Mbuti         5           2        3
Nzebi        20           1       19
Mozabite      4                    4
Pashtun       4                           2         1                   1
Cambodian     4                                1         1    2
Yakut         4                                          4
Maya          2                                                    2
Fig. S5. Call rate and mean sequencing coverage on the Y chromosome. (A) Distribution across samples of call rate, the percentage of variable sites for which a genotype was called prior to imputation. Samples are stratified by collection: HGDP (29 samples from 7 populations) and Gabon (40 samples from two populations). (B) Distribution of mean sequencing coverage among variable sites.
[Figure S5 panels. (A) Call Rate (0 to 100) across samples, stratified by collection (HGDP, Gabon). (B) Mean Coverage (0 to 12) across samples, stratified by collection (HGDP, Gabon).]
Fig. S6. Y chromosome phylogenetic backbone. Defining mutations for and geographic distribution of Y chromosome haplogroups (41) are indicated along with sample size (dark blue). Gray lineages were not sampled in this study. Light blue branches indicate the new structures introduced by our resolution of a polytomy within macro-haplogroup F.
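[Figure S6 labels.
Sample sizes (dark blue), in extracted order: 4, 19, 32, 2, 1, 1, 5, 2, 2, 1.
Tip haplogroups: A1b, B, E, D, C, F*, G, HIJK*, H, I, J, T, L, Q, R, K*, M, S, N, O, P*.
Branch-defining mutations: A1b-V221, BT-M42, B-M60, CT-M168, DE-YAP, D-M174, E-M40, CF-P143, C-M130, F-M89, G-M201, HIJK-M578, H-M69, IJK-M522, IJ-M429, K-M9, LT-P329, L-M20, T-M70, K(xLT)-M526, M-Page93, S-M230, NO-M214, N-M231, O-M175, P-M45, Q-M242, R-M207.
Geographic labels (assignment to haplogroups not recoverable from the extracted text): Africa; Himalayas, East Asia; Mediterranean, Middle East, Somalia; Caucasus, Europe; East Asia; Siberia, Americas; South Asia, Roma; Eurasia; Turkey, Iran, South Asia; New Guinea, Oceania; Boreal Asia; Middle East, West/South Asia; Europe.]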
Fig. S7. Novel structure in Y hgB2. Detail of the maximum likelihood phylogeny shown in Fig. 2. The African-specific haplogroup B2b is present at high frequency among hunter-gatherer populations in central and southern Africa, such as the Baka and San. Orange branches indicate novel phylogenetic structure not described in ISOGG. Italics indicates an extant branch label that we have proposed moving upstream, and orange text indicates a proposed new haplogroup label. Several of the newly described lineages have substantial branch lengths; B2b1a2b dates to approximately 35 kya.
[Figure S7 labels. Scale: 0 to 1200. hgB2 branch labels: B2 (M182), B2a (M150), B2a1a (M109), B2b1 (M192), B2b1a, B2b1a1, B2b1a1b, B2b1a1c, B2b1a2, B2b1a2a, B2b1a2b, B2b1b, B2b1b1, B2b1b2. Tip labels (most derived mutation and population for each sample) and the remaining internal branch labels are omitted.]
Fig. S8. Phylogeny-aware imputation. (A) Schematic of a phylogenetic tree. Asterisks indicate samples for which no genotype was called: { 1, 6, 14 }. By parsimony, the algorithm infers the T→G variant to have arisen on the branch incident upon node 11 and imputes missing data accordingly (white text on colored background). (B) Jalview (51) visualization of Y chromosome sequence in phylogenetic context. Assigning SNVs to branches enables hierarchical clustering of both variants (rows) and samples (columns). Phylogenetic branching patterns are clearly defined by specific sets of mutations.
[Figure S8 panels. (A) Tree schematic with nodes numbered 0 to 18 and rows of observed (Obs) and imputed (Imp) genotypes at the leaves. (B) Jalview visualization.]
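The parsimony step described in the Fig. S8 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the tree, the sample names, and the rule that a missing genotype is imputed only when every single-mutation placement agrees are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional

Tree = Dict[str, List[str]]  # internal node -> children; leaves have no entry


def leaves_below(tree: Tree, node: str) -> List[str]:
    """Return the leaves (samples) in the subtree rooted at node."""
    kids = tree.get(node, [])
    if not kids:
        return [node]
    found: List[str] = []
    for kid in kids:
        found.extend(leaves_below(tree, kid))
    return found


def consistent_branches(tree: Tree, root: str, calls: Dict[str, Optional[str]],
                        ancestral: str, derived: str) -> List[str]:
    """Branches (named by their lower node) on which one mutation explains all calls."""
    nodes = set(tree) | {kid for kids in tree.values() for kid in kids}
    called = {s: a for s, a in calls.items() if a is not None}
    hits = []
    for node in sorted(nodes - {root}):
        below = set(leaves_below(tree, node))
        inside = [a for s, a in called.items() if s in below]
        outside = [a for s, a in called.items() if s not in below]
        if inside and all(a == derived for a in inside) \
                and all(a == ancestral for a in outside):
            hits.append(node)
    return hits


def impute(tree: Tree, root: str, calls: Dict[str, Optional[str]],
           ancestral: str, derived: str) -> Dict[str, Optional[str]]:
    """Fill missing calls (None) when every consistent branch placement agrees."""
    clades = [set(leaves_below(tree, b))
              for b in consistent_branches(tree, root, calls, ancestral, derived)]
    imputed = dict(calls)
    if not clades:
        return imputed
    for sample, allele in calls.items():
        if allele is not None:
            continue
        if all(sample in clade for clade in clades):
            imputed[sample] = derived
        elif not any(sample in clade for clade in clades):
            imputed[sample] = ancestral
    return imputed


# Hypothetical five-sample tree; s2 and s5 have no genotype call at this site.
tree = {"root": ["n1", "s5"], "n1": ["n2", "s4"], "n2": ["n3", "s3"], "n3": ["s1", "s2"]}
calls = {"s1": "G", "s2": None, "s3": "G", "s4": "T", "s5": None}
result = impute(tree, "root", calls, ancestral="T", derived="G")
```

In this toy case the only placement consistent with the called genotypes is the branch leading to n2, so s2 (inside that clade) receives the derived allele and s5 (outside it) the ancestral one. When two placements remain equally parsimonious and disagree about a sample, the sketch leaves that sample uncalled.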
Fig. S9. Y chromosome hgQ clade with Phase 1 1000 Genomes samples included. Y hgQ phylogeny derived by merging our data with 11 lower coverage sequences from Phase 1 of the 1000 Genomes Project. Haplogroups Q-L54*(xM3) and Q-M3 are indicated by different shades of blue. Each branch is labeled by an index and the number of SNPs assigned to the branch in brackets. Individuals are labeled by population, ID, and haplogroup. The two samples used for calibration in the main analyses are circled. Branch 18 indicates SNPs inferred to be shared by all of hgQ-M3, and branch 24 is shared by all of hgQ.
[Figure S9 labels. Scale: 0 to 300. Branch index, number of SNPs assigned, and tip or clade label:

Branch  SNPs  Label
 0       45   MXL NA19682 Q-M3
 1       31   MXL NA19786 Q-M3
 2        9
 3       35   MXL NA19732 Q-M3
 4       26
 5      112   Maya HGDP00856 Q-M3
 6        1
 7       89   MXL NA19735 Q-M3
 9       93   MXL NA19783 Q-M3
10        3
11       59   MXL NA19729 Q-M3
13       97   CLM HG01124 Q-M3
14       89   MXL NA19664 Q-M3
15       54   MXL NA19774 Q-L54
18       17   Q-M3
19       91   Maya HGDP00877 Q-L54
20       36   MXL NA19771 Q-L54
21       61   MXL NA19795 Q-L54
22       62
23       21   Q-L54*(xM3)
24      149   Q-L54
25      263   Pashtun HGDP00243 R-L657]
Fig. S10. Sequencing coverage for Mayan HGDP00856 at singleton sites. Sequencing depth distribution across all callable sites, scaled by the number of observed singletons (black). The observed depth distribution among singletons (red) indicates that sites with zero or one read are not identified as variable; however, the observed distribution is in line with the censored expectation, that is, the expectation conditional on the presence of two or more reads (blue).
[Figure S10 axes. x-axis: Sequencing Depth (0 to 20); y-axis: Number of Singletons (0 to 22). Legend: Expected, Observed, Censored Expectation.]
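The censored expectation in Fig. S10 can be illustrated with a short sketch. It assumes the expectation is obtained by restricting the depth distribution over callable sites to depths of two or more reads, renormalizing, and scaling by the number of observed singletons. The function name and the toy depth counts are hypothetical and are not taken from the data.

```python
from typing import Dict


def censored_expectation(depth_counts: Dict[int, int], n_singletons: float,
                         min_depth: int = 2) -> Dict[int, float]:
    """Expected singleton count at each depth, conditional on depth >= min_depth."""
    kept = {d: c for d, c in depth_counts.items() if d >= min_depth}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no sites at or above the minimum depth")
    return {d: n_singletons * c / total for d, c in sorted(kept.items())}


# Toy depth histogram over callable sites: depth -> number of sites (hypothetical).
toy_depths = {0: 10, 1: 20, 2: 30, 3: 40}
expected = censored_expectation(toy_depths, n_singletons=7)
```

Because sites with zero or one read are excluded before renormalizing, the expected counts sum to the number of observed singletons, which is what makes the blue curve directly comparable to the red one.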
Fig. S11. mtDNA phylogeny. We constructed the mtDNA tree from our samples in order to directly compare the TMRCA of this locus to that of the Y chromosome. Branch lengths are the number of derived SNVs. Internal branches are labeled by the haplogroups they define, and individuals are labeled by haplogroup and the population from which the individual was drawn. This mtDNA tree is concordant with previous constructions of the phylogeny from whole mitochondrial genomes (8).
[Figure S11 labels. Scale: 0 to 60 derived SNVs. Internal branch labels: L0, L1, L2, L3, M, CZ, D, N, A, W, R, U, JT, J, HV, V. Tip labels (mtDNA haplogroup and population for each individual) are omitted.]
Fig. S12. mtDNA calibration tree. Phylogeny constructed from 108 publicly available mtDNA hgA2 sequences (23). We used this clade to calibrate the mtDNA mutation rate based on divergence within the Americas.
[Figure S12 labels. Scale: 0 to 12. Tip labels (hgA2 sub-haplogroup and population for each of the 108 sequences) are omitted.]
Fig. S13. Comparing the Y chromosome TMRCA to that of mtDNA. (A) 95% confidence intervals for each locus (blue boxes) with point estimates (horizontal lines). (B) 95% confidence bands for γ, the ratio of mtDNA TMRCA to that of the Y chromosome, as a function of τ, the ratio of coalescence times for two Y hgQ lineages and 108 hgA2 mtDNAs. Point estimates are plotted as a solid line, and the estimate corresponding to concordant divergence times is indicated with a solid black point. Shading indicates the narrowing of the confidence bands as a function of potential positive correlation between estimates of TMRCA for the two loci.
[Figure S13 panels. (A) "TMRCA Confidence Intervals": y-axis TMRCA (ky), 0 to 200; categories chrY and mtDNA. (B) "TMRCA Ratio": x-axis Calibration Ratio, 0.5 to 1.5; y-axis TMRCA(mtDNA) / TMRCA(Y), 0.0 to 2.0. Legend: Point Estimate; Point Estimate (Calibration Ratio = 1); 95% Confidence Bands (Correlation = 0); Correlation = 0.25; Correlation = 0.50; Correlation = 0.75.]
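The narrowing of the bands in Fig. S13B with increasing positive correlation follows from the general behavior of a ratio of two estimates. The sketch below uses the standard first-order (delta method) approximation for the variance of a ratio. It is an illustration of the principle only and is not claimed to be the procedure used to draw the figure; the input values are hypothetical.

```python
import math
from typing import Tuple


def ratio_ci(x: float, se_x: float, y: float, se_y: float,
             rho: float = 0.0, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for x / y given correlation rho.

    First-order approximation:
    Var(x/y) ~ (x/y)^2 * [ (se_x/x)^2 + (se_y/y)^2 - 2*rho*(se_x/x)*(se_y/y) ]
    """
    ratio = x / y
    cv_x, cv_y = se_x / x, se_y / y
    rel_var = cv_x ** 2 + cv_y ** 2 - 2.0 * rho * cv_x * cv_y
    se = abs(ratio) * math.sqrt(max(rel_var, 0.0))
    return ratio - z * se, ratio + z * se


# Hypothetical estimates and standard errors, chosen only to show the trend.
widths = []
for rho in (0.0, 0.25, 0.50, 0.75):
    low, high = ratio_ci(100.0, 10.0, 120.0, 12.0, rho=rho)
    widths.append(high - low)
    print(f"rho={rho:.2f}: [{low:.3f}, {high:.3f}]")
```

The covariance term enters with a negative sign, so any positive correlation between the two TMRCA estimates reduces the variance of their ratio and tightens the interval.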
Table S1. Y chromosome summary of samples. Identifier, population, most derived mutation observed, ISOGG haplogroup, percentage of sites at which a genotype call was made, mean sequencing coverage.

ID          Population  Mutation    Haplogroup      Call Rate  Coverage
HGDP01029   San         A-P28       A1b1a1a1        98.1        6.44
HGDP00987   San         A-P262      A1b1a1a2b       93.5        3.73
0919        Baka        A-M14       A1b1a1          91.2        2.86
HGDP01036   San         A-M51       A1b1b2a         98.5        6.15
0920        Baka        B-M30       B2b1a1b         92.1        2.99
0909        Baka        B-M112      B2b             82.8        2.05
0918        Baka        B-M30       B2b1a1b         91.0        2.73
0912        Baka        B-M112      B2b             76.9        1.82
0927        Baka        B-M192      B2b1            86.3        2.36
0937        Baka        B-M211      B2b1a1c         94.1        3.12
0908        Baka        B-M211      B2b1a1c         88.2        2.46
HGDP00992   San         B-P70       B2b1a2          97.7        6.25
0917        Baka        B-M192      B2b1            94.4        3.48
0922        Baka        B-M192      B2b1            91.9        2.84
0904        Baka        B-M192      B2b1            98.4        5.13
0932        Baka        B-M192      B2b1            91.7        3.12
0925        Baka        B-M192      B2b1            90.8        2.77
0907        Baka        B-M112      B2b             92.7        2.90
HGDP00449   Mbuti       B-M192      B2b1            93.2        3.35
HGDP01032   San         B-P6        B2b1b           98.6        6.55
HGDP00991   San         B-P6        B2b1b           92.8        3.70
HGDP00462   Mbuti       B-Page18    B2a             94.1        3.88
0702        Nzebi       B-M109      B2a1a           90.3        2.62
HGDP01259   Mozabite    E-M183      E1b1b1b1a2      94.5        3.88
HGDP01258   Mozabite    E-M183      E1b1b1b1a2      93.4        3.81
HGDP01262   Mozabite    E-M183      E1b1b1b1a2      96.6        3.78
HGDP01264   Mozabite    E-M183      E1b1b1b1a2      90.4        2.58
HGDP00456   Mbuti       E-CTS8030   E1b1a1a1f1a1d   93.6        3.51
7030        Nzebi       E-CTS8030   E1b1a1a1f1a1d   94.2        3.18
0938        Baka        E-CTS8030   E1b1a1a1f1a1d   94.6        3.26
0906        Baka        E-CTS8030   E1b1a1a1f1a1d   98.8        6.33
0712        Nzebi       E-P252      E1b1a1a1f1a1    91.9        2.86
0914        Baka        E-CTS8030   E1b1a1a1f1a1d   93.1        3.10
0913        Baka        E-CTS8030   E1b1a1a1f1a1d   99.0        6.03
7005        Nzebi       E-CTS8030   E1b1a1a1f1a1d   79.2        1.87
7003        Nzebi       E-CTS8030   E1b1a1a1f1a1d   69.7        1.53
0713        Nzebi       E-CTS8030   E1b1a1a1f1a1d   90.2        2.58
HGDP01081   Mbuti       E-P252      E1b1a1a1f1a1    98.5        5.63
0711        Nzebi       E-M191      E1b1a1a1f1a     95.6        3.55
0928        Baka        E-L515      E1b1a1a1f1b     89.8        2.73
0716        Nzebi       E-P59       E1b1a1a1g1b     90.0        2.58
0708        Nzebi       E-P277      E1b1a1a1g1      99.3       12.93
7032        Nzebi       E-P277      E1b1a1a1g1      93.2        2.99
HGDP00474   Mbuti       E-P277      E1b1a1a1g1      90.7        2.62
0710        Nzebi       E-P277      E1b1a1a1g1      81.1        2.02
0705        Nzebi       E-P278.1    E1b1a1a1g1      85.4        2.46
0926        Baka        E-U290      E1b1a1a1g1a     81.5        2.10
0703        Nzebi       E-U290      E1b1a1a1g1a     83.9        2.22
0715        Nzebi       E-Z1725     E1b1a1a1g1a2    88.4        2.46
0701        Nzebi       E-M154      E1b1a1a1g1c     92.4        3.04
7019        Nzebi       E-P277      E1b1a1a1g1      83.9        2.23
0707        Nzebi       E-P277      E1b1a1a1g1      87.2        2.49
0709        Nzebi       E-P277      E1b1a1a1g1      92.1        2.92
7020        Nzebi       E-P277      E1b1a1a1g1      95.2        3.30
0714        Nzebi       E-P278.1    E1b1a1a1g1      92.3        2.89
HGDP00222   Pashtun     G-M406      G2a1c1          99.3       12.44
HGDP00213   Pashtun     G-M377      G2b             91.9        2.87
HGDP00720   Cambodian   H-M138      H1a3            98.7        5.67
HGDP00258   Pashtun     L-M357      L1c             95.7        3.97
HGDP00964   Yakut       N-L708      N1c1a1          92.0        2.94
HGDP00960   Yakut       N-L708      N1c1a1          92.9        3.08
HGDP00948   Yakut       N-L708      N1c1a1          94.7        3.51
HGDP00950   Yakut       N-L708      N1c1a1          94.1        3.72
HGDP00715   Cambodian   N-M231      N               96.8        3.93
HGDP00716   Cambodian   O-Page23    O3a2c1a         92.6        3.10
HGDP00711   Cambodian   O-M95       O2a1            93.2        3.50
HGDP00877   Maya        Q-L54       Q1a3a           99.0        8.45
HGDP00856   Maya        Q-M3        Q1a3a1          97.8        5.73
HGDP00243   Pashtun     R-L657      R1a1a1b2a1a     94.5        3.57
Table S2. M578 genotyping results. All samples from haplogroups A, B, C, D, E, and G possess the ancestral C allele, and all samples from haplogroups H, I, J, K, L, M, N, O, Q, R, S, and T possess the derived T allele. This validates the polytomy resolution (G, H, IJK) → (G, (H, IJK)). One individual from paragroup F-M89* (HGDP00528) also possesses the derived allele and should therefore be re-classified as hgHIJK* in light of the newly defined topology.

ID            Haplogroup      Population    Genotype
HGDP01406     A-M13           Bantu         –
turk209       A-M13           Turk          C
turk6256      A-M13           Turk          C
HGDP00988     A-M6            San           C
HGDP00931     B1-M236         Yoruba        C
HGDP00992     B2-M112         San           C
Bsk111 cell   C               Burusho       C
HGDP00029     C-M356          Brahui        C
HGDP00545     C-M38           Papuan        C
HGDP01310     C*              Dai           C
HGDP00758     C1-M8           Japanese      C
HGDP00104     C3-M217         Hazara        C
HGDP01214     D-M15           Daur          C
HGDP01183     D-M15           Yizu          C
HGDP00752     D-M55           Japanese      C
HGDP00757     D-M55           Japanese      C
HGDP01226     D-P47           Mongol        C
HGDP00944     E-M191*         Yoruba        C
HGDP00757     F2-M427         Lahu          C
HGDP01318     F2-M427         Lahu          C
HGDP00528     F3-M282         French        T
HGDP01152     G-L497          Italian       C
HGDP01050     G-L497          Pima          C
HGDP00213     G-M377          Pashtun       C
HGDP00222     G-M406          Pashtun       C
HGDP00725     G-M485*         Palestinian   C
HGDP01073     G-M527          Sardinian     C
HGDP00017     G-P15*          Brahui        C
HGDP00893     G-P16           Russian       –
HGDP00626     G-P19           Bedouin       C
HGDP00723     G-P303          Palestinian   C
HGDP00049     G1-M285         Brahui        C
HGDP00359     H-M197          Burusho       T
HGDP00720     H-M39           Cambodian     T
HGDP00041     H-M52           Brahui        T
HGDP00062     H-M52           Balochi       T
HGDP00254     H-M69*(xM52)    Pashtun       T
HGDP00428     H-M82           Burusho       T
HGDP00438     H-M82           Burusho       T
HGDP00319     H-M82           Kalash        T
HGDP00321     H-M82           Kalash        T
HGDP00326     H-M82           Kalash        T
HGDP00328     H-M82           Kalash        T
HGDP00224     H-M82           Pashtun       T
HGDP01066     I-M26           Sardinian     T
HGDP00627     J1-Page8        Bedouin       T
HGDP00555     K*-M9*          Papuan        T
HGDP00084     L-M20           Balochi       T
HGDP00789     M-Page93        Melanesian    T
HGDP01295     N-M231          Han           T
HGDP00821     O-M117          Han           T
HGDP01060     Q-M346          Pima          T
HGDP00033     R-M17           Brahui        T
HGDP00543     S-M230          Papuan        T
Greek ne7     T-M184          Greek         T
Table S3. Mutation rate point estimates. We estimate Y chromosome and mtDNA mutation rates using entry to the Americas as a calibration event. Point estimates used in the analysis appear in bold. These are based on two divergent Y hgQ samples from within this study and 108 publicly available mtDNA hgA2 sequences (23). As the timing of the Out of Africa event is not known to great precision, the corresponding mutation rate estimates are included for comparison only.
Event              Source    n    T   M       P     µly × 10^3  µlg × 10^2  µby × 10^9  µbg × 10^8

Y Chromosome
Entry to Americas  Internal  2    15  123.0   122   8.2         26          0.82        2.6
Out of Africa      Internal  14   50  393.5   127   7.9         25          0.79        2.5

Mitochondrial Genome (Full)
Entry to Americas  Internal  7    15  5.71    2620  0.38        1.0         23          61
Entry to Americas  External  108  15  5.66    2650  0.38        1.0         23          60
Out of Africa      Internal  39   50  16.51   3030  0.33        0.88        20          53

Mitochondrial Genome (Hypervariable Region Omitted)
Entry to Americas  Internal  7    15  3.57    4200  0.24        0.63        15          40
Entry to Americas  External  108  15  3.94    3810  0.26        0.70        17          44
Out of Africa      Internal  39   50  10.15   4920  0.20        0.54        13          34

Data
  n: Number of lineages
  T: Estimated time since divergence (ky)
  M: Average number of mutations since divergence

Mutation Rate Measures
  P: Mutation period (years / mutation): T / M
  µly: Per year mutation rate for the locus: M / T
  µlg: Per generation mutation rate for the locus: µly × g
  µby: Per year mutation rate per bp: µly / L
  µbg: Per generation mutation rate per bp: µly × (g / L)

Parameters
  g: Average generation time
     chrY: 31.5 years / generation
     chrM: 26.5 years / generation
  L: Locus length
     chrY: 9.988 × 10^6 bp
     chrM Full: 16,571 bp
     chrM HVR omitted: 15,755 bp
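The mutation rate measures defined for Table S3 follow directly from T, M, g, and L. As a minimal sketch, the function below applies those definitions and reproduces the Y chromosome "Entry to Americas" row. The function name and dictionary keys are illustrative; T is converted from ky to years before use.

```python
from typing import Dict


def mutation_rates(t_ky: float, m: float, g: float, locus_bp: float) -> Dict[str, float]:
    """Apply the Table S3 definitions.

    t_ky: time since divergence in thousands of years (T)
    m: average number of mutations since divergence (M)
    g: average generation time in years
    locus_bp: locus length in base pairs (L)
    """
    years = t_ky * 1000.0
    mu_ly = m / years                   # per year, per locus
    return {
        "P": years / m,                 # years per mutation
        "mu_ly": mu_ly,
        "mu_lg": mu_ly * g,             # per generation, per locus
        "mu_by": mu_ly / locus_bp,      # per year, per bp
        "mu_bg": mu_ly * g / locus_bp,  # per generation, per bp
    }


# Y chromosome, entry to the Americas: T = 15 ky, M = 123.0, g = 31.5, L = 9.988e6 bp.
chr_y = mutation_rates(15, 123.0, 31.5, 9.988e6)
print(f"P       = {chr_y['P']:.0f} years / mutation")
print(f"mu_ly   = {chr_y['mu_ly'] * 1e3:.1f} x 10^-3")
print(f"mu_lg   = {chr_y['mu_lg'] * 1e2:.0f} x 10^-2")
print(f"mu_by   = {chr_y['mu_by'] * 1e9:.2f} x 10^-9")
print(f"mu_bg   = {chr_y['mu_bg'] * 1e8:.1f} x 10^-8")
```

The same call with the mtDNA parameters (g = 26.5, L = 16,571 bp) and the external calibration row (T = 15, M = 5.66) gives the corresponding mitochondrial estimates.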
Data File S1. Sample, phylogeny, and variant data. This zipped archive (available at www.sciencemag.org) includes a more detailed version of the phylogeny presented in Fig. 2; a BED file detailing the regions within which genotype calls were made; population and haplogroup data for sampled individuals; data for each branch of the phylogeny, including length (# of SNVs) and the set of individuals within the subtree rooted at the branch; data for each variant, including phylogenetic placement, hg19 coordinate, ancestral and derived alleles, name, and ss#; and mtDNA genotype calls.
Data File S2. Y chromosome genotype calls. To protect participant privacy, this zipped archive is available through a data access agreement (DAA) for transfer of genetic data by contacting C.D.B.
Data File S3. Y chromosome mapped sequencing reads. This BAM file is also available via the DAA described above. Mapping, quality score recalibration, and indel realignment are described in Materials and Methods.
FTP Addresses and Accession Numbers for External Data
Y Chromosome hgQ Sequences from the 1000 Genomes Project
Server: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/
Binary Sequence Alignment/Map Files:
HG01124/alignment/HG01124.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam
NA19664/alignment/NA19664.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19682/alignment/NA19682.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19729/alignment/NA19729.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19732/alignment/NA19732.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19735/alignment/NA19735.mapped.ILLUMINA.bwa.MXL.low_coverage.20130415.bam
NA19771/alignment/NA19771.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19774/alignment/NA19774.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19783/alignment/NA19783.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19786/alignment/NA19786.mapped.ILLUMINA.bwa.MXL.low_coverage.20120522.bam
NA19795/alignment/NA19795.mapped.ILLUMINA.bwa.MXL.low_coverage.20130415.bam

Complete mtDNA hgA2 Sequences: GenBank Accession Numbers
AY195786.2 EF079873.1 EU095526.1 EU095528.1 EU095529.1 EU095530.1 EU095538.1 EU095552.1 EU095194.1 EU095195.1 EU095196.1 EU095197.1 EU095198.1 EU095199.1 EU095200.1 EU095201.1 EU095202.1 EU095204.1
EU095205.1 EU431081.1 EU431082.1 EU095545.2 EU431080.2 HQ012049.1 HQ012050.1 HQ012051.1 HQ012052.1 HQ012053.1 HQ012054.1 HQ012055.1 HQ012056.1 HQ012057.1 HQ012058.1 HQ012059.1 HQ012060.1 HQ012061.1
HQ012062.1 HQ012063.1 HQ012064.1 HQ012065.1 HQ012066.1 HQ012067.1 HQ012068.1 HQ012069.1 HQ012070.1 HQ012071.1 HQ012072.1 HQ012073.1 HQ012074.1 HQ012075.1 HQ012076.1 HQ012077.1 HQ012078.1 HQ012079.1
HQ012080.1 HQ012081.1 HQ012082.1 HQ012083.1 HQ012084.1 HQ012085.1 HQ012086.1 HQ012087.1 HQ012088.1 HQ012089.1 HQ012090.1 HQ012091.1 HQ012092.1 HQ012093.1 HQ012094.1 HQ012095.1 HQ012096.1 HQ012097.1
HQ012098.1 HQ012099.1 HQ012100.1 HQ012101.1 HQ012102.1 HQ012103.1 HQ012104.1 HQ012105.1 HQ012106.1 HQ012107.1 HQ012108.1 HQ012109.1 HQ012110.1 HQ012111.1 HQ012112.1 HQ012113.1 HQ012114.1 HQ012115.1
HQ012116.1 HQ012117.1 HQ012118.1 HQ012119.1 HQ012120.1 HQ012121.1 HQ012122.1 HQ012123.1 HQ012124.1 HQ012125.1 HQ012126.1 HQ012127.1 HQ012128.1 HQ012129.1 HQ012130.1 HQ012131.1 HQ012132.1 HQ012133.1
References
1. J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun, M. W. Feldman, Population growth of human Y chromosomes: A study of Y chromosome microsatellites. Mol. Biol. Evol. 16, 1791–1798 (1999). doi:10.1093/oxfordjournals.molbev.a026091
2. R. Thomson, J. K. Pritchard, P. Shen, P. J. Oefner, M. W. Feldman, Recent common ancestry of human Y chromosomes: Evidence from DNA sequence data. Proc. Natl. Acad. Sci. U.S.A. 97, 7360–7365 (2000). doi:10.1073/pnas.97.13.7360
3. H. Tang, D. O. Siegmund, P. Shen, P. J. Oefner, M. W. Feldman, Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition. Genetics 161, 447–459 (2002).
4. M. F. Hammer, A recent common ancestry for human Y chromosomes. Nature 378, 376–378 (1995). doi:10.1038/378376a0
5. F. Cruciani, B. Trombetta, A. Massaia, G. Destro-Bisol, D. Sellitto, R. Scozzari, A revised root for the human Y chromosomal phylogenetic tree: The origin of patrilineal diversity in Africa. Am. J. Hum. Genet. 88, 814–818 (2011). doi:10.1016/j.ajhg.2011.05.002
6. M. Ingman, H. Kaessmann, S. Pääbo, U. Gyllensten, Mitochondrial genome variation and the origin of modern humans. Nature 408, 708–713 (2000). doi:10.1038/35047064
7. R. L. Cann, M. Stoneking, A. C. Wilson, Mitochondrial DNA and human evolution. Nature 325, 31–36 (1987). doi:10.1038/325031a0
8. P. A. Underhill, T. Kivisild, Use of Y chromosome and mitochondrial DNA population structure in tracing human migrations. Annu. Rev. Genet. 41, 539–564 (2007). doi:10.1146/annurev.genet.41.110306.130407
9. M. A. Jobling, C. Tyler-Smith, The human Y chromosome: An evolutionary marker comes of age. Nat. Rev. Genet. 4, 598–612 (2003). doi:10.1038/nrg1124
10. H. Skaletsky, T. Kuroda-Kawaguchi, P. J. Minx, H. S. Cordum, L. Hillier, L. G. Brown, S. Repping, T. Pyntikova, J. Ali, T. Bieri, A. Chinwalla, A. Delehaunty, K. Delehaunty, H. Du, G. Fewell, L. Fulton, R. Fulton, T. Graves, S. F. Hou, P. Latrielle, S. Leonard, E. Mardis, R. Maupin, J. McPherson, T. Miner, W. Nash, C. Nguyen, P. Ozersky, K. Pepin, S. Rock, T. Rohlfing, K. Scott, B. Schultz, C. Strong, A. Tin-Wollam, S. P. Yang, R. H. Waterston, R. K. Wilson, S. Rozen, D. C. Page, The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423, 825–837 (2003). doi:10.1038/nature01722
11. Materials and methods are available as supplementary material on Science Online.
12. ISOGG, International Society of Genetic Genealogy (2013) (available at http://www.isogg.org/).
13. P. A. Underhill, G. Passarino, A. A. Lin, P. Shen, M. Mirazón Lahr, R. A. Foley, P. J. Oefner, L. L. Cavalli-Sforza, The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann. Hum. Genet. 65, 43–62 (2001). doi:10.1046/j.1469-1809.2001.6510043.x
14. W. Wei, Q. Ayub, Y. Chen, S. McCarthy, Y. Hou, I. Carbone, Y. Xue, C. Tyler-Smith, A calibrated human Y-chromosomal phylogeny based on resequencing. Genome Res. 23, 388–395 (2013). doi:10.1101/gr.143198.112 Medline
15. J. Z. Li, D. M. Absher, H. Tang, A. M. Southwick, A. M. Casto, S. Ramachandran, H. M. Cann, G. S. Barsh, M. Feldman, L. L. Cavalli-Sforza, R. M. Myers, Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008). doi:10.1126/science.1153717 Medline
16. T. M. Karafet, F. L. Mendez, M. B. Meilerman, P. A. Underhill, S. L. Zegura, M. F. Hammer, New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res. 18, 830–838 (2008). doi:10.1101/gr.7172008 Medline
17. J. F. Hughes, H. Skaletsky, T. Pyntikova, T. A. Graves, S. K. van Daalen, P. J. Minx, R. S. Fulton, S. D. McGrath, D. P. Locke, C. Friedman, B. J. Trask, E. R. Mardis, W. C. Warren, S. Repping, S. Rozen, R. K. Wilson, D. C. Page, Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature 463, 536–539 (2010). doi:10.1038/nature08700 Medline
18. R. C. Griffiths, S. Tavaré, Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. London B Biol. Sci. 344, 403–410 (1994). doi:10.1098/rstb.1994.0079 Medline
19. T. Goebel, M. R. Waters, D. H. O’Rourke, The late Pleistocene dispersal of modern humans in the Americas. Science 319, 1497–1502 (2008). doi:10.1126/science.1153569 Medline
20. M. C. Dulik, S. I. Zhadanov, L. P. Osipova, A. Askapuli, L. Gau, O. Gokcumen, S. Rubinstein, T. G. Schurr, Mitochondrial DNA and Y chromosome variation provides evidence for a recent common ancestry between Native Americans and Indigenous Altaians. Am. J. Hum. Genet. 90, 229–246 (2012). doi:10.1016/j.ajhg.2011.12.014 Medline
21. Y. Xue, Q. Wang, Q. Long, B. L. Ng, H. Swerdlow, J. Burton, C. Skuce, R. Taylor, Z. Abdellah, Y. Zhao, D. G. MacArthur, M. A. Quail, N. P. Carter, H. Yang, C. Tyler-Smith; Asan, Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree. Curr. Biol. 19, 1453–1457 (2009). doi:10.1016/j.cub.2009.07.032 Medline
22. R. G. Klein, Out of Africa and the evolution of human behavior. Evol. Anthropol. 17, 267–281 (2008). doi:10.1002/evan.20181
23. S. Kumar, C. Bellis, M. Zlojutro, P. E. Melton, J. Blangero, J. E. Curran, Large scale mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native American origins. BMC Evol. Biol. 11, 293 (2011). doi:10.1186/1471-2148-11-293 Medline
24. S. Y. W. Ho, M. J. Phillips, A. Cooper, A. J. Drummond, Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol. Evol. 22, 1561–1568 (2005). doi:10.1093/molbev/msi145 Medline
25. B. M. Henn, C. R. Gignoux, M. W. Feldman, J. L. Mountain, Characterizing the time dependency of human mitochondrial DNA mutation rate estimates. Mol. Biol. Evol. 26, 217–230 (2009). doi:10.1093/molbev/msn244 Medline
26. D. R. Bentley, S. Balasubramanian, H. P. Swerdlow, G. P. Smith, J. Milton, C. G. Brown, K. P. Hall, D. J. Evers, C. L. Barnes, H. R. Bignell, J. M. Boutell, J. Bryant, R. J. Carter, R. Keira Cheetham, A. J. Cox, D. J. Ellis, M. R. Flatbush, N. A. Gormley, S. J. Humphray, L. J. Irving, M. S. Karbelashvili, S. M. Kirk, H. Li, X. Liu, K. S. Maisinger, L. J. Murray, B. Obradovic, T. Ost, M. L. Parkinson, M. R. Pratt, I. M. Rasolonjatovo, M. T. Reed, R. Rigatti, C. Rodighiero, M. T. Ross, A. Sabot, S. V. Sankar, A. Scally, G. P. Schroth, M. E. Smith, V. P. Smith, A. Spiridou, P. E. Torrance, S. S. Tzonev, E. H. Vermaas, K. Walter, X. Wu, L. Zhang, M. D. Alam, C. Anastasi, I. C. Aniebo, D. M. Bailey, I. R. Bancarz, S. Banerjee, S. G. Barbour, P. A. Baybayan, V. A. Benoit, K. F. Benson, C. Bevis, P. J. Black, A. Boodhun, J. S. Brennan, J. A. Bridgham, R. C. Brown, A. A. Brown, D. H. Buermann, A. A. Bundu, J. C. Burrows, N. P. Carter, N. Castillo, M. Chiara E Catenazzi, S. Chang, R. Neil Cooley, N. R. Crake, O. O. Dada, K. D. Diakoumakos, B. Dominguez-Fernandez, D. J. Earnshaw, U. C. Egbujor, D. W. Elmore, S. S. Etchin, M. R. Ewan, M. Fedurco, L. J. Fraser, K. V. Fuentes Fajardo, W. Scott Furey, D. George, K. J. Gietzen, C. P. Goddard, G. S. Golda, P. A. Granieri, D. E. Green, D. L. Gustafson, N. F. Hansen, K. Harnish, C. D. Haudenschild, N. I. Heyer, M. M. Hims, J. T. Ho, A. M. Horgan, K. Hoschler, S. Hurwitz, D. V. Ivanov, M. Q. Johnson, T. James, T. A. Huw Jones, G. D. Kang, T. H. Kerelska, A. D. Kersey, I. Khrebtukova, A. P. Kindwall, Z. Kingsbury, P. I. Kokko-Gonzales, A. Kumar, M. A. Laurent, C. T. Lawley, S. E. Lee, X. Lee, A. K. Liao, J. A. Loch, M. Lok, S. Luo, R. M. Mammen, J. W. Martin, P. G. McCauley, P. McNitt, P. Mehta, K. W. Moon, J. W. Mullens, T. Newington, Z. Ning, B. Ling Ng, S. M. Novo, M. J. O’Neill, M. A. Osborne, A. Osnowski, O. Ostadan, L. L. Paraschos, L. Pickering, A. C. Pike, A. C. Pike, D. Chris Pinkard, D. P. Pliskin, J. Podhasky, V. J. 
Quijano, C. Raczy, V. H. Rae, S. R. Rawlings, A. Chiva Rodriguez, P. M. Roe, J. Rogers, M. C. Rogert Bacigalupo, N. Romanov, A. Romieu, R. K. Roth, N. J. Rourke, S. T. Ruediger, E. Rusman, R. M. Sanches-Kuiper, M. R. Schenker, J. M. Seoane, R. J. Shaw, M. K. Shiver, S. W. Short, N. L. Sizto, J. P. Sluis, M. A. Smith, J. Ernest Sohna Sohna, E. J. Spence, K. Stevens, N. Sutton, L. Szajkowski, C. L. Tregidgo, G. Turcatti, S. Vandevondele, Y. Verhovsky, S. M. Virk, S. Wakelin, G. C. Walcott, J. Wang, G. J. Worsley, J. Yan, L. Yau, M. Zuerlein, J. Rogers, J. C. Mullikin, M. E. Hurles, N. J. McCooke, J. S. West, F. L. Oaks, P. L. Lundberg, D. Klenerman, R. Durbin, A. J. Smith, Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008). doi:10.1038/nature07517 Medline
27. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009). doi:10.1093/bioinformatics/btp324 Medline
28. A. Wysoker, K. Tibbetts, T. Fennell, Picard (2009) (available at http://picard.sourceforge.net/).
29. A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, M. A. DePristo, The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). doi:10.1101/gr.107524.110 Medline
30. M. A. DePristo, E. Banks, R. Poplin, K. V. Garimella, J. R. Maguire, C. Hartl, A. A. Philippakis, G. del Angel, M. A. Rivas, M. Hanna, A. McKenna, T. J. Fennell, A. M. Kernytsky, A. Y. Sivachenko, K. Cibulskis, S. B. Gabriel, D. Altshuler, M. J. Daly, A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011). doi:10.1038/ng.806 Medline
31. K. Tamura, D. Peterson, N. Peterson, G. Stecher, M. Nei, S. Kumar, MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011). doi:10.1093/molbev/msr121 Medline
32. A. Kloss-Brandstätter, D. Pacher, S. Schönherr, H. Weissensteiner, R. Binna, G. Specht, F. Kronenberg, HaploGrep: A fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum. Mutat. 32, 25–32 (2011). doi:10.1002/humu.21382 Medline
33. M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009). doi:10.1002/humu.20921 Medline
34. D. M. Behar, M. van Oven, S. Rosset, M. Metspalu, E. L. Loogväli, N. M. Silva, T. Kivisild, A. Torroni, R. Villems, A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012). doi:10.1016/j.ajhg.2012.03.002 Medline
35. R. C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004). doi:10.1093/nar/gkh340 Medline
36. J. A. Rice, Mathematical Statistics and Data Analysis (Brooks/Cole, Belmont, CA, ed. 3, 2006), p. 166.
37. J. Hayya, D. Armstrong, N. Gressis, A note on the ratio of two normally distributed variables. Manage. Sci. 21, 1338–1341 (1975). doi:10.1287/mnsc.21.11.1338
38. J. F. C. Kingman, in Exchangeability in Probability and Statistics, G. Koch, F. Spizzichino, Eds. (North-Holland, Amsterdam, 1982), pp. 97–112.
39. C. de Filippo, C. Barbieri, M. Whitten, S. W. Mpoloka, E. D. Gunnarsdóttir, K. Bostoen, T. Nyambe, K. Beyer, H. Schreiber, P. de Knijff, D. Luiselli, M. Stoneking, B. Pakendorf, Y-chromosomal variation in sub-Saharan Africa: Insights into the history of Niger-Congo groups. Mol. Biol. Evol. 28, 1255–1269 (2011). doi:10.1093/molbev/msq312 Medline
40. S. Rootsi, N. M. Myres, A. A. Lin, M. Järve, R. J. King, I. Kutuev, V. M. Cabrera, E. K. Khusnutdinova, K. Varendi, H. Sahakyan, D. M. Behar, R. Khusainova, O. Balanovsky, E. Balanovska, P. Rudan, L. Yepiskoposyan, A. Bahmanimehr, S. Farjadian, A. Kushniarevich, R. J. Herrera, V. Grugni, V. Battaglia, C. Nici, F. Crobu, S. Karachanak, B. Hooshiar Kashani, M. Houshmand, M. H. Sanati, D. Toncheva, A. Lisa, O. Semino, J. Chiaroni, J. Di Cristofaro, R. Villems, T. Kivisild, P. A. Underhill, Distinguishing the co-ancestries of haplogroup G Y-chromosomes in the populations of Europe and the Caucasus. Eur. J. Hum. Genet. 20, 1275–1282 (2012). doi:10.1038/ejhg.2012.86 Medline
41. J. Chiaroni, P. A. Underhill, L. L. Cavalli-Sforza, Y chromosome diversity, human expansion, drift, and cultural evolution. Proc. Natl. Acad. Sci. U.S.A. 106, 20174–20179 (2009). doi:10.1073/pnas.0910803106 Medline
42. A. S. Hinrichs, D. Karolchik, R. Baertsch, G. P. Barber, G. Bejerano, H. Clawson, M. Diekhans, T. S. Furey, R. A. Harte, F. Hsu, J. Hillman-Jackson, R. M. Kuhn, J. S. Pedersen, A. Pohl, B. J. Raney, K. R. Rosenbloom, A. Siepel, K. E. Smith, C. W. Sugnet, A. Sultan-Qurraie, D. J. Thomas, H. Trumbower, R. J. Weber, M. Weirauch, A. S. Zweig, D. Haussler, W. J. Kent, The UCSC Genome Browser Database: Update 2006. Nucleic Acids Res. 34, D590–D598 (2006). doi:10.1093/nar/gkj144 Medline
43. W. J. Kent, BLAT: The BLAST-like alignment tool. Genome Res. 12, 656–664 (2002). Medline
44. M. R. Waters, S. L. Forman, T. A. Jennings, L. C. Nordt, S. G. Driese, J. M. Feinberg, J. L. Keene, J. Halligan, A. Lindquist, J. Pierson, C. T. Hallmark, M. B. Collins, J. E. Wiederhold, The Buttermilk Creek complex and the origins of Clovis at the Debra L. Friedkin site, Texas. Science 331, 1599–1603 (2011). doi:10.1126/science.1201855 Medline
45. T. D. Dillehay, C. Ramírez, M. Pino, M. B. Collins, J. Rossen, J. D. Pino-Navarro, Monte Verde: Seaweed, food, medicine, and the peopling of South America. Science 320, 784–786 (2008). doi:10.1126/science.1156533 Medline
46. A. Scally, R. Durbin, Revising the human mutation rate: Implications for understanding human evolution. Nat. Rev. Genet. 13, 745–753 (2012). doi:10.1038/nrg3295 Medline
47. 1000 Genomes Project Consortium, An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012). doi:10.1038/nature11632 Medline
48. E. Tamm, T. Kivisild, M. Reidla, M. Metspalu, D. G. Smith, C. J. Mulligan, C. M. Bravi, O. Rickards, C. Martinez-Labarga, E. K. Khusnutdinova, S. A. Fedorova, M. V. Golubenko, V. A. Stepanov, M. A. Gubina, S. I. Zhadanov, L. P. Ossipova, L. Damba, M. I. Voevoda, J. E. Dipierri, R. Villems, R. S. Malhi, Beringian standstill and spread of Native American founders. PLoS ONE 2, e829 (2007). doi:10.1371/journal.pone.0000829 Medline
49. F. L. Mendez, T. Krahn, B. Schrack, A. M. Krahn, K. R. Veeramah, A. E. Woerner, F. L. Fomine, N. Bradman, M. G. Thomas, T. M. Karafet, M. F. Hammer, An African American paternal lineage adds an extremely ancient root to the human Y chromosome phylogenetic tree. Am. J. Hum. Genet. 92, 454–459 (2013). doi:10.1016/j.ajhg.2013.02.002 Medline
50. G. Destro-Bisol, F. Donati, V. Coia, I. Boschi, F. Verginelli, A. Caglià, S. Tofanelli, G. Spedini, C. Capelli, Variation of female and male lineages in sub-Saharan populations: the importance of sociocultural factors. Mol. Biol. Evol. 21, 1673–1682 (2004). doi:10.1093/molbev/msh186 Medline
51. A. M. Waterhouse, J. B. Procter, D. M. Martin, M. Clamp, G. J. Barton, Jalview Version 2: A multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009). doi:10.1093/bioinformatics/btp033 Medline
Rebecca L. Cann, Y Weigh In Again on Modern Humans. Science 341, 465 (2013). DOI: 10.1126/science.1242899
www.sciencemag.org SCIENCE VOL 341 2 AUGUST 2013 465
PERSPECTIVES
frequencies. However, only one frequency proportional to the input OAM will survive in the total power of the scattered light when two opposite OAM waves are used as input. This analysis indicates that a Doppler shift should occur even with ordinary “nontwisted” light as the input, if only specific nonzero OAM components are detected in the scattered light. This additional test was carried out by Lavery et al., confirming the predictions.

The work of Lavery et al. builds on several previous results. A form of Doppler shift arising from the transverse translational motion of a scattering inhomogeneous surface is a well-known effect, which provides the basis for the so-called laser speckle velocimetry that allows for noncontact surface-speed measurements (7). The present study can be seen as a generalization of speckle velocimetry to the rotational case. Various examples of Doppler-like effects in the interaction of light with spinning particles or molecules have been reported (8), but these effects arise from SAM scattering, not OAM. Hence, they require the particles to be anisotropic (rather than inhomogeneous) and, because SAM is bounded by ħ, cannot be enhanced at will, in contrast to the case with OAM. Finally, a Doppler effect relying on OAM was demonstrated in an ad hoc setup involving transmission through spinning Dove prisms, which are truncated prisms used to invert and rotate images (9).

The rotational Doppler effect demonstrated by Lavery et al. could find application in noncontact remote measurement of angular speeds. Particularly fascinating would be the detection of astronomical object rotations by filtering specific OAM components in the detected radiation (10). However, high-OAM components in the light of distant sources are expected to be strongly attenuated, so the potential in this area will require further study.
References
1. M. P. J. Lavery, F. C. Speirits, S. M. Barnett, M. J. Padgett, Science 341, 537 (2013).
2. A. M. Yao, M. J. Padgett, Adv. Opt. Photon. 3, 161 (2011).
3. M. J. Padgett, R. Bowman, Nat. Photon. 5, 343 (2011).
4. A. Ambrosio, L. Marrucci, F. Borbone, A. Roviello, P. Maddalena, Nat. Commun. 3, 989 (2012).
5. N. Bozinovic et al., Science 340, 1545 (2013).
6. V. D’Ambrosio et al., Nat. Commun. 3, 961 (2012).
7. T. Asakura, N. Takai, Appl. Phys. 25, 179 (1981).
8. B. I. Bialynicki, B. Z. Bialynicka, in The Angular Momentum of Light, D. L. Andrews, M. Babiker, Eds. (Cambridge Univ. Press, Cambridge, 2012).
9. J. Courtial, D. Robertson, K. Dholakia, L. Allen, M. Padgett, Phys. Rev. Lett. 81, 4828 (1998).
10. F. Tamburini, B. Thidé, G. Molina-Terriza, G. Anzolin, Nat. Phys. 7, 195 (2011).
10.1126/science.1242097
Twisted light and the Doppler effect. Light carrying orbital angular momentum (OAM), represented by its helical-structured wavefront in orange, is reflected off a spinning disk. The disk’s surface roughness generates new OAM components in the scattered light. In this example, a single-helix wave with an OAM of ħ (one rotational quantum) is scattered into a triple-helix wave with an OAM of 3ħ. The scattered light then also acquires a Doppler frequency shift, represented here as a change of color to blue.

Y Weigh In Again on Modern Humans
GENETICS
Rebecca L. Cann
Sampling of the human Y chromosome eliminates the curious disparity in ages of our last common male and female ancestors.
Department of Cell and Molecular Biology, John A. Burns School of Medicine, University of Hawaii at Manoa, 1960 East-West Road, Honolulu, HI 96822, USA. E-mail: [email protected]

The age of the most recent man or woman from whom all living humans today descended has been the subject of considerable debate. It has been suggested that the date of our last common maternal ancestor could have been three times older than that of our last common paternal ancestor. Two papers in this issue independently redate our most recent common paternal ancestor and find that there is little or no disparity with the age of our common maternal ancestor. On page 565, Francalacci et al. (1) report their high-resolution sampling of 1204 Sardinian men, yielding 11,763 phylogenetically informative and male-specific single-nucleotide Y-chromosome polymorphisms (MSY-SNPs), and generate a putative estimate of 180,000 to 200,000 years for the point at which all these and other human paternal lineages coalesce. In a separate study on page 562, Poznik et al. (2) detail their methods using sequences from 69 males drawn from nine populations, covering 9.99 million loci on the Y, and conclude that the most recent common paternal ancestor lived 120,000 to 156,000 years ago. These papers further confirm an earlier sequencing study (3) of 36 male donors that pushed the ancestral Y back to 115,000 years before present (yr B.P.), using almost 6800 variants shared by two or more men. This is roughly the same as the dates derived on the basis of mitochondrial genome analysis for the most recent common maternal ancestor (4). So now it seems that a population giving rise to the strictly maternal and strictly paternal portions of our genomes could have produced individuals who found each other in the same space and time.

While the papers of Francalacci et al. and Poznik et al. are elegant, careful analyses, the general public is more familiar with mitochondrial and Y-chromosome analyses in the context of population-based comparisons for assigning parentage or assessing continental origin (so-called ancestry
testing). Maternally (i.e., mitochondrial) and paternally (male-specific regions of the Y) inherited genomes are used in ancestry testing because they present such simple models of genetic inheritance. But they can sometimes oversimplify the search for a genetic homeland and family tree. When using Y-chromosome markers, a grandfather to father to son pattern of inheritance can be easily reconstructed, both solving and raising issues regarding paternity, as seen in the case of the Jefferson family and Sally Hemings’s descendants (5). But SNP markers can undergo a type of genetic recombination called gene conversion that complicates simple Y-chromosome typing and the estimation of mutation rates, leading to spurious conclusions when extrapolated over generations. For example, Niederstätter et al. (6) recently found that efforts to look at bio-geographical origins were complicated by the fact that changes in a 37–base pair region of the Y chromosome occurred more often among thousands of Austrian men than could be accounted for by simple stepwise mutations.

For most biologists, the analysis of SNPs simply provides evidence of population subdivision in the branching patterns of our long-dead ancestors, and this can offer an overwhelming sense of our geographical roots that some will find appealing. However, for social scientists pondering the social consequences of such disclosures surrounding biological diversity in humans, there can be instant recoil at past misguided efforts to use genetics to justify racism. While some have looked at the genetic basis of disease susceptibility in the context of migrations of human populations (7), there is always the danger of confusing the effects of selection driven by the environment with the genetic history of the populations in question. Indeed, some researchers have concluded that human racial classification is a continuing social construct and not a biological reality at all (8).

On a grander scale of history, discovery bias has been a consistent problem in using SNPs (9), or paleoanthropology, to reconstruct the past. So it is good news that these two new papers provide fresh evidence, using between them a diverse set of data that will allow the consideration of alternative demographic models of hominin migration. The idea that culturally modern humans pulsed out of Africa only 50,000 to 60,000 years ago (10) is widely promulgated, although earlier proposed calibrations of mutation rates for the Y chromosome (11), as well as whole-genome analysis of mitochondrial DNA (12), were consistent with other models of more ancient possible migrations involving anatomically modern humans. Eventual ecological displacement of temperate zone archaic populations, both Neandertal and Denisovan, by modern humans has been put forward to account for discordancy in tool traditions from archaeological research at sites in central India (13), where microblade technology had been in use only since 45,000 years ago, as compared to sites in Africa, China, and Malaysia. In these latter locations, the hominin populations appeared to be culturally modern (in terms of the tools they were making) and well adapted to the emerging localities, suggesting that the time frame implied by a short Y chronology allows insufficient time for invasion and settlement. More broadly, marine isotope stage 5–calibrated dates in the range of 85,000 to 130,000 yr B.P. suggest that modern humans were on the move between tropical and subtropical zones during the periods when climate oscillated in the temperate regions they would later successfully reinvade, leaving us as their legacy.

[Figure, panels A and B. Labels: Neandertals, Denisovians?, Indian archaics, African archaics?, Modern humans, Unoccupied, Abandoned.]
Ancestors on the move. Using Y-chromosome data from modern human males to date the most recent common paternal ancestor will ultimately help to constrain demographic models of past hominin population locations and migrations. (A) Possible distributions of different populations at ~190,000 B.P. As described in the text, Y-chromosome analyses might help explain the timing of cultural changes seen with regard to microblade tool use and the disappearance of archaic forms of hominins from India, generating the distribution seen at 71,000 B.P. (B). [Adapted from (13)]

References
1. P. Francalacci et al., Science 341, 565 (2013).
2. G. David Poznik et al., Science 341, 562 (2013).
3. W. Wei et al., Genome Res. 23, 388 (2013).
4. R. L. Cann, M. Stoneking, A. C. Wilson, Nature 325, 31 (1987).
5. E. A. Foster et al., Nature 396, 27 (1998).
6. H. Niederstätter et al., Forensic Sci. Int. Genet. 10.1016/j.fsigen.2013.05.010 (2013).
7. E. Corona et al., PLoS Genet. 9, e1003447 (2013).
8. E. Bonilla-Silva, Racism Without Racists: Color-Blind Racism and the Persistence of Racial Inequality in the United States (Rowman and Littlefield, Lanham, MD, 2006).
9. A. Albrechtsen, F. C. Nielsen, R. Nielsen, Mol. Biol. Evol. 27, 2534 (2010).
10. R. Klein, The Human Career: Human Biological and Cultural Origins (Univ. of Chicago Press, Chicago, ed. 3, 2009).
11. F. Cruciani et al., Am. J. Hum. Genet. 88, 814 (2011).
12. A. Olivieri et al., Science 314, 1767 (2006).
13. S. Mishra, N. Chauhan, A. Singhvi, PLoS ONE 8, e69280 (2013).
10.1126/science.1242899

Is There Social RNA?
MOLECULAR BIOLOGY
Peter Sarkies and Eric A. Miska
The idea that RNA can be transferred between organisms and function in communication and environmental sensing is discussed.
Gurdon Institute and Department of Biochemistry, University of Cambridge, Cambridge CB2 1QN, UK. E-mail: [email protected]
Online (sciencemag.org): Podcast interview with author Eric Miska (http://scim.ag/ed_6145).

Our understanding of the forms, functions, and movement of RNA continues to expand. Not only can RNA control gene expression by multiple mechanisms within a cell, it appears to travel outside the cell within an organism as well. This raises the interesting question of whether the RNA world extends beyond the boundaries of the organism. Can RNA traffic integrate an organism into its environment? Is there “social RNA”? Examining the mechanism of RNA interference (RNAi) may be a good route for seeking the answer.

In many eukaryotic cells, exposure to double-stranded RNA (dsRNA) can initiate an RNAi response that generates small interfering RNA (siRNA). These are potent silencing molecules that use base-pairing to recognize genes with sequence similarity to the original double-stranded trigger (1, 2). Moreover, many organisms, including mustard cress and roundworms, possess mechanisms to move siRNAs between tissues (3, 4). So far, research into the functions of RNAi has focused on its role within an organism, for example in antiviral defense or in silencing repetitive DNA sequences in the genome. In the model nematode Caenorhabditis elegans, however, molecular mechanisms facilitate trafficking of functional RNA to and from cells. This extends the RNAi response outside of the cell and possibly even outside of the organism. The functional importance of either for C. elegans in the wild is still unknown. However, such roles could be investigated by analyzing, in their natural habitat, nematodes for which ecological characterization is more advanced than for the laboratory workhorse C. elegans.

One of the most remarkable features of RNAi in C. elegans is that feeding these animals dsRNA can silence endogenous genes. This response differs from nonspecific inflammatory responses to dsRNA in mammalian cells because only genes with matching sequence to the ingested dsRNA will be silenced (5, 6). A specific pathway that takes up long dsRNA (~200 to 500 base pairs) from the gut lumen involves the channel protein SID-2 (7, 8). Inside the cell, dsRNA is cleaved by the endoribonuclease Dicer (DCR-1) to generate ~23-nucleotide RNAs, known as siRNAs. siRNAs are bound by Argonaute proteins, and the resulting complex targets messenger RNA for degradation. In addition, RNA-dependent RNA polymerase enzymes amplify the trigger, thereby bolstering its silencing effect on target genes (9). Another dsRNA-selective channel, SID-1, subsequently allows the silencing RNA signal to spread throughout the animal (10). The signal can even reach the germ line, thus instigating a transgenerational response (11–13).

Exploiting the SID pathway enables the function of almost any gene to be examined simply by making a bacterial strain expressing dsRNA that matches the gene of interest, which makes it a great tool. But the broad implications of the response that this pathway elicits have not resonated widely, in part because its function in the normal C. elegans life cycle is mysterious. What possible use could there be for a pathway that takes up RNA from the environment and uses it to silence endogenous genes?

An attractively simple idea is that C. elegans might respond to dsRNA that is naturally produced by the bacteria it consumes. This would allow C. elegans to mount an RNAi response against bacterial RNAs that enter the gut. siRNAs produced by cleavage of a bacterial dsRNA trigger could target endogenous genes, redirecting gene expression programs in response to different diets. Although this is a plausible model, there is no clear evidence yet that it occurs. A noncoding RNA produced by certain Escherichia coli strains might cause gene expression changes via RNAi in C. elegans (14). However, mutations in SID pathway genes do not obviously compromise fitness under laboratory conditions, which suggests that “environmental RNAi” is not important for growth on E. coli in general. In the wild, C. elegans probably feeds on bacteria growing on rotting fruit (15) and therefore encounters multiple species of microbes, so deeper sampling of ecologically relevant bacteria might provide insight into the role of SID-encoding genes. Our understanding of the