SUPPLEMENTARY MATERIAL
Supplementary Methods
DNA sequencing
DNA samples were PCR amplified and Sanger sequenced on an ABI 3730XL DNA analyzer.
Primers were designed using the genome reference sequence in Oligo Primer Analysis Software.
Amplification primers are listed in table S2. Sequences were finished using the
Phred/Phrap/Consed/PolyPhred suite of programs. The ancestral state in humans was inferred
using chimpanzee, bonobo and gorilla as outgroups (table S3).
In addition to the five individuals of the Melanesian panel listed in the Materials and Methods,
11 individuals from Melanesian populations (6 from Papua New Guinea, 1 from New
Britain, and 4 from Bougainville) and 2 individuals from sub-Saharan Africa (1 San and 1 from
Ghana) were re-sequenced for the extended region spanning positions 111820038-111842221.
Four individuals from Bougainville and two Papua New Guineans are part of the HGDP panel,
three Papua New Guineans and the San individual are shared with the diversity panel, and the
remaining individuals were previously included in studies of Y chromosome diversity
(Scheinfeldt et al. 2006; Wilder and Hammer 2007).
Correcting Allele Frequencies
To correct for uneven sampling of SNPs on the backgrounds of the deep and shallow lineages
associated with the ancestral state at 111829579 (B1 and B2) and the derived states at 111839753
(A1 and A2), we carried out the following computation for the frequency of A1 and its variance.
The frequency of A1 can be calculated using the law of total probability:
$$P(A_1) = P(A_1 \mid B_1)\,P(B_1) + P(A_1 \mid B_2)\,P(B_2)$$
Its variance was obtained from the conditional variance formula.
Estimates for frequencies and variances were obtained by using sample proportions from the
data.
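The correction can be sketched in Python. This is a minimal illustration rather than the authors' code: it assumes that P(B1), P(A1 | B1) and P(A1 | B2) are sample proportions from independent binomial samples (the function and argument names are hypothetical), and it applies the conditional variance formula by conditioning on the estimate of P(B1), plugging the sample values in for the unknown expectations.

```python
def binomial_variance(p: float, n: int) -> float:
    """Sampling variance of a proportion p estimated from n chromosomes."""
    return p * (1.0 - p) / n


def corrected_a1_frequency(p_b1: float, n_b: int,
                           q1: float, n1: int,
                           q2: float, n2: int) -> tuple[float, float]:
    """Return (estimated frequency of A1, variance of that estimate).

    p_b1:   sample proportion of background B1 (n_b chromosomes); P(B2) = 1 - p_b1
    q1, q2: sample proportions of A1 on B1 and on B2 chromosomes (n1 and n2 chromosomes)
    Assumes the three estimates come from independent binomial samples.
    """
    p_b2 = 1.0 - p_b1
    # Law of total probability: P(A1) = P(A1|B1) P(B1) + P(A1|B2) P(B2)
    freq = q1 * p_b1 + q2 * p_b2

    v_b = binomial_variance(p_b1, n_b)
    v1 = binomial_variance(q1, n1)
    v2 = binomial_variance(q2, n2)

    # Conditional variance formula, conditioning on the estimate of P(B1):
    # E[Var(freq | p)] uses E[p^2] = p^2 + Var(p) and E[(1 - p)^2] = (1 - p)^2 + Var(p)
    mean_conditional_variance = v1 * (p_b1 ** 2 + v_b) + v2 * (p_b2 ** 2 + v_b)
    # Var(E[freq | p]) = (q1 - q2)^2 Var(p)
    variance_of_conditional_mean = (q1 - q2) ** 2 * v_b
    return freq, mean_conditional_variance + variance_of_conditional_mean


# Illustrative (hypothetical) inputs
freq, var = corrected_a1_frequency(p_b1=0.25, n_b=100, q1=0.8, n1=20, q2=0.1, n2=40)
print(f"P(A1) = {freq:.4f}, variance = {var:.6f}")
```

With these inputs the corrected frequency is 0.275; the variance combines the sampling error of each background-specific frequency with the extra spread introduced by uncertainty in P(B1).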
Coalescent simulations
Intra-allelic variation in the two newly described Melanesian lineages (deep and shallow) was
studied with coalescent simulations of the sequence spanning 111822001-111842221, run with
the program ms (Hudson 2002). Mutation rates were estimated using polymorphisms in the
human lineage and assuming a divergence time of 6 Mya (i.e., 240,000 generations ago) between
human and chimpanzee sequences. The parameters for the population size correspond to the
number of chromosomes in the population with a given allele. Five classes of demographic
model were simulated with 50,000 simulations for each of several parameter values: a) constant
population size, b) exponential growth from constant population size, c) population crash
followed by exponential growth, d) exponential growth from a single chromosome, and e)
initially as in b), but the population of chromosomes has reached a stable size (table S7a, fig.
S5). For each simulation with a frequency spectrum matching the empirical data, we kept the
coalescent time for the sample. The collection of coalescent times was used as a distribution of
estimates for the time to the most recent common ancestor (TMRCA) to calculate medians and
confidence intervals.
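The acceptance step described above can be sketched as follows. It assumes each ms replicate has already been parsed into a pair (frequency spectrum, TMRCA in generations); ms itself reports times scaled by 4N0, so converting to generations requires the simulated N0 (not shown). The 25-year generation time follows from the stated 6 Mya = 240,000 generations, while the exact-match acceptance rule and the linear-interpolation percentiles are illustrative choices, not necessarily those used in the study.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def summarize_tmrca(simulations, observed_spectrum, years_per_generation=25.0):
    """Keep TMRCAs of simulations whose frequency spectrum equals the observed one.

    simulations: iterable of (frequency_spectrum, tmrca_in_generations) pairs.
    Returns (median, 2.5% quantile, 97.5% quantile, number accepted), times in years.
    """
    observed = tuple(observed_spectrum)
    accepted = sorted(t * years_per_generation
                      for spectrum, t in simulations
                      if tuple(spectrum) == observed)
    if not accepted:
        raise ValueError("no simulation matched the observed frequency spectrum")
    return (percentile(accepted, 0.5),
            percentile(accepted, 0.025),
            percentile(accepted, 0.975),
            len(accepted))


# Hypothetical parsed replicates: (frequency spectrum, TMRCA in generations)
sims = [((2, 1, 0), 1000), ((2, 1, 0), 2000), ((3, 0, 0), 9999), ((2, 1, 0), 3000)]
median, lower, upper, n_accepted = summarize_tmrca(sims, [2, 1, 0])
```

In practice the accepted set would come from the 50,000 replicates per parameter value, pooled across the demographic models considered.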
Phase, Network and Recombination Rate
We inferred phase using the program PHASE v2.1 in a run of 500,000 steps following 50,000
steps of burn-in. Recombination rates were inferred using the same program with a run of
100,000 steps following 10,000 steps of burn-in. The data were phased in sets of amplicons and
with all amplicons together. Phased haplotypes in amplicon set 2 were trimmed after site
111841590 to reduce the effects of recombination on the fifth amplicon. Recombination rates
were estimated using Yoruba trio genotypes from HapMap phase II. SNPs that resulted in
Mendelian violations were discarded.
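A minimal sketch of the Mendelian-consistency filter for trio genotypes, assuming unphased two-allele genotype strings with no missing calls; the data layout and SNP identifiers are hypothetical.

```python
def is_mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child's two alleles can be explained as one from each parent.

    Genotypes are unphased two-character strings such as "AG".
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def drop_mendelian_violations(trio_genotypes: dict) -> dict:
    """Keep only SNPs that show no Mendelian violation in any trio.

    trio_genotypes maps SNP id -> list of (father, mother, child) genotypes.
    """
    return {snp: trios for snp, trios in trio_genotypes.items()
            if all(is_mendelian_consistent(*trio) for trio in trios)}


# Hypothetical example: snp2 is discarded because a G allele cannot come from an AA mother
genotypes = {
    "snp1": [("AA", "AG", "AG"), ("GG", "AG", "AG")],
    "snp2": [("AA", "AA", "AG")],
}
kept = drop_mendelian_violations(genotypes)
```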
Divergence time
Under the infinite sites mutation model, the number of mutations in a branch of a genealogy
follows a Poisson distribution. The mean number of mutations is the product of the branch length
and the mutation rate. Therefore, given the number of uniformly ascertained mutations, the
number of mutations in each branch follows a multinomial distribution with parameters
proportional to the branch lengths. Maximum likelihood estimates for the branch lengths and
confidence intervals for individual relative branch lengths can be obtained from the distribution
of mutations in the genealogy (Mendez et al. 2011). Estimates of the relative branch lengths can
be scaled if a calibration point is known. In this work the calibration value was the divergence
time with the chimpanzee sequence, which was assumed to be 6 Mya.
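Because the maximum likelihood estimates of multinomial proportions are the observed fractions n_i/n, relative branch lengths and their calibration can be sketched as below. The branch labels, counts, and the choice of a root-to-tip path as the calibration (its total length set to the 6 Mya human-chimpanzee divergence) are hypothetical; the confidence-interval procedure of Mendez et al. (2011) is not reproduced.

```python
def relative_branch_lengths(counts: dict) -> dict:
    """MLE of relative branch lengths: the multinomial proportions n_i / n."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutations to distribute among branches")
    return {branch: n / total for branch, n in counts.items()}


def calibrated_branch_times(counts: dict, calibration_path: list,
                            calibration_time: float) -> dict:
    """Scale relative lengths so the branches on calibration_path sum to calibration_time."""
    rel = relative_branch_lengths(counts)
    path_length = sum(rel[b] for b in calibration_path)
    return {b: r / path_length * calibration_time for b, r in rel.items()}


# Hypothetical mutation counts on each branch of a two-lineage genealogy
counts = {"internal": 90, "lineage_1": 6, "lineage_2": 4}
# Path from the human-chimpanzee split to the tip of lineage_1, calibrated to 6 Mya
times = calibrated_branch_times(counts, ["internal", "lineage_1"], 6.0)
```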
Two different modeling schemes were used: 1) only mutations leading to a specific
lineage were considered, and the remaining lineages were used to divide those mutations into
temporally distinct groups; 2) the TMRCA of two lineages was evaluated by considering the
three branches (those of each lineage and the one spanning from the ancestor with chimpanzee to
their common ancestor) and constraining both lineages to have the same branch length. It should
be noted that the parameter a associated with the fraction of the mutations that are segregating is
different in the two schemes. In the first scheme, a is the ratio of the divergence time of human
lineages to the time of the human-chimp divergence. In the second scheme, measured in terms
of the human-chimp divergence time, the total tree length is 1+a (i.e., the second human lineage
adds an extra amount, a, to the total tree length). The total tree length associated with the two
human lineages is 2a (i.e., a for each individual). Thus, the fraction of the total tree length
associated with the human lineages is 2a/(1+a).
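Under the second scheme, the fraction x of ascertained mutations falling on the two human lineages estimates 2a/(1+a). Solving 2a/(1+a) = x gives a = x/(2 - x), and because this map is monotone, the maximum likelihood estimate of a follows directly from x = n_human/n_total, with the TMRCA equal to a times the 6 Mya calibration. The sketch below uses a Wilson score interval for x as a stand-in for the interval procedure of Mendez et al. (2011), which may differ; the counts are hypothetical.

```python
import math


def a_from_fraction(x: float) -> float:
    """Invert x = 2a / (1 + a), the fraction of tree length on the two human lineages."""
    return x / (2.0 - x)


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k / n."""
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


def tmrca_two_lineages(n_human: int, n_total: int, calibration_mya: float = 6.0):
    """TMRCA (Mya) of two human lineages constrained to the same branch length a.

    n_human: mutations on either human lineage; n_total: all ascertained mutations on the
    three branches (both human lineages plus the branch back to the chimpanzee split).
    Returns (estimate, lower, upper).
    """
    x = n_human / n_total
    lo, hi = wilson_interval(n_human, n_total)
    # a increases monotonically with x, so the interval endpoints map directly
    return tuple(a_from_fraction(v) * calibration_mya for v in (x, lo, hi))


estimate, lower, upper = tmrca_two_lineages(n_human=10, n_total=60)
```

For x = 1/6, a = 1/11 and the point estimate is 6/11 Mya (about 0.55 Mya).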
Supplementary Results
Analysis of the fourth and fifth amplicons showed that while the “deep” lineage still persists in
the 3’ section of the gene, the relationship among haplotypes in this region is somewhat more
complex than in the 5’ end. We focused our analysis in the main text on the 5’ region, which
encompasses the 1st to 3rd exons. Here we provide further analysis and discussion of the 3’
region of the OAS1 gene and characterize putative recombinant and gene-converted
chromosomes.
Levels and Patterns of Diversity in the 3’ region of OAS1
High levels of polymorphism in Papuans extend to the region encompassing exons 4-6,
especially in the fourth amplicon (table 1, table S8). A median-joining network of phased
haplotypes for this sequence, trimmed to reduce the effect of recombination, defines two groups
of haplotypes. One group has haplotypes from all populations while the second group has
haplotypes only from San, Mandenka, and Papuans (fig. S2). Of the two Papuan haplotypes
observed in the second group, one corresponds to the deep lineage haplotype in the region
encompassing exons 1-3. The other Papuan haplotype shows affinities with a haplotype observed
in the San. These haplotypes are referred to as Papuan and African “shallow” lineages,
respectively.
To better characterize the relationship between the Papuan and African shallow lineages
and to assess the intra-allelic variation in the Papuan shallow lineage, we also sequenced the
extended region in five homozygous Papuans and two (i.e., one homozygote and one
heterozygote) Africans carrying the shallow lineage (table S4, fig. 1). The estimated divergence
time between the Papuan and African shallow lineages is 240 kya (65 – 613 kya, 95% CI). The
Papuan shallow lineage is restricted to populations of Melanesian ancestry (fig. S3a). While the
African shallow lineage is found at very low frequency in several sub-Saharan African
populations (fig. S3b, table S5b), it is found at moderate frequency only in the KhoeSan and
Mbuti Pygmies.
Analysis of the level of intra-allelic variation using data from the extended re-sequencing
of the Melanesian panel, and consideration of a variety of possible population histories for the
ancestry of these chromosomes, provide an estimated TMRCA for the deep lineage of 24 kya (8
– 66 kya, 95% CI). With the same set of demographic models used for the deep lineage, the
estimate of the TMRCA in Papuans for the shallow lineage is 19 kya (7 – 58 kya, 95% CI) (table
S7b). These two estimates show a broad overlap.
In sum, we note that toward the 3’ end of the gene Papuans still maintain high genetic
diversity, which is due in part to the presence of a second lineage that is not present in Eurasians.
Because its estimated TMRCA is similar to that of the deep lineage, the possibility arises that
both lineages diversified in Oceania as part of the same demographic process.
Analysis of high density SNP data
Using the Illumina 650Y genotype data of the individual HGDP00555, known to be homozygous
for the deep lineage (table S4), it was possible to extract a haplotype spanning positions
111820114 to 111906360 that is exclusive to the deep lineage. A search for this haplotype
among phased genotypes of samples in the Human Genome Diversity Project (Li et al. 2008;
Pickrell et al. 2009) retrieved five chromosomes from Oceania, together with two chromosomes
from Pakistan. The inferred haplotype for the ancient Denisova sequence is also consistent with
this deep haplotype with the exception of the SNP at position 111820114. The two Pakistani
individuals are heterozygous for the SNP at position 111828579 and derived at 111831807, and
the retrieved haplotypes for the two Pakistanis HGDP00052 and HGDP00078 agree with a deep
lineage haplotype from Bougainville in HGDP00979 between positions 111757914 and
111931486.
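The haplotype search can be illustrated with a simple scan over phased chromosomes that optionally tolerates a few mismatches (as with the Denisova haplotype, which differs at one SNP). Positions, alleles and chromosome identifiers below are hypothetical, and real HGDP phased data would first need to be parsed into this dictionary layout.

```python
def find_matching_chromosomes(query: dict, phased: dict, max_mismatches: int = 0) -> dict:
    """Return {chromosome id: mismatching positions} for chromosomes carrying the query haplotype.

    query maps position -> allele of the target haplotype; phased maps chromosome id ->
    {position: allele}. A position absent from a chromosome counts as a mismatch.
    """
    hits = {}
    for chrom_id, haplotype in phased.items():
        mismatches = [pos for pos, allele in query.items() if haplotype.get(pos) != allele]
        if len(mismatches) <= max_mismatches:
            hits[chrom_id] = mismatches
    return hits


# Hypothetical target haplotype and phased chromosomes
query = {100: "A", 200: "G", 300: "T"}
phased = {
    "chr_a": {100: "A", 200: "G", 300: "T"},
    "chr_b": {100: "G", 200: "G", 300: "T"},
    "chr_c": {100: "G", 200: "C", 300: "A"},
}
exact_hits = find_matching_chromosomes(query, phased)
near_hits = find_matching_chromosomes(query, phased, max_mismatches=1)
```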
Recombination and gene conversion
One individual from Papua New Guinea and one site were excluded from the first network due to
recombination involving the deep lineage and a suspected gene conversion, respectively.
Extended re-sequencing was performed on the excluded individual, a second individual from
New Britain also carrying a recombinant haplotype for the deep lineage, and four individuals
with a suspected independent event of gene conversion involving the same candidate site (table
S4). The two recombinant chromosomes were produced by independent events: they have
different recombination breakpoints (see position 111832263 in table S4), and the chromosomes
with which they recombined are different (see position 111833253 in table S3). The four
individuals with the suspected gene conversion event (HGDP00663, HGDP00788, HGDP00789
and HGDP00824) agree with the deep lineage at the site excluded from the network of amplicon
set 1 and at a neighboring indel (table S4). Outside those positions, these chromosomes agree
with non-converted chromosomes (as indicated by the presence of the derived state at positions
111826545 and 111828555).
Functional variation
Non-synonymous mutations in the coding sequence for positions prior to 111839889 (core
sequence) affect the protein sequence in all known functional splicing variants of the gene OAS1.
We analyzed all amino acid changes in the different lineages since the most recent common
ancestor with chimpanzee using PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/index.shtml),
in the order in which they are inferred to have occurred. Whenever multiple single amino acid
mutations connect two known sequences, we considered all possible paths. The
functional effects of frameshifts and of polymorphic changes in splicing were excluded because
they involve simultaneous changes in several amino acids. Likewise, recombinants changing
multiple amino acids simultaneously were excluded from our analysis.
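Enumerating all possible paths between two sequences amounts to taking every ordering of the connecting substitutions and recording the background sequence on which each substitution would be evaluated (for example, by submitting it to PolyPhen-2). The sketch below assumes substitutions at distinct positions, written as reference residue, 1-based position, alternative residue (as in R104G); the short protein in the example is a toy sequence, not OAS1, and the scoring step itself is not shown.

```python
from itertools import permutations


def apply_substitution(seq: str, mutation: str) -> str:
    """Apply a substitution such as 'R104G' (1-based position) to a protein sequence."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[pos - 1] != ref:
        raise ValueError(f"{mutation}: expected {ref} at {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def mutational_paths(start: str, mutations: list):
    """Yield (order, steps, final sequence) for every ordering of the mutations.

    steps lists (mutation, background) pairs, where background is the sequence on which
    that mutation would be evaluated along this path.
    """
    for order in permutations(mutations):
        seq, steps = start, []
        for m in order:
            steps.append((m, seq))
            seq = apply_substitution(seq, m)
        yield order, steps, seq


# Toy example with three substitutions at distinct sites
paths = list(mutational_paths("ARPE", ["R2G", "P3R", "E4D"]))
```

Each of the k substitutions is evaluated on 2^(k-1) distinct backgrounds across all k! orderings, so the number of distinct predictions grows much more slowly than the number of paths.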
All human amino acid sequences share two mutations in the core sequence since the
MRCA with chimpanzee: D127G and D166N. Both mutations are predicted to be benign. Another
mutation, D350N, which affects only one splicing variant, is also predicted to be benign.
With the exception of the deep lineage, modern human sequences share the mutation Y179D,
also predicted to be benign in all splicing variants.
The mutation G111839925A in the lineage shared by HGDP01029, JR020 and JR354
results in an A359T substitution, predicted to be benign, in the protein sequence of the p42 splicing
variant only (fig. 1). G to A mutations at rs1131476 and rs1051042 (positions 111841599 and
111841792, respectively) result in A352T and R361T in the p46 transcript. This amino acid
sequence is observed only in people of African descent and in the Melanesian shallow lineage.
The deep lineage and Denisova have evolved from the ancestral sequence of all humans through
three mutations in the core sequence: R104G, P129R and E183D. The deep lineage (but not
Denisova) also carries a two-base-pair deletion at position 111841650 that produces a frameshift
in the sequence of a p46 isoform, shortening the amino acid sequence from 400 to 377 residues
(the molecular weight decreases from 46.0 kDa to 43.4 kDa). Predictions for
the mutations R104G, P129R and E183D depend on the specific isoform, but the qualitative
impact is independent of the order of the mutations. The mutation R104G, which occurs in the
third alpha helix of the protein (Hartmann et al. 2003), has the largest effect. For the isoform
p42, none of the mutations is predicted to be benign. A mutation that is not predicted to be benign
may alter function without being deleterious. Considering the broad distribution of the deep lineage,
it is likely that at least some of these functional mutations are adaptive.
The isoform p48 originates from the (African) p46 through a G to A transition at position
111841576 (rs10774671). The derived A obliterates the splicing acceptor site, creating a weak
splicing acceptor site (sequence AAG) at 111841577 and enabling another weak splicing
acceptor site (sequence TAG) at position 111841674. The product of the transcript obtained from
the ancestral allele is usually called p46 (molecular weight 46.0 kDa), and the products of using
111841577 and 111841674 are called p52 (52.2 kDa) and p48 (molecular weight 47.5 kDa),
respectively. The last exons of these three transcripts have different reading frames, with p48
matching that of the deep lineage after the deletion; however, the stop codon in the deep lineage
occurs at position 111841674. Three mutations occurring on the background of the derived state
at 111841576 affect the p42, p48 and p52 isoforms: G162S, T69N and R242Q. The
polymorphism G162S has been associated with susceptibility to type 1 diabetes (Tessier et al.
2006). The A allele at 111841576 has been shown to increase the relative expression of p52
and p42, the latter probably by increasing retention of the last intron (Lalonde et al. 2010).
The relative importance of the mutations described above is thus likely to be
related to the expression levels for the transcripts. There are two independent events of
recombination documented in this work between the deep lineage and chromosomes with the
derived state at 111841576. The effect of these recombination events on the amino acid sequence
is equivalent to performing the mutations R104G and P129R simultaneously.
Variance in TMRCA
In this section we explore more complex models to explain the ancient TMRCA of the
second section within the extended sequence in fig. 1. To evaluate whether the 3.3 Mya
estimate for the TMRCA is compatible with the relatively recent divergence time of ~0.3 Mya
that was inferred for the ancestors of Denisova and anatomically modern humans (AMH)
(Reich et al. 2010), we first consider a model of balancing selection acting on a SNP in the
second section. This form of selection would not be expected to prevent the decay of LD around
a single selected site. Performing an analysis of LD as in the main text, we estimate a genetic map
length of 0.0168 cM for the 6 kb of the second section. A contour plot of the probability of
maintenance of this haplotype in a model of panmixia illustrates that recombination would erode
LD over the 6 kb (fig. S4). For example, with the estimated values for recombination (0.0168 cM)
and TMRCA (3.3 My), the probability of observing the haplotype would be about 2 × 10^-10.
When taking conservative estimates of 0.0084 cM
for the recombination rate and 1.7 My for the time over which recombination is detectable, the
probability of maintenance of the haplotype is still only 0.003 (fig. S4).
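The probabilities quoted above are reproduced by a simple model, assumed here rather than stated in the text, in which the haplotype survives only if no crossover occurs in any generation: P ≈ exp(-r t), with r the map length in Morgans and t the number of generations at 25 years per generation (consistent with 6 Mya = 240,000 generations). The function name is hypothetical.

```python
import math


def haplotype_survival_probability(map_length_cm: float, years: float,
                                   years_per_generation: float = 25.0) -> float:
    """Probability that no crossover breaks up a haplotype over the given time.

    Assumes a per-generation recombination probability r equal to the map length in
    Morgans and independent generations, so P = (1 - r)^t, approximated by exp(-r t).
    """
    r = map_length_cm / 100.0          # centimorgans -> Morgans
    t = years / years_per_generation   # years -> generations
    return math.exp(-r * t)


p_estimated = haplotype_survival_probability(0.0168, 3.3e6)     # about 2e-10
p_conservative = haplotype_survival_probability(0.0084, 1.7e6)  # about 0.003
```

The estimated setting gives r t ≈ 22.2 and the conservative one r t ≈ 5.7, so halving the map length and roughly halving the time still leaves only a few-in-a-thousand chance that the haplotype survives intact.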
References
Hartmann, R., J. Justesen, S. N. Sarkar, G. C. Sen, and V. C. Yee. 2003. Crystal structure of the
2'-specific and double-stranded RNA-activated interferon-induced antiviral protein 2'-5'-
oligoadenylate synthetase. Mol Cell 12:1173-1185.
Hudson, R. R. 2002. Generating samples under a Wright-Fisher neutral model of genetic
variation. Bioinformatics 18:337-338.
Lalonde, E., K. C. Ha, Z. Wang, A. Bemmo, C. L. Kleinman, T. Kwan, T. Pastinen, and J.
Majewski. 2010. RNA sequencing reveals the role of splicing polymorphisms in
regulating human gene expression. Genome Res 21:545-554.
Li, J. Z., D. M. Absher, H. Tang, A. M. Southwick, A. M. Casto, S. Ramachandran, H. M. Cann,
G. S. Barsh, M. Feldman, L. L. Cavalli-Sforza, and R. M. Myers. 2008. Worldwide
human relationships inferred from genome-wide patterns of variation. Science 319:1100-
1104.
Mendez, F. L., T. M. Karafet, T. Krahn, H. Ostrer, H. Soodyall, and M. F. Hammer. 2011.
Increased resolution of Y chromosome haplogroup T defines relationships among
populations of the Near East, Europe, and Africa. Hum Biol 83:39-53.
Pickrell, J. K., G. Coop, J. Novembre, S. Kudaravalli, J. Z. Li, D. Absher, B. S. Srinivasan, G. S.
Barsh, R. M. Myers, M. W. Feldman, and J. K. Pritchard. 2009. Signals of recent positive
selection in a worldwide sample of human populations. Genome Res 19:826-837.
Reich, D., R. E. Green, M. Kircher, et al. (28 co-authors). 2010. Genetic history of an archaic
hominin group from Denisova Cave in Siberia. Nature 468:1053-1060.
Scheinfeldt, L., F. Friedlaender, J. Friedlaender, K. Latham, G. Koki, T. Karafet, M. Hammer,
and J. Lorenz. 2006. Unexpected NRY chromosome variation in Northern Island
Melanesia. Mol Biol Evol 23:1628-1641.
Tessier, M. C., H. Q. Qu, R. Frechette, F. Bacot, R. Grabs, S. P. Taback, M. L. Lawson, S. E.
Kirsch, T. J. Hudson, and C. Polychronakos. 2006. Type 1 diabetes and the OAS gene
cluster: association with splicing polymorphism or haplotype? J Med Genet 43:129-132.
Wilder, J. A., and M. F. Hammer. 2007. Extraordinary population structure among the Baining
of New Britain. In J. S. Friedlaender, ed. Genes, Language, & Culture History in the
Southwest Pacific. Oxford University Press, New York.
Table S1. Coordinates of amplicons sequenced for the Diversity Panel
Amplicon   Beginning    End          Length (bp)   Distance to the next amplicon (bp)
A1         111828677    111829834    1158          560
A2         111830395    111831525    1131          754
A3         111832280    111833723    1444          4536
A4         111838260    111840323    2064          647
A5         111840971    111842221    1251
Exons covered (a)   Beginning    End          Length (bp)
1                   111829122    111829407    286
2                   111830724    111831012    289
3                   111833239    111833423    185
4                   111838697    111838926    230
5a                  111839735    111839888    154
5b                  111839735    111840214    480
6a                  111841577    111842095    519
6b                  111841578    111842095    518
6c                  111841675    111842095    421
a: Alternative splicing variants are indicated with letters following the exon number.
Table S2a. Primer information for amplicons re-sequenced in the Diversity Panel
Amplicon Upper Name Upper Seq Lower Name Lower Seq
OAS1A1 OAS1U28649 GAAAGGGAAAAAAGCATAGTATAATACC OAS1L29835 GAGGAAATTGGAACACAGAGTAGT
OAS1A2 OAS1U30370 GTAAGTGTGAACCACCCAGCATAAG OAS1L31526 TTTTTGAACACCTATTACTCATCAGAGC
OAS1A3 OAS1U32258 AAGACAAGAGGGAGAAGGCTGG OAS1L33724 GCGTGTGTGTATGTAGCATTGA
OAS1A4 OAS1U38235 GCATTTCTTAGGAACATTACAAGTC OAS1L40324 TTCACTATTTGGGCGACAGG
OAS1A5 OAS1U40899 TAAACAGCCTGCCTTGTCAC OAS1L42222 TATTCCCAGTGCCCAGAGC
Table S2b. Primer information for additional amplicons used in the extended resequencing
Amplicon Upper Name Upper Seq Lower Name Lower Seq
OAS1_20016 OAS1U20016 TGTGTAGATGCCCCATAGAGGA OAS1L21080 CAGAAACCAGAAAGGAAAACTGC
OAS1_21055 OAS1U21055 GAGCATCCAAGAAAACGAGTG OAS1L22562 ATCACAAGGCATCAACCAGG
OAS1_22181 OAS1U22181 GAGATTTCTTTCCCCACAGATTC OAS1L23724 ACCTCATCAAGCCAATGTCC
OAS1_23529 OAS1U23529 AAGTTGCTGAGGTCTGGTTTC OAS1L24717 CAAAAAGGTCTCGGTCTTCA
OAS1_24562 OAS1U24562 CTTTTGCTTGGCTCTTGTCC OAS1L26866 GTGGGGTGCTGTCTTTGC
OAS1_26789 OAS1U26789 TTTGCTTTATCATACTTGGC OAS1L27632 AACACTACTTTCACTACATCCC
OAS1_27602 OAS1U27602 AAAATGAAAAACAGCCTATCAAAAAG OAS1L29077 GCAAATCAGACACTCCCCTG
OAS1_29747 OAS1U29747 TAGGGGCTCACCATTTCTGC OAS1L30861 CTCTCTCTCTTTGACAGGCTTCC
OAS1_31409 OAS1U31409 CATTTGGACAGGAAGTGTAACC OAS1L32621 ATGGCTATCTATTGTTTCACCC
OAS1_33652 OAS1U33652 CACTGCTGTATCCCCAGAACT OAS1L35103 TGGCTATAAAACAATAATACTTCG
OAS1_34964 OAS1U34964 TTCTTTCTTGATGCTGTTCTCC OAS1L36082 CAGTGGTTTGAATGAGGACA
OAS1_35855 OAS1U35855 ATTTCTATTTCATATTTTTGTATCTGC OAS1L36264 TGGGGTGTGGCAAGGGT
OAS1_36075 OAS1U36075 CCTTTCCTGTCCTCATTCAAACC OAS1L37895 TATTGTGAAAATGACCATACTCCC
OAS1_37726 OAS1U37726 GAAGTCTGATAATGTAATGCCTC OAS1L38402 CGCTGGATTCTTATTGATGT
OAS1_40199 OAS1U40199 TAAATAGTCACAACAATCCCAT OAS1L40958 ACAACCCAAGTCACTCAGC
Table S2c. Primer information for polymorphisms genotyped by RFLP
Position (a) Upper Name Upper Seq Lower Name Lower Seq
111829579 OAS1dig1U GAGGGGTGGCTGAATGTG OAS1dig1L TCAAACAGTTACAGGGAGGAGAG
111831807 OAS1U31409 CATTTGGACAGGAAGTGTAACC OAS1L32621 ATGGCTATCTATTGTTTCACCC
111839753 OAS1U39625d TCTGAGTCCCAGTTCATCCC OAS1L39875d TCTCACCAGCAGAATCCAGG
a: All positions of polymorphisms are based on the 2006 build of the human genome (hg18).
Table S3. Polymorphism Table for five amplicons sequenced in the Diversity Panel a,b,c
Column positions, left to right: 111828752, 111828852, 111829011, 111829013, 111829473, 111829492, 111829550, 111829579, 111829639, 111829676, 111829698, 111829736, 111830504, 111830510, 111830749, 111830853, 111830929, 111831400, 111831465, 111832329, 111832673, 111832690, 111832718, 111832895, 111833010, 111833037, 111833044, 111833131, 111833232, 111833233, 111833253, 111833304, 111833318, 111833482, 111833544, 111838514, 111838673, 111838767, 111838813, 111838936, 111839151, 111839308, 111839658, 111839753, 111839925, 111839966, 111839984, 111840068, 111840237, 111840318, 111841149, 111841283, 111841305, 111841318, 111841363, 111841391, 111841413, 111841458, 111841576, 111841585, 111841592, 111841599, 111841620, 111841650, 111841792, 111841825, 111841877, 111841948, 111842046, 111842071, 111842091, 111842150, 111842166
Population sample name C T G T A GAGAGA C C C G G A T C C A C C G G G A T T G C C G C G G T G G G C - G G C G A G G G T C T A A G A C C C G G T G C G A G AT C G C C T G A G A
BIA HGDP00451 . . . C R ------ . T . . . C . . . . . T . . . . . . . . . C . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
BIA HGDP00454 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP00455 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
BIA HGDP00457 . . . C . ------ . T S . . C . M . . . T R . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A R C . Y A . Y A A . . .
BIA HGDP00458 S . . C . ------ . T . . . C . . . . . T . . . . . . . Y . . . . . G . A . . X . . . . . . . . K . . T . . . T . T . . . A . A . C . . A . . A A . S .
BIA HGDP00459 . . . C R ------ . T S . . C . . . . . T . . . . . . . Y . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP00460 . . . C . ------ . T S . . C . . . . . T R . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP00470 . . . C R ------ . T . . . C . . . . . T R . . . . . . . . S Y . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
BIA HGDP00479 . . . C . ------ . T G . . C . . . . . T . . . . . . . Y . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BIA HGDP00981 . . . C R ------ . T S . . C . M . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP00985 . . . C R ------ . T S . . C . M . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP01088 . . . C . ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP01089 . . . C . ------ . T . . . C . . . . . T R . . . . . . . . . Y . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
BIA HGDP01091 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . C . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
BIA HGDP01094 . . . C . ------ . T . . . C . . . . . T R . . . . . . . . . Y . . G . R . . . . . . . . . . . K . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP00904 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP00905 . . . C R ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . . . . . . R . . . . . . . S . W . . R Y Y Y . . . . . R . S . . R S . W R R . .
MAN HGDP00906 . . A C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP00907 . . . C . ------ . T S . . C . M . . . T R . . . . . . Y . . Y . . G . R . . . . . . S . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP00908 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP00911 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP00912 . . . C . ------ . T S . . C . . . . . T . . . . . Y . Y . . . . . G . A . . . . . . . . . . . K . . T . . . T . T . . . A . A . C . . A . . A A . . .
MAN HGDP00913 . . . C . ------ . T S . . C . . . . . T R R . . . . . Y . . Y . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP00914 . . R C . ------ . T . . . C . . . . . T A . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP00915 . . . C R ------ . T S . . C . . . . . T . . . . . . . Y . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP01199 . . . C . ------ . T S . . C . . . . . T R . . . . . . . . . Y . R G . . . . . . . . . . . . . . . . T . R . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP01200 . . . C . ------ . T S . . C . . . . . T R . . . . . . Y . . Y . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP01202 . . . C . ------ . T . . . C . . . . . T A . . . . . . . . . T . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
MAN HGDP01283 . . . C . ------ . T S . . C . . . . . T R . . . . . . . Y . Y . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
MAN HGDP01284 . . . C . ------ . T G . . C . . . . . T . . . . . Y . Y . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
MAN HGDP01286 . . . C G ------ . T . . . C . . . . . T . . . . . . . . . C . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . A . C . . A . . A A . . .
SAN GM3043 . . . C . ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . Y R . A . C . . A . . A A . . .
SAN JR013 . . . C R ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . R . . . . . . . . . . . . . . T . . . T . T . . . R . A . C . . A . . A A . . .
SAN JR020 . . . C R ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . . . . . . R . . . . . R . . . T . . R Y Y Y . . . . . R . S . . R S . W R R . .
SAN JR054 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . . G . A . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
SAN JR077 . Y . C R ------ . T . . . C . . . . . T . . . . . . . . . S . . . G . . . . . . R Y . . . R . . . . T . . . T . T . . . . . A R C . Y A . Y A A . . .
SAN JR301 . . . C . ------ . T S . . C . M . . . T . . . W . . . . . S . . . G . R . . X . . . . . . . . K . . T . . . T . T . . Y R . A . C . . A . . A A . . .
SAN JR305 . . . C . ------ . T . . . C . . . . . T . . . . . . . . . . . . R G . . . . . . R Y . . . R . . . . T R . . T . T . . . R . A R C . Y A . Y A A . . .
SAN JR321 . . . C . ------ . T S . . C . . . . . T . . . . . . . . . . . . . G . R . . . . R Y . . . R . . . . T . . . T . T . . . R . A R C . Y A . Y A A . . .
SAN JR323 . . . C . ------ . T S . . C . . . . . T . . . . . . . . . . . . . G . A . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
SAN JR354 . . . C . ------ . T S . . C . . . . . T . . . . . . . . . . . . . G . R . . . . R . . . . . R . . . T . . R Y Y Y . . . R . R . S . . R S . W R R . .
PNG NG004 . . . C . . . . . . . C . . . . . T . . . . . . . . . . . . . G . . . T . . A T . . . A . . . . . . . . . . . . . . . . A . C . . A . . A A . . .
PNG NG006 . . . Y . XXXXXX . Y S R S M Y . . R S Y . . R . . . R . . . . . . K S . . Y . . A Y . . K R . . . . . . . . . . . . . . . . A . C XX . A . . A A . . .
PNG NG013 . . . . . . . . . A C . C . . G G . . . . . . . A . . . . . . . C . . . . . A . . . T . . . . . . . . . . . . . . . . . A . C -- . A . . A A . . .
PNG NG014 . . . C . . . . . . . C . . . . . T . . . . . . . . . . . . . G . . . T . . A T . . . A . . . . . . . . . . . . . . . . A . C . . A . . A A . . .
PNG NG015 . . . C . XXXXXX . Y S . . C . . . . . T . . . . . . . . . . . . R G . . . Y . . R Y . . . R . . . . W . . . Y . Y . . . R . A . C . . A . . A A . . .
PNG NG017 . . . C . . . . . . . C . . . . . T . . . . . . . . . . . . . G . . . T . . A T . . . A . . . . . . . . . . . . . . . . A . C . . A . . A A . . .
PNG NG018 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . R
PNG NG020 . . . Y . XXXXXX . Y S R S M Y M . R S Y . . . . . . R . . . . . . K S R . . . . R . . . K . . . . . W . . . Y . Y . . . R . A . C XX . A . . A A . . .
PNG NG022 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . . G . R . Y . R R Y . . . R . . . . W . . . Y . Y . . . R . A . C . . A . . A A . . .
PNG NG025 . . . C . XXXXXX . Y S . . C . M . . . T . . . . . . . . . . . . R G . . . Y . . R Y . . . R . . . . W . . . Y . Y . . . R . A . C . . A . . A A . . .
PNG NG026 . . . Y . XXXXXX . Y S R S M Y . . R S Y . . . . . . R . . . . . R K S . . . . . R . . . K . . . . . W . . . Y . Y . . . R . A . C XX . A . . A A . . .
PNG NG029 . . . C . XXXXXX . Y S . . C . . . . . T . . . . . . . . . . . . R G . . . Y . . R Y . . . R . . . . W . . . Y . Y . . . R . A . C . . A . . A A . . .
PNG NG030 . . . . . . . . . A C . C . . G G . . . . . . . A . . . . . . . C . . . . . A . . . T . . . . . . . . . . . . . . . . . A . C -- . A . . A A . . .
Table S3. Polymorphism Table for five amplicons sequenced in the Diversity Panel a,b,c
111828752
111828852
111829011
111829013
111829473
111829492
111829550
111829579
111829639
111829676
111829698
111829736
111830504
111830510
111830749
111830853
111830929
111831400
111831465
111832329
111832673
111832690
111832718
111832895
111833010
111833037
111833044
111833131
111833232
111833233
111833253
111833304
111833318
111833482
111833544
111838514
111838673
111838767
111838813
111838936
111839151
111839308
111839658
111839753
111839925
111839966
111839984
111840068
111840237
111840318
111841149
111841283
111841305
111841318
111841363
111841391
111841413
111841458
111841576
111841585
111841592
111841599
111841620
111841650
111841792
111841825
111841877
111841948
111842046
111842071
111842091
111842150
111842166
Population Sample C T G T A GAGAGA C C C G G A T C C A C C G G G A T T G C C G C G G T G G G C - G G C G A G G G T C T A A G A C C C G G T G C G A G AT C G C C T G A G A
PNG NG034 . . . Y . XXXXXX . Y S R S M Y M M R S Y . . . . . . . . . . . . . G . A . . . R . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
PNG NG051 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . R . A . A . C . . A . . A A . . .
HAN HGDP00774 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
HAN HGDP00775 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP00777 . . . C . ------ . T G . . C . A . . . T . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . . . . . . . . . A A . . .
HAN HGDP00778 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R Y R . S . . R . . A A . . .
HAN HGDP00780 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . K . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP00785 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP00786 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . . . . . . . . . A A . . .
HAN HGDP00815 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . R R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP00819 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T S . . A . A . C . . A . . A A . . .
HAN HGDP00977 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . Y T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP01288 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
HAN HGDP01290 . . . C . ------ . T G . . C . . . . . T . . . . Y . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . A Y A . C . . A . . A A . . .
HAN HGDP01293 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . . G . A . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
HAN HGDP01294 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A Y A . C . . A . . A A . . .
HAN HGDP01295 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
HAN HGDP01296 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01357 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
BAS HGDP01358 . . . C . ------ . T G . . C . M . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01359 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . . G . . . . . . . . . . . . . . . . T . . . T . T . . . . . . . . . . . . . A A . . .
BAS HGDP01360 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
BAS HGDP01361 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01362 . . . C . ------ . T G . . C . . . . . T . . . . . . . . T . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01364 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
BAS HGDP01370 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01371 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01372 . . . C . ------ M T S . . C . . . . . T . . . . . . . . . . . . R G . . . . . . . . . . . . . . . . T . . . T . T . . . R . R . S . . R . . A A . . .
BAS HGDP01374 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01375 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01376 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01377 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . R G . R . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01378 . . . C . ------ . T G . . C . . . . . T . . . . . . . . . . . . A G . . . . . . . . . . . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
BAS HGDP01379 . . . C . ------ . T G . . C . . . . . T . . . . . . . . Y . . . A G . . . . . . . . . M . . . . . . T . . . T . T . . . A . A . C . . A . . A A . . .
Chimpanzee C T G T A GAGAGA C C C G G A T C C A C C G G G A T T G C C G C G G T G G G C - A G C G A G G G T C T A A G A C C C G G T G C G A G AT C G C C T G A G A
Bonobo C T G T A ------ C C C G G A T C C A C C G G A A T T G C C G C G G T G G G C - G G C G A G G G T C T T A G A C C C G G T G C G A G AT C G C C T G A G A
Gorilla C T G T A GA---- C C C G G A T C C A C C G G G A T T G C C G C G G T G G G C - A G C G A G G G T C T A A G A C C C G G T G C G A G AT C G C C T G A G A
a: Positions in the 2006 build of the human genome (hg18) are indicated above the inferred ancestral state. Five boxes distinguish the sites from the five amplicons.
b: Mutations were color-coded according to their function: yellow (non-synonymous), brown (synonymous), green (3'-UTR), orange (splice-acceptor polymorphism), red (frameshift deletion), gray (intronic and intergenic).
c: Derived alleles are written explicitly. IUPAC ambiguity codes are used for heterozygous sites. 'X' indicates a heterozygous in-del.
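Footnote c of the table above describes how each genotype is written. A minimal sketch of reading one entry back into a pair of alleles, assuming that '.' means identical to the inferred ancestral state in the header row (consistent with "derived alleles are written explicitly") and that any other explicit entry is homozygous. The six two-letter codes are the standard IUPAC nucleotide ambiguity codes; the function names are illustrative, not from the original analysis.

```python
# Standard two-base IUPAC ambiguity codes, used here for heterozygous sites.
IUPAC_HET = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "S": ("C", "G"),
    "W": ("A", "T"),
    "K": ("G", "T"),
    "M": ("A", "C"),
}


def decode_site(ancestral: str, call: str) -> tuple[str, str]:
    """Return the two alleles of one individual at one site.

    ancestral: inferred ancestral state from the header row.
    call: the table entry for this individual at this site.
    """
    if call == ".":
        # Assumed: '.' means identical to the ancestral state (homozygous).
        return (ancestral, ancestral)
    if call in IUPAC_HET:
        return IUPAC_HET[call]
    if set(call) == {"X"}:
        # Heterozygous in-del: one ancestral copy plus an alternative length
        # variant whose sequence the table does not spell out.
        return (ancestral, "<indel>")
    # Any other entry is an explicitly written derived allele, read here as
    # homozygous (e.g. 'T', or '------' for a six-base deletion).
    return (call, call)


def derived_allele_count(ancestral: str, call: str) -> int:
    """Number of non-ancestral alleles (0, 1 or 2) carried at the site."""
    first, second = decode_site(ancestral, call)
    return int(first != ancestral) + int(second != ancestral)
```

Applied column by column, for example zipping the header states "C T G" with a row ". Y T", this yields counts of 0, 1 and 2, the same 0/1/2 coding used for the genotypes in Table S9.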
Table S4. Sequence variation in the extended re-sequenced region for a selected group of individuals^a
111820114
111820638
111821546
111821722
111822114
111822139
111822327
111822609
111823810
111824538
111824541
111825497
111826028
111826031
111826032
111826038
111826039
111826302
111826367
111826545
111826925
111826987
111827306
111828457
111828555
111829013
111829030
111829491
111829579
111829639
111829676
111829698
111829736
111830082
111830163
111830354
111830389
111830504
111830510
111830749
111830853
111830929
111831400
111831807
111832000
Population Individual T C A --- C T G CAC T AAA ----- G T A AATAT T AATTATTTTT G C A A A C A G T A GAGAGA C C G G A A T C C T C C A C C T G
Deep lineage Vanuatu MF82 G T . . . G A --- . . . . A G ----- C ---------- . A . . . . . . . . . . . A C . . A . T C . . G G . G .
PNG HGDP00555 G . . . . G A --- . . . . A G ----- C ---------- . A . . R . . . . . . . . A C . . A . T C . . G G . G .
PNG NG013 G . . . . G A --- . . . . A G ----- C ---------- . A . . . . . . . . . . . A C . . A . T C . . G G . G .
PNG NG030 G . . . . G A --- . . . . A G ----- C ---------- . A . . R . . . . . . . . A C . . A . T C . . G G . G .
PNG NG062 G . . . . G A --- . . . . A G ----- C ---------- . A . . R . . . . . . . . A C . . A . T C . . G G . G .
Shallow lineage Oceanian PNG HGDP00556 G T . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
PNG HGDP00550 G T . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
PNG NG04 G T . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
PNG NG14 G T . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
PNG NG17 G Y . TT- . . . . . . AAAAA . . . . . . . . . . . . . . C . . . . . . C . . T . . . . . . T . A
African Ghana Ghn32 n n n NNN n n n NNN n NNN NNNNN . . . . . . . . . T . . . . C G ------ T . . . C . . T . . . . . . T . A
San JR321 n n n NNN n n n NNN n NNN NNNNN . . . . . . R . R . . . . A C . ------ T S . . C C . T . . . . . . T . A
Recombinant New Britain UV005 G T . . . G A --- . . . . A G ----- C ---------- . A . . . . . . . . . . . A C . . A . T C . . G G . G .
PNG NG034 K . R XXX Y K R XXX Y XXX . R W R XXXXX Y XXXXXXXXXX . M . . . . . R Y . XXXXXX Y S R S M M W Y Y Y M M R S Y K R
Gene Converted Bougainville HGDP00824 n n n NNN n n n NNN n NNN NNNNN . W R XXXXX Y XXXXXXXXXX R M R . . . R R Y . . . . R S M . W Y Y Y . . R S Y K R
Bougainville HGDP00663 n n n NNN n n n NNN n NNN NNNNN . . . . . . R . R . . . . R C . . . . . . C . . T . . . . . . T . A
Bougainville HGDP00788 n n n NNN n n n NNN n NNN NNNNN . . . . . . A . G . . . R A C . . . . . . C . . T . . . . . . T . A
Bougainville HGDP00789 n n n NNN n n n NNN n NNN NNNNN . . . . . . R . R . . . R R C . . . . . . C . . T . . . . . . T . A
Outgroup Chimpanzee T C A T-- C T G CAC T AAA AA--- G T A AATAT T AATTATTTTT G C A A A C A G T A GAGAGA C C G G A A T C C T C C A C C T G
a: A single polymorphism at a CpG site (position 111838767) was inferred to have mutated independently in some human lineages and in the chimpanzee lineage.
Table S4 (continued).
111832263
111833010
111833253
111833304
111833318
111833482
111834488
111834529
111835903
111836291
111836440
111836542
111836790
111836924
111836934
111837034
111837091
111837941
111838514
111838767
111838813
111838936
111839658
111839753
111840237
111840418
111840607
111840705
111841305
111841363
111841576
111841599
111841650
111841792
111841948
Population Individual AA G G T G G A A T G G T C T --- C C A C G G C G G A A C T C C G A AT C C
Deep lineage Vanuatu MF82 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
PNG HGDP00555 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
PNG NG013 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
PNG NG030 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
PNG NG062 -- A . . C . . . . . A . T . . T . . . . A . T . . . . G . . . . -- . .
Shallow lineage Oceanian PNG HGDP00556 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
PNG HGDP00550 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
PNG NG04 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
PNG NG14 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
PNG NG17 . . . G . . . . . . . . . G TT- . . . T . A T . A . . A . . . . . . . .
African Ghana Ghn32 . . . G . . . . . . . . . G TT- . . T . . A T . A T . . . T T . G . T T
San JR321 . . . G . R . R K . . Y . K TTX . M W . . R Y . R T . . . T T R R . Y Y
Recombinant New Britain UV005 XX R R K S . R . K . R Y Y . XXX Y . . . . R . K . W . . K Y Y R . XX . .
PNG NG034 XX . . G . A . . G R . C . . TTT . A . . R . . . . T . . . T T A . . . .
Gene Converted Bougainville HGDP00824 XX R R K S . . . K . R Y Y . XXX Y . . . . R . K . W R . K Y Y R . XX . .
Bougainville HGDP00663 . . R G . . R . K . . Y . K TTX . . . Y . R Y . R W R M . Y Y R . . . .
Bougainville HGDP00788 . . A G . . R . G . . C . . TTT . . . . . . . . . T R . . T T A . . . .
Bougainville HGDP00789 . . R G . . . . K . . Y . K TTX . . . Y . R Y . R W R M . Y Y R . . . .
Outgroup Chimpanzee AA G G T G G A A T C G T C T --- C C A C A G C G G A A C T C C G A AT C C
Table S5a. Derived frequency of the deep lineage
Population frequency standard deviation/sample size^a
Oceania^c
Australia (West) 0.056 0.038
Melanesian (Nasioi) 0.063 0.035
Vanuatu 0.287 0.078
New Britain 0.531 0.052
Micronesia 0.126 0.045
Papuan 0.227 0.034
Tonga 0.091 0.063
Samoas 0.059 0.041
Tahiti 0.058 0.032
South East Asia^b
Eastern Indonesia 0.167 0.112
Western Indonesia 0.000 7
Laos 0.000 3
Malay 0.000 4
Philippines 0.000 25
Vietnam 0.000 17
East Asia^b
Han 0.000 20
Miao 0.000 38
Yao 0.000 46
Taiwan 0.000 56
South Asia^b
India 0.000 183
Pakistan 0.014 0.010
Sri Lanka 0.014 0.010
Europe^b
Basque 0.000 20
North East Africa^b
Ethiopia 0.000 25
Sub-Saharan Africa^b
Baka (Pygmy) 0.000 13
Biaka (Pygmy) 0.000 27
Gambia 0.000 32
Ghana 0.000 85
Ivory Coast 0.000 22
Kenya (Bantu) 0.000 62
KhoeSan 0.000 38
Yoruba 0.000 117
Mandenka 0.000 20
South Africa (Bantu) 0.000 101
Dinka 0.000 40
Uganda 0.000 7
Mbuti (Pygmy) 0.000 17
Zimbabwe 0.000 47
a: When no individuals carry the lineage, the number of individuals sampled is given instead of the standard deviation.
b: The absence of the deep lineage was inferred for individuals carrying the derived state at 111829579.
c: Standard deviation in allele frequency estimated using the method described in the Supplementary Methods (Correcting Allele Frequencies).
Table S5b. Frequency of the shallow lineage
Population frequency standard deviation/sample size^a
Oceania
Australia (West) 0.296 0.093
Melanesian (Nasioi) 0.237 0.061
Vanuatu 0.211 0.047
New Britain 0.077 0.028
Micronesia 0.105 0.044
Papuan 0.383 0.041
Tonga 0.045 0.045
Samoas 0.034 0.034
Tahiti 0.097 0.040
South East Asia
Eastern Indonesia 0.167 0.112
Western Indonesia 0.000 7
Malay 0.000 4
Philippines 0.000 12
Vietnam 0.000 6
East Asia
Han 0.000 20
Yao 0.000 8
Taiwan 0.000 11
South Asia
India 0.000 159
Pakistan 0.000 68
Sri Lanka 0.000 64
Europe
Basque 0.000 20
North East Africa
Ethiopia 0.000 11
Sub-Saharan Africa
Baka (Pygmy) 0.000 14
Biaka (Pygmy) 0.019 0.019
Gambia 0.000 44
Ghana 0.023 0.013
Ivory Coast 0.000 23
Kenya (Bantu) 0.010 0.010
KhoeSan 0.147 0.043
Yoruba 0.011 0.006
Mandenka 0.000 18
South Africa (Bantu) 0.028 0.014
Dinka 0.000 40
Uganda 0.000 6
Mbuti (Pygmy) 0.107 0.034
Zimbabwe 0.042 0.021
a: When no individuals carry the lineage, the number of individuals sampled is given instead of the standard deviation.
Table S6. Polymorphism information relevant for comparison of the reference sequence, deep, and shallow lineages in BED format.
descriptor chromosome beginning end comment^a
track name=Human_der description="Sites derived in the ancestry of all humans" color=128,128,0
chr12 111820069 111820070 20070,A->G,COVERED
chr12 111820113 111820114 20114,T->G,COVERED
chr12 111820514 111820515 20515,C->T,COVERED
chr12 111820591 111820592 20592,C->T,COVERED
chr12 111821056 111821057 21057,A->G,COVERED
chr12 111821063 111821064 21064,G->A,COVERED
chr12 111821069 111821070 21070,T->C,COVERED
chr12 111821219 111821220 21220,C->A,NONCOVERED
chr12 111821361 111821362 21362,A->C,COVERED
chr12 111821412 111821413 21413,A->C,COVERED
chr12 111821843 111821844 21844,T->C,COVERED
chr12 111822135 111822136 22136,T->G,COVERED
chr12 111822356 111822357 22357,C->T,NONCOVERED
chr12 111822385 111822386 22386,G->A,NONCOVERED
chr12 111822410 111822411 22411,C->G,NONCOVERED
chr12 111822685 111822686 22686,G->T,COVERED
chr12 111822914 111822915 22915,T->A,COVERED
chr12 111822935 111822936 22936,T->C,COVERED
chr12 111823001 111823002 23002,C->T,COVERED
chr12 111823084 111823085 23085,A->G,COVERED
chr12 111823633 111823634 23634,C->T,COVERED
chr12 111823791 111823792 23792,C->T,COVERED
chr12 111823901 111823902 23902,C->G,NONCOVERED
chr12 111823922 111823923 23923,T->G,COVERED
chr12 111824003 111824004 24004,A->G,COVERED
chr12 111824136 111824137 24137,G->A,NONCOVERED
chr12 111824226 111824227 24227,C->T,NONCOVERED
chr12 111825034 111825035 25035,T->C,NONCOVERED
chr12 111825373 111825374 25374,T->A,COVERED
chr12 111825450 111825451 25451,A->G,COVERED
chr12 111825520 111825521 25521,C->G,NONCOVERED
chr12 111825899 111825900 25900,A->G,COVERED
chr12 111826139 111826140 26140,A->G,NONCOVERED
chr12 111826394 111826395 26395,A->T,COVERED
chr12 111826432 111826433 26433,A->G,COVERED
chr12 111826612 111826613 26613,C->T,COVERED
chr12 111826654 111826655 26655,T->C,COVERED
chr12 111826687 111826688 26688,A->G,COVERED
chr12 111826759 111826760 26760,G->A,COVERED
chr12 111827083 111827084 27084,C->T,COVERED
chr12 111827738 111827739 27739,G->A,COVERED
chr12 111827951 111827952 27952,G->C,COVERED
chr12 111829768 111829769 29769,C->T,COVERED
chr12 111829812 111829813 29813,G->A,COVERED
chr12 111830244 111830245 30245,C->T,COVERED
chr12 111831916 111831917 31917,G->A,NONCOVERED
chr12 111832010 111832011 32011,A->G,COVERED
chr12 111832700 111832701 32701,T->C,COVERED
chr12 111833489 111833490 33490,C->G,COVERED
chr12 111833734 111833735 33735,C->T,NONCOVERED
chr12 111834147 111834148 34148,C->T,NONCOVERED
chr12 111834492 111834493 34493,A->C,NONCOVERED
chr12 111834736 111834737 34737,C->T,COVERED
chr12 111835438 111835439 35439,A->G,NONCOVERED
chr12 111836290 111836291 36291,C->G,COVERED
chr12 111837077 111837078 37078,C->A,NONCOVERED
chr12 111837410 111837411 37411,G->A,NONCOVERED
chr12 111837733 111837734 37734,G->A,COVERED
chr12 111838279 111838280 38280,A->G,COVERED
chr12 111838292 111838293 38293,A->C,COVERED
chr12 111838334 111838335 38335,A->C,COVERED
chr12 111838766 111838767 38767,A->G,COVERED
chr12 111839317 111839318 39318,T->G,NONCOVERED
chr12 111839369 111839370 39370,C->T,NONCOVERED
chr12 111840145 111840146 40146,A->G,NONCOVERED
chr12 111840204 111840205 40205,T->G,NONCOVERED
chr12 111840236 111840237 40237,A->T,NONCOVERED
chr12 111840398 111840399 40399,T->G,COVERED
chr12 111840578 111840579 40579,T->C,NONCOVERED
chr12 111840622 111840623 40623,A->C,NONCOVERED
chr12 111840675 111840676 40676,T->A,NONCOVERED
chr12 111841218 111841219 41219,A->G,COVERED
chr12 111841452 111841453 41453,T->C,COVERED
chr12 111841585 111841586 41586,G->A,COVERED
chr12 111842045 111842046 42046,T->A,NONCOVERED
chr12 111842070 111842071 42071,G->A,NONCOVERED
track name=Deep_vs_ref description="Differences between the deep and reference sequences" color=128,0,128
chr12 111822138 111822139 22139,T->G
chr12 111822326 111822327 22327,G->A
chr12 111822608 111822611 22609,CAC_del
chr12 111825980 111825981 25981,C->T
chr12 111826027 111826028 26028,T->A
chr12 111826030 111826048 26031,large_asymmetric_substitution
chr12 111826366 111826367 26367,C->A
chr12 111826980 111826981 26981,G->T
chr12 111827757 111827758 27758,G->C
chr12 111827945 111827946 27946,G->A
chr12 111828528 111828529 28529,G->C
chr12 111828554 111828555 28555,G->A
chr12 111829012 111829013 29013,T->C
chr12 111829491 111829491 29491,GAGAGA_in-del
chr12 111829549 111829550 29550,C->A
chr12 111829578 111829579 29579,C->T
chr12 111829675 111829676 29676,G->A
chr12 111829697 111829698 29698,G->C
chr12 111829735 111829736 29736,A->C
chr12 111830162 111830163 30163,T->A
chr12 111830353 111830354 30354,C->T
chr12 111830388 111830389 30389,C->T
chr12 111830503 111830504 30504,T->C
chr12 111830852 111830853 30853,A->G
chr12 111830928 111830929 30929,C->G
chr12 111831399 111831400 31400,C->T
chr12 111831806 111831807 31807,T->G
chr12 111831999 111832000 32000,G->A
chr12 111832262 111832264 32263,deletion
chr12 111833009 111833010 33010,G->A
chr12 111833303 111833304 33304,T->G
chr12 111833317 111833318 33318,G->C
chr12 111835178 111835179 35179,T->G
chr12 111836439 111836440 36440,G->A
chr12 111836789 111836790 36790,C->T
chr12 111837033 111837034 37034,C->T,N
chr12 111838812 111838813 38813,G->A
chr12 111839657 111839658 39658,G->T
chr12 111840236 111840237 40237,uncertain_ancestral_state
chr12 111840704 111840705 40705,T->G
chr12 111841304 111841305 41305,C->T
chr12 111841362 111841363 41363,C->T
chr12 111841591 111841592 41592,G->A
chr12 111841619 111841620 41620,G->C
chr12 111841649 111841651 41650,AT_deletion_frameshift
chr12 111841824 111841825 41825,G->A
track name=Deep_anc description="Sites ancestral in the deep lineage and exclusive to Papuans" color=255,0,255
chr12 111828554 111828555 28555,G->A,shared_with_Afr,D
chr12 111829012 111829013 29013,T->C,D
chr12 111829491 111829491 29491,in-del_GAGAGA,proposed_gene_conv,D
chr12 111829578 111829579 29579,proposed_gene_conv,D
chr12 111829735 111829736 29736,A->C,D
chr12 111830353 111830354 30354,C->T,D
chr12 111831399 111831400 31400,C->T,D
chr12 111831999 111832000 32000,G->A,D
chr12 111833303 111833304 33304,T->G,D
chr12 111841304 111841305 41305,C->T,D
chr12 111841362 111841363 41363,C->T,D
track name=Deep_der description="Sites derived in deep lineage" color=0,0,255
chr12 111822138 111822139 22139,T->G,N
chr12 111822326 111822327 22327,G->A,N
chr12 111822608 111822611 22609,CAC_del,N
chr12 111826027 111826028 26028,T->A,NONCOVERED
chr12 111826030 111826048 26031,large_asymmetric_substitution,NONCOVERED
chr12 111826366 111826367 26367,C->A,N
chr12 111829675 111829676 29676,G->A,D
chr12 111829697 111829698 29698,G->C,D
chr12 111830162 111830163 30163,T->A,D
chr12 111830388 111830389 30389,C->T,D
chr12 111830503 111830504 30504,T->C,N
chr12 111830852 111830853 30853,A->G,D
chr12 111830928 111830929 30929,C->G,D
chr12 111831806 111831807 31807,T->G,D
chr12 111832262 111832264 32263,deletion,D
chr12 111833009 111833010 33010,G->A,NONCOVERED
chr12 111833317 111833318 33318,G->C,D
chr12 111836439 111836440 36440,G->A,N
chr12 111836789 111836790 36790,C->T,D
chr12 111837033 111837034 37034,C->T,N
chr12 111838812 111838813 38813,G->A,D
chr12 111839657 111839658 39658,G->T,D
chr12 111840236 111840237 40237,A<->T,D
chr12 111840704 111840705 40705,T->G,NONCOVERED
chr12 111841649 111841651 41650,AT_deletion_frameshift,N
track name=Shallow_der description="Polymorphisms derived in the shallow lineage (Papuan or African)" color=255,255,0
chr12 111826924 111826925 26925,A->T,AFRICAN
chr12 111829029 111829030 29030,A->G,AFRICAN
chr12 111836923 111836924 36924,T->G,BOTH
chr12 111836923 111836933 36933,homopolymer,BOTH
chr12 111837940 111837941 37941,A->T,AFRICAN
chr12 111838513 111838514 38514,C->T,PAPUAN
chr12 111838812 111838813 38813,G->A,D,BOTH
chr12 111838935 111838936 38936,C->T,BOTH
chr12 111839752 111839753 39753,G->A,BOTH
chr12 111840606 111840607 40607,C->A,PAPUAN
chr12 111841598 111841599 41599,A->G,AFRICAN
chr12 111841791 111841792 41792,C->T,AFRICAN
chr12 111841947 111841948 41948,C->T,AFRICAN
track name=Ref_anc description="Sites ancestral in the Reference sequence and derived in all Papuans" color=255,0,0
chr12 111841591 111841592 41592,G->A
chr12 111841619 111841620 41620,G->C
chr12 111841824 111841825 41825,G->A
track name=Ref_der description="Polymorphic sites derived in the Reference, but ancestral in Africans and Papuans of this study" color=255,0,0
chr12 111825980 111825981 25981,C->T
chr12 111826980 111826981 26981,G->T
chr12 111827757 111827758 27758,G->C
chr12 111827945 111827946 27946,G->A
chr12 111828528 111828529 28529,G->C
chr12 111829549 111829550 29550,C->A
chr12 111835178 111835179 35179,T->G
a: Polymorphisms with no coverage in Denisova are indicated as 'NONCOVERED'. D indicates sharing and N indicates non-sharing between the deep lineage and Denisova.
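Table S6 uses BED coordinates, which are 0-based and half-open: a single-base record covers [start, end), so its 1-based position (the one used in Tables S4 and S9) is start + 1, which equals end. A minimal sketch of reading the table, assuming one `track` line per group and four whitespace-separated fields per record as here; lines that start with neither `track` nor `chr` (captions, column headers) are skipped. The class and function names are illustrative, not part of any BED library.

```python
from dataclasses import dataclass


@dataclass
class BedRecord:
    track: str  # name= value of the most recent 'track' line
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # comment field, e.g. "29676,G->A,D"

    @property
    def first_base_1based(self) -> int:
        """1-based coordinate of the first base covered by the interval."""
        return self.start + 1


def parse_bed(lines):
    """Parse BED lines with 'track' headers into BedRecord objects."""
    records = []
    track = ""
    for raw in lines:
        line = raw.strip()
        if line.startswith("track"):
            # Keep only name=...; assumes the track name contains no spaces.
            for field in line.split():
                if field.startswith("name="):
                    track = field[len("name="):]
            continue
        if not line.startswith("chr"):
            continue  # blank lines, captions, column-header lines
        chrom, start, end, name = line.split(None, 3)
        records.append(BedRecord(track, chrom, int(start), int(end), name))
    return records


def records_covering(records, pos_1based):
    """Records whose half-open interval contains a 1-based position."""
    # 1-based p is 0-based base p - 1, inside [start, end) iff start < p <= end.
    # Zero-length (insertion-point) records therefore never match.
    return [r for r in records if r.start < pos_1based <= r.end]
```

Run on the whole table, `records_covering(records, 111838813)` should return the site labelled 38813 from each track that lists it: Deep_vs_ref, Deep_der and Shallow_der.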
Table S7a. Parameters for the demographic models^a
model N0 t0 N1 t1 N2 M
a 250 200 250 100 250 2000
a 500 200 500 100 500 2000
a 1000 200 1000 100 1000 2000
a 2000 200 2000 100 2000 2000
a 4000 200 4000 100 4000 2000
b 250 0 50 1800 50 2000
b 250 0 100 1800 100 2000
b 250 0 200 1800 200 2000
b 500 0 50 1800 50 2000
b 500 0 100 1800 100 2000
b 500 0 200 1800 200 2000
b 1000 0 50 1800 50 2000
b 1000 0 100 1800 100 2000
b 1000 0 200 1800 200 2000
b 2000 0 50 1800 50 2000
b 2000 0 100 1800 100 2000
b 2000 0 200 1800 200 2000
b 4000 0 50 1800 50 2000
b 4000 0 100 1800 100 2000
b 4000 0 200 1800 200 2000
c 250 0 50 1800 1000 2000
c 250 0 100 1800 1000 2000
c 250 0 200 1800 1000 2000
c 500 0 50 1800 1000 2000
c 500 0 100 1800 1000 2000
c 500 0 200 1800 1000 2000
c 1000 0 50 1800 1000 2000
c 1000 0 100 1800 1000 2000
c 1000 0 200 1800 1000 2000
c 2000 0 50 1800 1000 2000
c 2000 0 100 1800 1000 2000
c 2000 0 200 1800 1000 2000
c 4000 0 50 1800 1000 2000
c 4000 0 100 1800 1000 2000
c 4000 0 200 1800 1000 2000
d 250 0 1 0 1 2000
d 500 0 1 0 1 2000
d 1000 0 1 0 1 2000
d 2000 0 1 0 1 2000
d 4000 0 1 0 1 2000
e 250 200 50 0 50 2000
e 250 200 100 0 100 2000
e 250 200 200 0 200 2000
e 500 200 50 0 50 2000
e 500 200 100 0 100 2000
e 500 200 200 0 200 2000
e 1000 200 50 0 50 2000
e 1000 200 100 0 100 2000
e 1000 200 200 0 200 2000
e 2000 200 50 0 50 2000
e 2000 200 100 0 100 2000
e 2000 200 200 0 200 2000
e 4000 200 50 0 50 2000
e 4000 200 100 0 100 2000
e 4000 200 200 0 200 2000
e 250 600 50 0 50 2000
e 250 600 100 0 100 2000
e 250 600 200 0 200 2000
e 500 600 50 0 50 2000
e 500 600 100 0 100 2000
e 500 600 200 0 200 2000
e 1000 600 50 0 50 2000
e 1000 600 100 0 100 2000
e 1000 600 200 0 200 2000
e 2000 600 50 0 50 2000
e 2000 600 100 0 100 2000
e 2000 600 200 0 200 2000
e 4000 600 50 0 50 2000
e 4000 600 100 0 100 2000
e 4000 600 200 0 200 2000
e 250 1000 50 0 50 2000
e 250 1000 100 0 100 2000
e 250 1000 200 0 200 2000
e 500 1000 50 0 50 2000
e 500 1000 100 0 100 2000
e 500 1000 200 0 200 2000
e 1000 1000 50 0 50 2000
e 1000 1000 100 0 100 2000
e 1000 1000 200 0 200 2000
e 2000 1000 50 0 50 2000
e 2000 1000 100 0 100 2000
e 2000 1000 200 0 200 2000
e 4000 1000 50 0 50 2000
e 4000 1000 100 0 100 2000
e 4000 1000 200 0 200 2000
e 250 1400 50 0 50 2000
e 250 1400 100 0 100 2000
e 250 1400 200 0 200 2000
e 500 1400 50 0 50 2000
e 500 1400 100 0 100 2000
e 500 1400 200 0 200 2000
e 1000 1400 50 0 50 2000
e 1000 1400 100 0 100 2000
e 1000 1400 200 0 200 2000
e 2000 1400 50 0 50 2000
e 2000 1400 100 0 100 2000
e 2000 1400 200 0 200 2000
e 4000 1400 50 0 50 2000
e 4000 1400 100 0 100 2000
e 4000 1400 200 0 200 2000
a: Parameters as in Figure S5.
Table S7b. Estimates of TMRCA for 10 chromosomes in the deep and shallow lineages^a,b
lineage model median 0.025 quantile 0.975 quantile
deep a 28684 7991 118588
b 18905 7632 43314
c 18915 7703 42497
d 19055 7200 33379
e 32714 8605 68557
all 23767 8055 65659
shallow a 21962 6330 91013
b 17407 6600 38249
c 17433 6609 38247
d 16338 5932 31983
e 24823 6720 63367
all 19287 6571 58326
a: Times are expressed in years.
b: Model parameters as in Table S7a and Figure S5.
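Table S7b summarizes TMRCA distributions for samples of 10 chromosomes. For intuition about the simplest scenario, model a (constant size), the sketch below draws TMRCAs from the standard coalescent for n chromosomes sampled from a constant population of N chromosomes. It assumes that each pair of lineages coalesces at rate 1/N per generation, and it treats the N values of Table S7a as numbers of chromosomes, following the description of Figure S5. It illustrates the model only and does not reproduce the estimation behind Table S7b, so its output should not be expected to match the tabulated values. Results are in generations; converting to years needs an assumed generation time. The function names are illustrative.

```python
import random
import statistics


def simulate_tmrca(n: int, n_chrom: float, rng: random.Random) -> float:
    """One draw of the TMRCA (generations) of n sampled chromosomes under the
    standard coalescent in a constant population of n_chrom chromosomes."""
    t = 0.0
    for k in range(n, 1, -1):
        # k lineages form k(k-1)/2 pairs; each pair coalesces at rate 1/n_chrom.
        rate = k * (k - 1) / 2 / n_chrom
        t += rng.expovariate(rate)
    return t


def expected_tmrca(n: int, n_chrom: float) -> float:
    """Analytical mean: sum over k of 2*n_chrom/(k(k-1)) = 2*n_chrom*(1 - 1/n)."""
    return 2 * n_chrom * (1 - 1 / n)


def tmrca_summary(n: int, n_chrom: float, reps: int = 20000, seed: int = 1):
    """Median and 0.025/0.975 quantiles of simulated TMRCAs (generations)."""
    rng = random.Random(seed)
    draws = [simulate_tmrca(n, n_chrom, rng) for _ in range(reps)]
    # With n=40 the cut points lie at 1/40, ..., 39/40, so the first and
    # last are the 0.025 and 0.975 quantiles.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.median(draws), cuts[0], cuts[-1]


if __name__ == "__main__":
    # 10 chromosomes as in Table S7b; 1000 is one of the N0 values of model a.
    median, lo, hi = tmrca_summary(10, 1000)
    print(f"median {median:.0f}, 95% interval {lo:.0f}-{hi:.0f} generations")
    print(f"expected mean {expected_tmrca(10, 1000):.0f} generations")
```

The wide, right-skewed interval the sketch produces mirrors the asymmetry of the quantiles in Table S7b, where the 0.975 quantile lies much farther from the median than the 0.025 quantile does.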
Table S8. Summary statistics for the five amplicons of OAS1
First and second amplicons. Columns: population, n^a, then for each amplicon S (segregating sites), θW (%), π (%), Tajima's D, number of private polymorphisms and number of private non-singletons.
Biaka 30 3 0.066 0.068 0.08 1 0 2 0.045 0.055 0.45 0 0
Mandenka 32 3 0.065 0.071 0.21 1 1 2 0.044 0.051 0.31 0 0
San 20 3 0.074 0.070 -0.15 1 0 1 0.025 0.017 -0.62 0 0
PNG 30 6 0.132 0.228 2.08 5 5 6 0.135 0.174 0.84 5 4
Han 32 0 0.000 0.000 N.D.^b 0 0 1 0.022 0.028 0.40 0 0
French Basque 32 2 0.043 0.011 -1.52 1 0 1 0.022 0.006 -1.16 0 0
Third, fourth and fifth amplicons. Columns: population, then for each amplicon S (segregating sites), θW (%), π (%), Tajima's D, number of private polymorphisms and number of private non-singletons.
Biaka 5 0.088 0.117 0.90 0 0 1 0.012 0.006 -0.78 0 0 5 0.102 0.062 -1.08 1 0
Mandenka 8 0.139 0.133 -0.12 2 1 5 0.061 0.015 -2.03 4 0 13 0.260 0.097 -2.05 0 0
San 4 0.079 0.077 -0.07 1 0 6 0.083 0.064 -0.75 1 1 16 0.366 0.288 -0.80 2 1
PNG 6 0.106 0.127 0.59 4 3 7 0.086 0.142 1.92 3 3 5 0.102 0.130 0.76 3 0
Han 5 0.087 0.064 -0.70 3 0 1 0.012 0.003 -1.16 1 0 6 0.120 0.143 0.54 2 1
French Basque 3 0.052 0.077 1.12 0 0 1 0.012 0.003 -1.16 1 0 4 0.080 0.101 0.65 0 0
a: Number of chromosomes in the sample.
b: Not defined.
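The per-amplicon columns of Table S8 are standard diversity summaries. As a sketch of how they relate, assuming the textbook definitions of Watterson's estimator and Tajima's D (the authors' software is not specified here), the functions below compute θW per site from S, n and an amplicon length L, and Tajima's D from S, n and π expressed as mean pairwise differences per sequence. Amplicon lengths are not listed in this table, so L is an input; the function names are illustrative.

```python
import math


def harmonic_sums(n: int) -> tuple[float, float]:
    """a1 = sum of 1/i and a2 = sum of 1/i^2 for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def watterson_theta_per_site(s: int, n: int, length: int) -> float:
    """Watterson's theta per base: S / (a1 * L)."""
    a1, _ = harmonic_sums(n)
    return s / (a1 * length)


def tajimas_d(s: int, n: int, pi: float) -> float:
    """Tajima's D from S segregating sites, n sequences and pi, the mean
    number of pairwise differences per sequence (not per site)."""
    if s == 0:
        raise ValueError("Tajima's D is undefined when S = 0")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Numerator: pi minus Watterson's theta per sequence (S / a1).
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

Because θW and π are both given as per-site percentages, π per sequence can be recovered as (S / a1) times the ratio π/θW. For the Biaka first amplicon (S = 3, n = 30, θW = 0.066%, π = 0.068%) this gives D of about 0.07, close to the tabulated 0.08; rounding the two percentages to three decimals is enough to account for a gap of that size. The undefined case (S = 0) corresponds to the N.D. entry for the Han.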
Table S9. Genotypes^a for SNPs at positions 111829579, 111831807 and 111839753 (hg18) in samples of the HGDP
Population Sample 111829579 111831807^b 111839753^b
Nasioi HGDP00490 1 0 0
Nasioi HGDP00491 2 0 0
Nasioi HGDP00655 2 0 1
Nasioi HGDP00656 1 1 0
Nasioi HGDP00657 2 0 0
Nasioi HGDP00658 1 0 1
Nasioi HGDP00660 0 1 1
Nasioi HGDP00661 1 0 0
Nasioi HGDP00662 2 0 1
Nasioi HGDP00663 0 0 1
Nasioi HGDP00664 1 0 1
Nasioi HGDP00787 1 0 0
Nasioi HGDP00788 0 0 0
Nasioi HGDP00789 0 0 1
Nasioi HGDP00823 1 0 1
Nasioi HGDP00824 0 1 0
Nasioi HGDP00825 2 0 0
Nasioi HGDP00978 1 0 1
Nasioi HGDP01027 1 0 1
PNG HGDP01081 2 0 0
PNG HGDP00540 2 0 1
PNG HGDP00541 1 0 1
PNG HGDP00542 2 0 0
PNG HGDP00543 2 0 1
PNG HGDP00544 2 0 0
PNG HGDP00545 1 0 2
PNG HGDP00546 1 0 1
PNG HGDP00547 0 1 1
PNG HGDP00548 1 0 1
PNG HGDP00549 1 1 0
PNG HGDP00550 0 0 2
PNG HGDP00551 1 0 1
PNG HGDP00552 1 1 0
PNG HGDP00553 1 1 1
PNG HGDP00554 1 1 1
PNG HGDP00555 0 2 0
PNG HGDP00556 0 0 2
Biaka HGDP00470 2 0
Biaka HGDP00461 2
Biaka HGDP00464 2
Biaka HGDP00466 2
Biaka HGDP00477 2
Biaka HGDP00451 2 0
Biaka HGDP00452 2 0
Biaka HGDP00454 2 0
Biaka HGDP00455 2 0
Biaka HGDP00457 2 0
Biaka HGDP00458 2 0
Biaka HGDP00459 2 0
Biaka HGDP00460 2 0
Biaka HGDP00479 2 0
Biaka HGDP00981 2 0
Biaka HGDP00985 2 0
Biaka HGDP01088 2 0
Biaka HGDP01089 2 0
Biaka HGDP01091 2 0
Biaka HGDP01094 2 0
Mandenka HGDP00904 2 0
Mandenka HGDP00905 2 0
Mandenka HGDP00906 2 0
Mandenka HGDP00907 2 0
Mandenka HGDP00908 2 0
Mandenka HGDP00909 2 0 0
Mandenka HGDP00910 2 0 0
Mandenka HGDP00911 2 0
Mandenka HGDP00912 2 0
Mandenka HGDP00913 2 0
Mandenka HGDP00914 2 0
Mandenka HGDP00915 2 0
Mandenka HGDP01199 2 0
Mandenka HGDP01200 2 0
Mandenka HGDP01202 2 0
Mandenka HGDP01283 2 0
Mandenka HGDP01284 2 0
Mandenka HGDP01286 2 0
Mandenka HGDP00919 2
Mandenka HGDP01285 2
San HGDP00992 2 0 1
San HGDP00987 2 0 0
San HGDP00988 2 0 0
San HGDP00991 2 0 0
San HGDP01029 2 0 0
San HGDP01032 2 0 0
San HGDP01036 2 0 0
Mbuti HGDP00476 2 2
Mbuti HGDP00449 2 1
Mbuti HGDP00456 2 0
Mbuti HGDP00462 2 0
Mbuti HGDP00463 2 0
Mbuti HGDP00471 2 0
Mbuti HGDP00474 2 0
Mbuti HGDP00478 2 0
Mbuti HGDP00982 2 0
Mbuti HGDP00984 2 0
French Basque HGDP01357 2 0
French Basque HGDP01358 2 0
French Basque HGDP01359 2 0
French Basque HGDP01360 2 0
French Basque HGDP01361 2 0
French Basque HGDP01362 2 0
French Basque HGDP01363 2 0 0
French Basque HGDP01364 2 0
French Basque HGDP01365 2 0 0
French Basque HGDP01367 2 0 0
French Basque HGDP01369 2 0 0
French Basque HGDP01370 2 0
French Basque HGDP01371 2 0
French Basque HGDP01372 2 0
French Basque HGDP01374 2 0
French Basque HGDP01375 2 0
French Basque HGDP01376 2 0
French Basque HGDP01377 2 0
French Basque HGDP01378 2 0
French Basque HGDP01379 2 0
Han HGDP00775 2 0
Han HGDP00777 2 0
Han HGDP00778 2 0
Han HGDP00815 2 0
Han HGDP01288 2 0
Han HGDP01289 2 0 0
Han HGDP01290 2 0
Han HGDP01292 2 0 0
Han HGDP01293 2 0
Han HGDP01294 2 0
Han HGDP01295 2 0
Han HGDP01296 2 0
Han HGDP00774 2 0
Han HGDP00780 2 0
Han HGDP00785 2 0
Han HGDP00786 2 0
Han HGDP00819 2 0
Han HGDP00821 2 0 0
Han HGDP00822 2 0 0
Han HGDP00977 2 0
a: Homozygous ancestral genotypes are represented with 0, heterozygotes with 1 and homozygous derived genotypes with 2.
b: Unknown genotypes were left blank.
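Footnote a of Table S9 codes each genotype as the number of derived alleles (0, 1 or 2). Under that coding, the derived-allele frequency of a population at a site is the sum of the codes divided by twice the number of genotyped individuals, with blank (unknown) genotypes left out. A minimal sketch, in which None stands for a blank entry and the function name is hypothetical:

```python
from typing import Optional, Sequence


def derived_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Derived-allele frequency from 0/1/2 genotype codes; None = unknown."""
    called = [g for g in genotypes if g is not None]  # drop blank genotypes
    if not called:
        raise ValueError("no called genotypes")
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("genotype codes must be 0, 1 or 2")
    # Each diploid individual contributes two alleles.
    return sum(called) / (2 * len(called))
```

Applied to the 19 Nasioi genotypes at 111829579 listed in Table S9, whose codes sum to 19, it returns 19/38 = 0.5.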
Supplementary Figures
Figure S1. Levels of genetic variation in Papuans at the first three amplicons and at a set of 61
intergenic loci. Per-base nucleotide diversity a) uncorrected and b) corrected for divergence from
chimpanzee. Ratio of nucleotide diversity in Papuans to that in each of the other sequenced
populations: c) Biaka, d) Mandenka, e) San, f) Han, g) Basque.
Figure S2. Median-joining network of phased haplotypes for the fourth amplicon and the first
620 bases of the fifth amplicon. Branch lengths are proportional to the number of polymorphisms
distinguishing the haplotypes, and the area of each circle is proportional to the observed
number of occurrences of the haplotype.
Figure S3. Geographic distribution of the shallow lineage in a) Melanesia and b) Africa. The
fraction of chromosomes carrying the allele is indicated in black.
Figure S4. Contour plot of P-values for the maintenance of linkage disequilibrium across the 6 kb
between positions 28001 and 34000, as a function of the time during which recombination events
are detectable and the recombination rate. The contour lines indicate P-values. The larger dot
corresponds to the point estimates of divergence time and recombination rate (3.3 Mya and
0.0168, respectively), and the smaller dot to conservative estimates (1.7 Mya and 0.0084,
respectively) of those parameters.
Figure S5. Models for the size of the population of chromosomes of either the deep or the shallow
lineage. In each case, the width represents the number of chromosomes in the lineage and time
runs vertically, with the bottom corresponding to the present and the top to the past. The models
are: a) constant population size, b) exponential growth from a constant population size, c) a
population crash followed by exponential growth, d) exponential growth from a single
chromosome, and e) initially as in b), but with the population of chromosomes having since
reached a stable size. All the models are nested in f), which contains all the parameters used in
Table S7a.
Figure S1
Figure S2
Figure S3
Figure S4
Figure S5