www.sciencemag.org/cgi/content/full/1176620/DC1

Supporting Online Material for

Complete Resequencing of 40 Genomes Reveals Domestication Events and Genes in Silkworm (Bombyx)

Qingyou Xia, Yiran Guo, Ze Zhang, Dong Li, Zhaoling Xuan, Zhuo Li, Fangyin Dai, Yingrui Li, Daojun Cheng, Ruiqiang Li, Tingcai Cheng, Tao Jiang, Celine Becquet, Xun Xu, Chun Liu, Xingfu Zha, Wei Fan, Ying Lin, Yihong Shen, Lan Jiang, Jeffrey Jensen, Ines Hellmann, Si Tang, Ping Zhao, Hanfu Xu, Chang Yu, Guojie Zhang, Jun Li, Jianjun Cao, Shiping Liu, Ningjia He, Yan Zhou, Hui Liu, Jing Zhao, Chen Ye, Zhouhe Du, Guoqing Pan, Aichun Zhao, Haojing Shao, Wei Zeng, Ping Wu, Chunfeng Li, Minhui Pan, Jingjing Li, Xuyang Yin, Dawei Li, Juan Wang, Huisong Zheng, Wen Wang, Xiuqing Zhang, Songgang Li, Huanming Yang, Cheng Lu, Rasmus Nielsen, Zeyang Zhou, Jian Wang, Zhonghuai Xiang,* Jun Wang*

*To whom correspondence should be addressed. E-mail: [email protected] (Z.X.); [email protected] (J.W.)

Published 27 August 2009 on Science Express
DOI: 10.1126/science.1176620

This PDF file includes:
Materials and Methods
SOM Text
Figs. S1 to S7
Tables S1 to S10
References

Materials and Methods

Sample collection

To include the major silkworm systems kept in laboratories worldwide, we collected strains from diverse geographic regions, such as China, Japan, Europe and tropical areas (mostly southeast Asian: India, Cambodia and Laos), as well as silkworms from the mutant system. All 29 domesticated samples listed in Table S1 are from the Institute of Sericulture and Systems Biology at Southwest University, China. Two important developmental characteristics, voltinism (number of generations per year) and moltinism (number of larval molts per generation), and sex were recorded for each of these 29 domesticated silkworms. Of these, 18 are monovoltine, 8 are bivoltine and the remaining 3 are polyvoltine. We also captured 11 wild silkworms from mulberry fields in China, enabling comparative analysis between the domesticated and wild groups.

An advantage of the domesticated silkworm over other lepidopteran species is that many mutations (morphological, biochemical, and behavioral) and many inbred geographic strains (e.g., Chinese, Japanese, Korean, European, and tropical strains) are available and represent important resources for studying artificial selection and silkworm domestication. It is likely that farmers first moved wild silkworms from the field into houses so that they could be reared to produce silk in a predator-free environment. Traits for high silk production and easy handling may then have evolved under artificial selection. Subsequently, humans carried the silkworms to different countries through commercial trade. Finally, the domesticated silkworm underwent long-term rearing and breeding by local farmers, forming geographically distinct varieties with specific characteristics (such as voltinism and moltinism) shaped by the local climate. Currently, these geographic varieties are maintained in different stock centers and preserved by close inbreeding within each variety.

Library construction and sequencing

Genomic DNA was extracted from silkworm pupae and moths using a standard protocol for genomic DNA extraction. We sequenced only a single individual for each variety of both domesticated and wild silkworms. Libraries were prepared following the manufacturer’s instructions (S1). We used the workflow described previously (S2) to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking, denaturation and hybridization of the sequencing primers. We then applied a base-calling pipeline (SolexaPipeline-0.3) (S1) to detect sequences from the raw fluorescent images.

Public data used

The silkworm reference genome sequence and annotation information were downloaded from the Silkworm Genome Database (S3, S4). We reconstructed silkworm chromosomes by joining genomic scaffolds with runs of 500 N’s, according to their mapping relationship (S5). Unmapped scaffolds were likewise joined by runs of 500 N’s to form a chromosome UN. Because the insert size of the paired-end (PE) libraries is less than 500 bp, gaps of 500 N’s were used to build the chromosomes for convenience of analysis. Additional sequences containing complete CDS were retrieved from NCBI as of Feb. 8th, 2009. We then made a non-redundant annotation file by comparing those two datasets. Microarray-based gene expression data for the domesticated silkworm whose genome is taken as the reference were downloaded from BmMDB (S6).

Reads mapping

We used SOAP v1.09 (S7) to map raw single-end (SE) and PE reads onto the finished silkworm reference genome (S5). Reads were classified into three categories: “uniquely aligned” (those with a unique alignment position), “repeatedly aligned” (those that can be mapped to multiple genomic locations with the same minimal number of base differences; only one randomly chosen chromosome position was reported) and “unaligned”. The same trimming strategy as described (S2) was applied when dealing with mismatches. To improve accuracy, PCR duplicates were removed with a Perl script that discards read pairs with identical outer coordinates.

SNP calling

A four-step procedure was used to detect SNPs.
(1) We used SOAPSNP (S8) to calculate the likelihood of each individual’s genotypes.
(2) We integrated all the individual likelihood files to produce, by maximum likelihood estimation (MLE), a pseudo-genome covering each site in the total sample of 40 genomes. Sites passing criteria on copy number, sequencing depth, quality score and minor allele count were kept for the subsequent rank sum test adjustment. SNPs that passed the rank sum test (S2) (P ≥ 0.005) were retained as the high quality (HQ) SNP set.
(3) For the domesticated strains as a whole, another pseudo-genome was made for the domesticated group without filtering. Polymorphic positions that overlapped with HQ SNPs were retained as SNPs for the domesticated silkworms. We followed a similar process for the wild silkworms and obtained a SNP set for them.
(4) We allocated base types back to each individual based on the genotypes of HQ SNPs and each individual likelihood file. The genotype with the largest likelihood was directly chosen as the consensus genotype in each individual.

Short indel detection

A three-step approach was used to call indels.
(1) For each individual, we conducted a second run of SOAP, allowing for gaps. Individual indel sets were obtained with a previously developed pipeline (S2).
(2) For each genomic position supported by at least one individual indel set, reads from all the samples were considered together, and positions passing the filtering criterion on the number of supporting reads were retained. The resulting indels were termed high quality indels.
(3) We assigned indels back to each individual: from the high quality indel set, we picked the indel sites in each individual that had at least one supporting read.

Experimental validation of SNPs and indels

We used the Sequenom Genotyping Platform (S9) to validate HQ SNPs picked randomly according to their characteristics mentioned in the section “SNP calling”. We genotyped 121 SNP positions across all 40 silkworms (4,840 sites in total) and confirmed that 117 of the positions were polymorphic.
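The figures in the SNP validation paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted there; the variable names are illustrative, and the validation rate is derived here rather than stated in the text.

```python
# Figures quoted in the SNP validation paragraph.
snp_positions = 121          # HQ SNP positions chosen for Sequenom genotyping
samples = 40                 # silkworms genotyped at every position
confirmed_polymorphic = 117  # positions confirmed as polymorphic

# Each position is genotyped once per sample.
genotype_calls = snp_positions * samples

# Fraction of tested positions that were confirmed (derived, not quoted).
validation_rate = confirmed_polymorphic / snp_positions

print(genotype_calls)
print(f"{validation_rate:.1%}")
```

The product reproduces the 4,840 genotyped sites, and the confirmed fraction comes to roughly 97% of the tested positions.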

As a pilot phase for indel validation, we randomly selected 10 high quality indel positions and found these indels 69 times in the 40 samples. We then performed PCR-Sanger dideoxy sequencing at those sites using an AB 3730XL. After manually checking all the intensity trace files, we found that all the polymorphic positions were confirmed by the PCR-sequencing results.

Detection of structural variations (SV)

A three-step strategy was used to detect SVs in the 36 silkworms with PE sequencing reads.
(1) SVs were called individually, as described (S2), and regions with at least 2 supporting abnormal read pairs were retained for the second step.
(2) We treated all the PE reads from the 36 silkworms as if they came from a single individual and retained as candidates those regions with (a) at least 10 abnormally mapped supporting pairs and (b) at least 2 qualified individuals, each with at least 2 supporting pairs. The resulting SVs were termed high quality SVs.
(3) We assigned high quality SVs back to each individual. Individual SVs with 80% of their length overlapping any high quality SV were reported.

Calculation of Linkage Disequilibrium (LD)

To measure the level of LD in the silkworm population, we calculated the correlation coefficient (r²) between alleles with the software Haploview (S10), setting -maxdistance 200 -dprime -minGeno 0.6 -minMAF 0.1 -hwcutoff 0.001. Curves of averaged r² against pairwise marker distance were then plotted with R scripts.

Domestication associated site (da-SNP, da-indel) detection

Polymorphic genomic sites at which at least 28 domesticated strains and at least 10 wild ones have unique reads, corresponding to a minimal concordance rate of 95%, were entered into the χ² test for domestication association. Bonferroni-corrected P value thresholds of 2.96×10⁻⁸ and 1.87×10⁻⁶ were then used to identify significant da-SNPs and da-indels, respectively.

Construction of silkworm phylogeny

Individual SNPs generated after step (4) of the SNP calling section were used to calculate distances between silkworms. The p-distance between two individuals i and j is defined to be

\[ D_{ij} = \frac{1}{L} \sum_{l=1}^{L} d_{ij}^{(l)}, \]

where L is the length of regions where HQ SNPs can be identified, and given the alleles at position l are A/C, then

\[ d_{ij}^{(l)} = \begin{cases}
0, & \text{if genotypes of the two individuals are AA and AA,} \\
0.5, & \text{if genotypes of the two individuals are AA and AC,} \\
0.5, & \text{if genotypes of the two individuals are AC and AC,} \\
1, & \text{if genotypes of the two individuals are AA and CC.}
\end{cases} \]
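As a minimal sketch of the p-distance defined above, the following Python functions score each site and average over L. The function names and the toy genotypes are hypothetical. Two things are assumptions that the definition does not spell out: a heterozygote compared with the other homozygote (AC vs. CC) is scored 0.5 by symmetry with AA vs. AC, and missing genotypes are not handled.

```python
def site_distance(g1: str, g2: str) -> float:
    """Per-site distance d_ij^(l) for two unphased biallelic genotypes,
    written as two-letter strings such as 'AA', 'AC' or 'CC'."""
    a, b = "".join(sorted(g1)), "".join(sorted(g2))
    is_het_a, is_het_b = a[0] != a[1], b[0] != b[1]
    if is_het_a or is_het_b:
        # AA vs AC and AC vs AC both score 0.5 in the definition;
        # AC vs CC is assumed to score 0.5 by symmetry.
        return 0.5
    # Both homozygous: identical genotypes score 0, opposite ones score 1.
    return 0.0 if a == b else 1.0


def p_distance(genotypes_i, genotypes_j, region_length: int) -> float:
    """D_ij = (1/L) * sum of per-site distances over the HQ SNP positions.
    Non-polymorphic positions contribute 0, so only SNP sites are summed."""
    total = sum(site_distance(a, b) for a, b in zip(genotypes_i, genotypes_j))
    return total / region_length


# Toy example: four SNP sites inside a region of L = 10 callable positions.
sample_i = ["AA", "AC", "CC", "AA"]
sample_j = ["AA", "AA", "AA", "CC"]
print(p_distance(sample_i, sample_j, region_length=10))
```

In the toy example the per-site distances are 0, 0.5, 1 and 1, so the sum of 2.5 divided by L = 10 gives 0.25. A full distance matrix would apply `p_distance` to every pair of individuals.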

A neighbor-joining tree was then constructed from the distance matrix using the software PHYLIP 3.68 (S11). Bootstrap values were calculated from 1,000 replicates.

PCA analysis

Following the procedure of (S12), we considered only autosomal data from n=40 individuals, ignoring sites with more than two alleles or missing data (S=14,056,247 SNPs). The genotype of individual i at SNP k was coded as d_ik = 0, 1 or 2 if individual i is homozygous for the reference allele, heterozygous, or homozygous for the non-reference allele, respectively. M is an n×S matrix containing the normalized genotypes

\[ d'_{ik} = \frac{d_{ik} - E(d_k)}{\sqrt{E(d_k)\,\bigl(1 - E(d_k)/2\bigr)/2}}, \]

where E(d_k) is the mean of d_k. An n×n matrix of the sample covariance of the individuals was calculated as X = MM^T/S. The eigenvector decomposition of X was performed using the R function eigen, and the significance of the eigenvectors was determined with a Tracy-Widom test implemented in the program twstats provided with the EIGENSOFT software (S12). We obtained the latitude and longitude of the capital of each province or country of origin with the Google Earth program (for Europe we took the center defined by Google Earth). Correlations between phenotypes and eigenvectors were tested with Kendall’s τ statistics (S13).

Population structure inference

First, ped files were created as input for PLINK (S14, S15) with parameters --ped ped_file --recode12 --geno 0.5 --map output_map.

Then the program frappe (S16, S17) was used to infer the population structure and ancestry of the silkworms. The analysis was based on 13,066,429 SNP sites, and we did not assume any prior information about ancestry. We ran 10,000 iterations with the pre-defined number of clusters, K, ranging from 2 to 9.

Population history model

In order to understand the impact of the initial domestication event on observed levels of variation, we fit a simple bottleneck model to the data. The following parameters are assumed: domestication occurred 5,000 years ago, there is one generation per year, and there was a stepwise reduction in variation at the time of domestication. We estimate both the severity of the population reduction and the rate of population growth subsequent to that event.

Two criteria are used to fit the bottleneck model. First, we use the empirically observed level of reduction: the domesticated strains harbor ~83% of the variation observed in the wild varieties (a ratio of 0.015/0.018). Second, we fit the estimated demographic model to the observed site frequency spectrum. To fit a model to both the observed level of reduction and the frequency spectrum, we take a simulation approach. Using the program ms (S18), we simulated a grid of parameter values, with the population size reduction at the time of domestication varying from 1% to 99%, and the exponential growth ranging from no increase in population size to a 1000-fold increase from the time of domestication to the present.

Identification of Genomic Regions of Selective Signals (GROSS)

A sliding window approach was applied to quantify polymorphism levels (θπ, pairwise nucleotide variation as a measure of variability) (S19), selection statistics (Tajima’s D, a measure of selection in the genome) (S20) and genetic differentiation between the domesticated and wild populations (Fst) (S21). The analysis was performed for 5 kb windows sliding in 500 bp steps, and the SNPs for each population were those from step (3) of the “SNP calling” section. We developed a series of Perl scripts that take the genotype frequencies in the two groups and calculate θπ and Tajima’s D for each group, and Fst between the two populations, following the formulas for those statistics (S19-21).
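The window statistics can be illustrated with a short Python sketch. This is not the authors' Perl code: it assumes each SNP is summarized by the count of one allele among n sampled chromosomes, and it uses the standard textbook formulas for θπ and Tajima’s D. Fst is left out because the exact formula used (S21) is not given here. All function names are hypothetical.

```python
import math


def theta_pi_sum(allele_counts, n):
    """Sum over SNPs of the expected number of pairwise differences.
    allele_counts[k] is the count of one allele among n chromosomes."""
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)


def tajimas_d(allele_counts, n):
    """Tajima's D for one window; returns NaN if nothing segregates.
    Requires n >= 3 so that the variance term is positive."""
    segregating = [k for k in allele_counts if 0 < k < n]
    s = len(segregating)
    if s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    pi = theta_pi_sum(segregating, n)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def sliding_windows(snps, chrom_length, n, window=5000, step=500):
    """snps: list of (1-based position, allele count).
    Yields (window start, theta_pi per bp, Tajima's D)."""
    last_start = max(chrom_length - window + 1, 1)
    for start in range(1, last_start + 1, step):
        end = start + window - 1
        counts = [k for pos, k in snps if start <= pos <= end]
        yield start, theta_pi_sum(counts, n) / window, tajimas_d(counts, n)


# Toy usage: one SNP at position 100 on a 6 kb sequence, 10 chromosomes.
for start, pi_per_bp, d in sliding_windows([(100, 5)], 6000, n=10):
    print(start, pi_per_bp, d)
```

With these definitions, a window dominated by rare alleles gives a negative D and one dominated by intermediate-frequency alleles gives a positive D. The ratio of the two groups' θπ values per window would give the PiR statistic used in the next step.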

We then considered the distribution of PiR (defined as the ratio of θπ,domesticated to θπ,wild) and the distribution of TDD (Tajima’s D for domesticated silkworms). Using an empirical procedure, we selected windows with both significantly low PiR and significantly low TDD values (Z test, P<0.005 for both; Fig. 2A) as candidate selection signals along the genome. Neighboring windows were joined where possible, forming larger regions (GROSS).

Microarray analysis for genes in GROSS

The microarray data of these genes in GROSS came from the Bombyx mori microarray database (S6). Hierarchical clustering of the data was performed with the program Cluster (S22), and the cluster data were visualized using the program TreeView (S22).

Supporting Text

Data production

We performed whole-genome resequencing of each silkworm variety using the Illumina Genome Analyzer II (GA II) and produced 1.50 billion short reads (averaging 42 bp in length), corresponding to 63.25 Gb of raw data. In total, we obtained an effective depth of 118.1X for all 40 varieties, with an average depth of 3X per variety (Table S2A). The mean genome coverage for domesticated and wild silkworms was 82.0% and 83.0%, respectively, and the mean gene region coverage was 91.8% and 94.2%, respectively. Mapping results for domesticated and wild strains are summarized in Table S2B. About 5% more bases were mapped for domesticated silkworms than for wild ones, and the mismatch rate was ~0.6% lower for the domesticated strains; both observations can be attributed to the high genetic divergence between the reference genome and the wild strains. However, a higher average sequencing depth for the wild strains compensates for this difference, and the resulting genome and gene region coverage is comparable between the two groups.

Variation detection

Making full use of the massive number of short reads provided by next-generation sequencing technology, the approach taken in this report effectively covers around 80% of each individual’s genome at a depth of 3X for the 432 Mb sequence. Guided by a “pool to individual” strategy (see Materials and Methods for details), we can detect high quality SNPs. Of the identified SNPs, 3,504,749 (21.9%) were within genes (introns and exons) and 422,815 (2.64%) were in the coding sequences (CDS) (Table S3A). We estimated that the ratio of synonymous to non-synonymous changes in the CDS was 2.91:1.
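The depth figures quoted here and under Data production can be checked against each other with simple arithmetic. The sketch below uses only numbers from the text (1.50 billion reads, 42 bp mean length, 118.1X effective depth, 40 varieties, 432 Mb sequence); the variable names are illustrative, and small differences from the reported values come from rounding of the inputs.

```python
reads = 1.50e9           # short reads produced
mean_read_len = 42       # bp, average read length
genome_size = 432e6      # bp, length of the reference sequence
effective_depth = 118.1  # X, summed over all varieties
varieties = 40

# Raw output implied by read count and read length (reported: 63.25 Gb).
raw_bases = reads * mean_read_len

# Bases that contribute to the effective depth over the 432 Mb sequence.
effective_bases = effective_depth * genome_size

# Average effective depth per variety (reported as about 3X).
mean_depth = effective_depth / varieties

print(f"raw data: {raw_bases / 1e9:.1f} Gb")
print(f"effective bases: {effective_bases / 1e9:.1f} Gb")
print(f"mean depth per variety: {mean_depth:.2f}X")
```

The rounded inputs give about 63 Gb of raw data and just under 3X per variety, in line with the reported values.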

We can also identify short indels (1-3 bp) as well as structural variations in a similar way. It would be difficult to confidently detect individual genomic variants at such a depth per individual unless the population-level information of 118.1X coverage was taken into account. We found that only 1,433 (0.46%) of the indels are in the CDS, and that 1,014 of these would cause a frameshift, affecting 866 genes (Table S4A). For structural variations, we found a mean length of 560 bp; genomic deletions make up 98.8% of them, which can be explained by the limitation of the short insert size.

Mutation

We calculated mutation rates for SNPs in different functional categories. For every functional class, the value for the wild varieties is higher than for the domesticated ones (Table S3B). This observation is based on the estimate of the population mutation rate θS (S23), which corrects for the different sample sizes of the two groups (29 domesticated vs. 11 wild). Likewise, we found a higher θS value (Mann-Whitney U test, P=7.69×10⁻⁶) for indels in wild silkworms than in domesticated ones (Table S4B). In comparison to Gallus gallus, for which this information is available (S24), silkworms have a twofold higher level of θS at the CDS, intron and genome-wide levels (Table S3B).
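To show how an estimator based on segregating sites can correct for sample size, here is a minimal sketch. It assumes that θS denotes Watterson's estimator, which the text does not state explicitly, and that each diploid individual contributes two chromosomes (58 for the 29 domesticated and 22 for the 11 wild silkworms). The function names and the example counts are hypothetical.

```python
def harmonic_term(n_chromosomes: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, the sample-size correction factor."""
    return sum(1.0 / i for i in range(1, n_chromosomes))


def watterson_theta(num_segregating: int, n_chromosomes: int,
                    num_sites: int) -> float:
    """Per-site theta_S = S / (a_n * L) under the Watterson assumption."""
    return num_segregating / (harmonic_term(n_chromosomes) * num_sites)


# Hypothetical example: the same number of segregating sites observed in
# samples of different size yields different per-site estimates.
n_domesticated = 2 * 29  # assumed chromosome count for 29 diploids
n_wild = 2 * 11          # assumed chromosome count for 11 diploids
print(watterson_theta(100, n_domesticated, 10_000))
print(watterson_theta(100, n_wild, 10_000))
```

Because a_n grows with sample size, the larger domesticated sample is divided by a larger factor, which is what makes estimates from the two groups comparable.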

We also estimated θπ values for SNPs in B. mori and found them to be 0.0061, 0.0136 and 0.0136 for CDS, intronic regions and the whole genome, respectively. θπ values in B. mandarina are 0.0070, 0.0157 and 0.0153 for these three categories, respectively. All of these values indicate a lower level of polymorphism than in Drosophila simulans (S25).

Linkage disequilibrium (LD) pattern

We assessed the linkage disequilibrium (LD) levels in the domesticated and wild silkworm varieties by calculating the pairwise LD measure r² (S26, see Materials and Methods) and present curves showing LD decay with increasing genomic distance between SNP pairs (Fig. S1). We find that LD decays rapidly in silkworms, with r² decreasing to half of its maximum at a distance of around 46 bp and 7 bp for the domesticated and wild varieties, respectively. The faster decay of LD in B. mori as compared to that in D. melanogaster [where LD also decreases rapidly, to half of its maximum value at about several hundred bp (S27)] is likely due to a higher recombination rate of 2.97 cM/Mb (S28) in the silkworm genome as compared to 1.59 cM/Mb (S29) for the fruit fly, as well as to high effective population sizes. The relatively slower decay of LD in the domesticated strains is most likely caused by inbreeding within each strain, although population structure, reduced effective population size, and a possible increased rate of positive selection may also have contributed. These results show that association mapping combining multiple domesticated strains is possible but can be confounded by the extensive population structure and inbreeding. By contrast, association mapping based on wild individuals will be difficult due to low levels of LD.
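The half-decay distance quoted above can be read off an averaged r² curve. The sketch below is not the authors' R code; it assumes pairwise r² values are already available as (distance, r²) tuples, for example parsed from Haploview output. Function names and toy values are hypothetical.

```python
from collections import defaultdict


def mean_r2_by_distance(pairs):
    """pairs: iterable of (distance_bp, r2). Returns {distance: mean r2}."""
    sums = defaultdict(lambda: [0.0, 0])
    for dist, r2 in pairs:
        sums[dist][0] += r2
        sums[dist][1] += 1
    return {d: total / count for d, (total, count) in sorted(sums.items())}


def half_decay_distance(curve):
    """Smallest distance at which mean r2 is at or below half its maximum."""
    half = max(curve.values()) / 2.0
    for dist in sorted(curve):
        if curve[dist] <= half:
            return dist
    return None  # the curve never drops to half of its maximum


# Toy data: the maximum mean r2 is 0.85 at 1 bp, so the half level is 0.425.
toy_pairs = [(1, 0.8), (1, 0.9), (2, 0.6), (3, 0.4), (4, 0.2)]
curve = mean_r2_by_distance(toy_pairs)
print(half_decay_distance(curve))
```

In the toy data the mean r² first falls to the half level at 3 bp. Applied to real curves, the same rule would locate the distances reported for the two groups.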

Because sample size is an important parameter influencing LD patterns, we randomly selected 11 domesticated silkworms for this analysis to match the sample size of the wild group. For chromosome 2, we repeated the analysis for three independent sets of 11 randomly selected domesticated silkworms and found similar results.

Demography of silkworms

In the PCA analysis, there is a significant correlation with voltinism for the first four principal components in the domesticated varieties. Moltinism (number of larval molts per generation) also correlates with eigenvectors 1 and 3. We observed a significant correlation between the latitude of the sample origins and eigenvectors 2 and 4 (Kendall’s τ, P=0.03 and 0.04, respectively) (Table S7), and no association between longitude and any of the principal components. These key traits relating to silkworm biology and yield define genetically distinct subgroups, suggesting that genetic mapping of these traits may be complicated by the general genetic differentiation between strains with different moltinism and voltinism values. Mapping studies may benefit from using varieties with large differences in the relevant moltinism and voltinism traits, but with otherwise little genetic differentiation.

After fitting the demographic model (see Materials and Methods), we observed that a 90% reduction in population size in the domesticated variety could account for the observed levels of variability (Fig. S2). The surprisingly high levels of variability in the domesticated variety suggest that a large number of individuals were used in the initial domestication event. An alternative hypothesis is substantial gene flow between the wild and domesticated varieties after domestication, but the very clear differentiation between domesticated and wild varieties suggests that gene flow from the wild to the domesticated varieties may not have been strong. The distinct separation of strains does show that the genetic variation in the domesticated strains has been maintained despite local inbreeding.

It is commonly assumed that domestication leads to a significant reduction in variability (S30), because the domesticated species might have arisen from a geographically limited group of individuals and thus been subjected to a bottleneck in population size during domestication, and because it has been subjected to strong artificial selection subsequent to the domestication event. In many domesticated species [e.g., rice (S31) or wheat (S32)], the domesticated species contains much less variability at the nucleotide level than the corresponding wild species. We did not, however, find that these factors have been sufficiently strong in the silkworm to lead to extensive loss of genetic variability.

We also inferred population ancestry with frappe (S16); no ancestral information was assumed before the calculation. For K=2, the results show a clear domesticated/wild split (Fig. S3). This is consistent with the phylogeny and PCA results derived from our data. When K=3, a new component including D5, D7, D15, D16 and D24 separated from the rest of the domesticated group, consistent with the same subgroup in the phylogenetic tree. From K=3 to 4, another subgroup emerged, including D17-D23, D27 and D28, which cluster together in the phylogeny. When K=5, the two Japanese high silk production strains stood out as a new group. At K=6 or above, additional clusters appeared as outlier populations that disturb the previous organization of the population structure and make little biological sense.

Details of GROSS

To determine whether certain SNPs were more common in the domesticated strains, we adopted a complex trait association study methodology (S33). We treated domesticated and wild individuals as phenotypically distinct and conducted an association test for each qualified SNP (Materials and Methods). In total, we found that 1,347 of the polymorphic sites were significantly different (χ² test; P<2.96×10⁻⁸) in their association with domesticated versus wild varieties (termed domesticated associated SNPs, or da-SNPs), and that 410 (30.4%) of these lie within 298 genes (Table S8).
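The association test can be sketched for a single SNP as follows. This is an illustration under stated assumptions, not the authors' procedure: it tests a 2×2 table of allele counts (two alleles by two groups) with a Pearson χ² statistic on 1 degree of freedom and no continuity correction, whereas the text does not say whether alleles or genotypes were tabulated. Only the significance threshold is taken from the text; the function name and counts are hypothetical.

```python
import math

DA_SNP_THRESHOLD = 2.96e-8  # Bonferroni-corrected threshold quoted above


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square for the table [[a, b], [c, d]].
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a zero margin carries no information
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical SNP fixed for different alleles in the two groups:
# 29 domesticated (58 alleles, all A) and 11 wild (22 alleles, all C).
stat, p = chi2_2x2(58, 0, 0, 22)
print(stat, p, p < DA_SNP_THRESHOLD)

# Hypothetical SNP with similar allele frequencies in both groups.
stat_null, p_null = chi2_2x2(30, 28, 10, 12)
print(stat_null, p_null, p_null < DA_SNP_THRESHOLD)
```

A site fixed for alternative alleles in the two groups clears the threshold easily, while a site with similar frequencies in both groups does not. The same function with the 1.87×10⁻⁶ threshold would apply to indels.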

Turning to the indels, we found that 34 indel sites were significantly different (χ² test; P<1.87×10⁻⁶) in their association with domesticated versus wild varieties (termed domesticated associated indels, or da-indels). More than 45% of all the da-SNPs are located in GROSS; this indicates that da-SNPs, which may be in the initial stages of becoming fixed in the domesticated group, are enriched in GROSS compared to the genomic background.

We found that 212 GROSS contain only one gene, which means that approximately 60% (212/354) of all the genes (Table S9) found to be potentially important to domestication are unique to a GROSS (Table S10). This indicates that most GROSS genes were probably under selection themselves and are unlikely to have experienced hitchhiking.

Genes likely important for domestication are found in GROSS

In addition to GROSS genes enriched in the silk gland, we also found midgut- and testis-enriched genes. The former are related to the metabolism of carbohydrates, amino acids and lipids, which plays an important role in food digestion and nutrient absorption; the latter are annotated as having binding, catalytic, and motor activities related to reproduction.

Among 32 midgut-enriched genes, nine participate in dietary protein digestion (serine protease), carbohydrate metabolism (malate dehydrogenase and pyruvate dehydrogenase), substance transport (organic cation transporter, sodium- and chloride-dependent glycine transporter 2, ATP-binding cassette transporter, and zinc transporter 5), and lipid metabolism [fatty acid binding protein (FABP) and scavenger receptor]. The malate dehydrogenase gene in B. mori shares 57% amino acid sequence similarity with its homolog in Escherichia coli, in which mutation results in decreased activity of the encoded enzyme (S34). FABP is mainly involved in the binding and transport of unsaturated fatty acids, such as linolenic and linoleic acids, both of which are essential to the silkworm and, as in other animals such as humans (S35), can only be obtained through food uptake (mulberry leaves for silkworms). Artificial diet-based nutrition research has confirmed a 60% weight loss in silkworms fed on food lacking those two unsaturated fatty acids, compared to a control group (S36). The identification of these genes involved in energy metabolism indicates that energy metabolism has been under artificial selection during silkworm domestication.

Among 54 testis-enriched genes, five are involved in spermatogenesis: spermidine synthase, sperm protein SSP411, t-complex-associated testis expressed 1, intersex, and shaggy. In addition, three genes are related to sperm motility: myosin class II heavy chain, outer dense fiber of sperm tails protein 2, and axonemal dynein intermediate chain inner arm i1. These results provide evidence for possible selective pressure on B. mori reproduction during the domestication process.

Additional notes

Genome-wide genetic variation maps at single base-pair resolution have only been generated for species with small genomes, including yeast (S37), Salmonella (S38), Plasmodium falciparum (S39), and human rhinovirus (S40). For larger genomes, no comprehensive single-base resolution maps are currently available, although high-density SNP maps have been built for human (S41) and mouse (S42), and moderate-density ones for chicken (S24), dog (S43), sheep (S44), and cattle (S45). Our strategy provides a nearly complete genome-level variation map, which gives more reliable information on genetic polymorphisms in a population.

There are two sub-populations of B. mandarina: Chinese wild silkworms (from China, each with 28 chromosomes, the same as B. mori) and Japanese wild silkworms [from Japan, each with 27 chromosomes (S46)]. A common view of silkworm domestication (S47) holds that the domesticated silkworm was tamed from the Chinese wild silkworm. This view is the basis of the work presented in this paper, and mitochondrial results based on these 40 samples and public data for the Japanese wild silkworm (NCBI Accession Number: NC_003395) provide compelling evidence for it (Li et al., personal communication).

B. mori is not only well adapted to human handling but is wholly dependent on humans for survival; in addition, it is well differentiated trait-wise from its wild cousin. Of equal importance, its domestication took place in a different geographical region (Asia vs. the Fertile Crescent) (S48) and in a distinctly different culture from the earliest known domestication events. These aspects make silkworm domestication a unique event in agricultural history, deserving the same kind of attention as the domestication of livestock and crop plants. We directly tested for selection related specifically to domestication by comparing variability in domesticated versus wild silkworms and identifying genomic regions with a significant difference in polymorphism density between the two groups (e.g., Fig. S7). Although others are in the pipeline, it is unprecedented to have such a source of near-relatives in this clade for comparative genome analysis, which can be aimed not only at identifying genes associated with domestication in the candidate GROSS we detected, but also at annotating and defining regulatory regions, which can complement our knowledge about functional elements in the silkworm genome.

Supporting Figures

Fig. S1. Linkage disequilibrium (LD) patterns. LD measured by r² decays with pairwise marker distance, suggesting a bottleneck at the time of domestication. The inset shows details of this trend for the first 100 bp. The maxima of r² for the domesticated and wild varieties, at a pairwise distance of 1 bp, are 0.829 and 0.733, respectively. When LD drops to half of the maximal level, SNP positions are on average 46 bp (r² = 0.412) and 7 bp (r² = 0.348) apart for the domesticated and wild varieties, respectively.

Fig. S2. Bottleneck model estimation illustrating silkworm domestication. Simulations showed that a 90% reduction in domesticated population size could account for the maintenance of ~83% of the variation found in the wild varieties.

Fig. S3. Population structure for the 40 silkworms. The number of ancestral populations, K, is set from 2 to 5 (top to bottom).

Fig. S4. WEGO result: functional annotation for genes in GROSS.

Fig. S5. A two-way hierarchical cluster analysis of the expression patterns of 159 GROSS genes in different Dazao tissues. Microarray signals for different tissue types (columns) and genes (rows) are shown, with continuous expression levels from dark green (lowest) to bright red (highest). A/MSG: anterior/middle silk gland; PSG: posterior silk gland.

Fig. S6. Comparison of the relative expression of bHLH genes in the silk gland of fifth-instar larvae of the reference B. mori strain and a high silk production strain. The relative expression of bHLH genes was assessed by quantitative real-time polymerase chain reaction (qRT-PCR) analysis. The BmActin gene was used as an internal control, and the highest relative quantities were set to 1. We found that bHLH is up-regulated fourfold in the high silk production strain compared to the reference strain on day 3 of the fifth larval instar.

Fig. S7. An example GROSS containing only one gene, Sgf-1, which is important to silk production. Density of polymorphism (θπ), test statistics for selection (Tajima’s D), diversity between two populations (Fst), and genome annotation are shown (from top to bottom). For both θπ and Tajima’s D, the domesticated and wild varieties are shown in red and green, respectively.
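The statistics plotted in this figure can be illustrated with a minimal sketch, assuming the standard definitions of pairwise diversity (θπ), Watterson's θ, and Tajima's D (S19, S20, S23). The function names and the input layout (one allele count per segregating site in a window) are illustrative assumptions, not the study's actual implementation; Fst is omitted for brevity.

```python
import math


def theta_pi(allele_counts, n):
    """Pairwise diversity summed over segregating sites in a window.

    allele_counts: number of chromosomes carrying one allele at each site.
    n: number of sampled chromosomes.
    """
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)


def theta_w(num_segregating, n):
    """Watterson's estimator: S divided by the harmonic number a1."""
    a1 = sum(1.0 / i for i in range(1, n))
    return num_segregating / a1


def tajimas_d(n, num_segregating, pi):
    """Tajima's D from sample size n, segregating sites S, and pairwise diversity pi."""
    s = num_segregating
    if s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return float("nan")
    return (pi - s / a1) / math.sqrt(variance)


# Toy window: 5 segregating sites among 10 chromosomes (hypothetical counts).
counts = [1, 1, 2, 1, 3]
pi = theta_pi(counts, 10)
print(pi, theta_w(len(counts), 10), tajimas_d(10, len(counts), pi))
```

A window in which the domesticated varieties show low θπ and a strongly negative D relative to the wild varieties is the kind of pattern the figure is meant to convey.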

Supporting Tables

Table S1. Silkworm samples and detailed traits. Voltinism denotes the number of generations per year, and moltinism denotes the number of larval molts per life cycle. (*: “V1” represents monovoltine, “V2” bivoltine and “V3” polyvoltine. #: “M2” represents bimoulting, “M3” trimoulting, “M4” tetramoulting and “M5” pentamoulting.)

Sample ID | Strain name | Sex | Voltinism* and moltinism# | System or location | Other traits and comments | Latitude | Longitude
D01 | J7532 | Male | V2M4 | Japan | High silk production, hybrid strain | 35.69 | 139.69
D02 | J04-010 | Female | V1M4 | Japan | - | 35.69 | 139.69
D03 | J872 | Unknown | V2M4 | Japan | High silk production, hybrid strain | 35.69 | 139.69
D04 | J106 | Male | V2M4 | Japan | - | 35.69 | 139.69
D05 | N4 | Female | V2M4 | Japan | - | 35.69 | 139.69
D06 | Cambodia | Male | V3M4 | Cambodia | - | 11.54 | 104.90
D07 | LaoⅡ | Female | V3M4 | Laos | - | 17.97 | 102.61
D08 | India M3 | Male | V2M3 | India | - | 28.64 | 77.23
D09 | Europe18 | Female | V1M4 | Europe | - | 54.53 | 15.26
D10 | Italy16 | Female | V1M4 | Italy, Europe | - | 41.87 | 12.57
D11 | Soviet Union No.1 | Female | V1M4 | Former SU, Europe | - | 55.76 | 37.62
D12 | 15-010 | Unknown | V1M5 | Mutation | - | NA | NA
D13 | 02-210 | Female | V1M4 | Mutation | - | NA | NA
D14 | 15-001 | Male | V3M3 | Mutation | - | NA | NA
D15 | Mutation M2 | Unknown | V2M2 | Mutation | - | NA | NA
D16 | A06E | Unknown | V2M4 | Guangdong province, China | - | 23.12 | 113.26
D17 | Damao | Unknown | V1M3 | Sichuan province, China | - | 30.66 | 104.08
D18 | Ankang No.4 | Male | V1M3 | Shanxi province, China | - | 34.26 | 108.95
D19 | ZT500 | Female | V1M3 | Gansu province, China | - | 36.07 | 103.75
D20 | Zhugui | Female | V1M4 | Zhejiang province, China | - | 30.27 | 120.15
D21 | Bilian | Female | V1M4 | Jiangsu province, China | - | 32.05 | 118.77
D22 | ZT900 | Female | V1M3 | Sichuan province, China | - | 30.66 | 104.08
D23 | ZT100 | Female | V1M3 | Hunan province, China | - | 28.20 | 112.98
D24 | Sihong15 | Male | V1M4 | Jiangsu province, China | - | 32.05 | 118.77
D25 | Xiaoshiwan | Female | V1M4 | Zhejiang province, China | - | 30.27 | 120.15
D26 | C108 | Female | V2M4 | Chongqing, China | - | 29.55 | 106.55
D27 | Sichuang M3 | Female | V1M3 | Sichuan province, China | - | 30.66 | 104.08
D28 | Qiansanmian | Male | V1M3 | Guizhou province, China | - | 26.59 | 106.73
D29 | Handan | Male | V1M4 | Hebei province, China | - | 38.03 | 114.48
W01 | B. mandarina Ziyang | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W02 | B. mandarina Nanchong | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W03 | B. mandarina Hongya | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W04 | B. mandarina Pengshan | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W05 | B. mandarina Ankang | Unknown | Unknown | Shanxi province, China | - | 37.87 | 112.57
W06 | B. mandarina Yichang | Unknown | Unknown | Hubei province, China | - | 30.57 | 114.29
W07 | B. mandarina Yancheng | Unknown | Unknown | Jiangsu province, China | - | 32.05 | 118.77
W08 | B. mandarina Luzhou | Unknown | Unknown | Sichuan province, China | - | 30.66 | 104.08
W09 | B. mandarina Hunan | Unknown | Unknown | Hunan province, China | - | 28.20 | 112.98
W10 | B. mandarina Suzhou | Unknown | Unknown | Jiangsu province, China | - | 32.05 | 118.77
W11 | B. mandarina Rongchang | Unknown | Unknown | Chongqing, China | - | 29.55 | 106.55

Table S2. Data production.

(A) Sequencing summary.

Samples | Yield (Gigabase) | % Genome coverage | Average effective depth (X) | Effective depth (X)
29 Domesticated strains | 44.2 | 99.85 | 2.9 | 83.8
11 Wild strains | 19.1 | 99.52 | 3.1 | 34.3
Total | 63.3 | 99.88 | 3.0 | 118.1

(B) Mapping summary based on SOAP 1.09 results.

Statistics | Domesticated (mean±sd) | Wild (mean±sd)
% Bases mapped | 82.12±4.18 | 77.59±2.36
% Bases mapped uniquely | 75.72±2.43 | 70.80±3.54
% With difference | 1.88±0.33 | 2.51±0.35
% Genome coverage | 82.02±3.65 | 83.03±4.43
% Gene region coverage | 91.83±3.71 | 94.16±3.55

Table S3. SNP summary.

(A) SNP numbers in different functional elements.

Functional element | Total SNPs | Domesticated group | Wild group
Gene region, CDS, synonymous | 314,639 | 263,930 | 276,530
Gene region, CDS, non-synonymous (stop codon) | 594 | 535 | 490
Gene region, CDS, non-synonymous (premature stop codon) | 1,658 | 1,449 | 1,375
Gene region, CDS, non-synonymous (other) | 105,924 | 90,856 | 85,397
Gene region, intron (splice sites) | 1,432 | 1,290 | 1,131
Gene region, intron (other) | 3,084,186 | 2,689,566 | 2,557,517
ncRNA, miRNA | 42 | 37 | 34
ncRNA, rRNA | 76 | 67 | 59
ncRNA, tRNA | 233 | 206 | 184
Transposable elements | 3,801,067 | 3,374,986 | 3,120,087
Whole genome | 15,986,559 | 14,023,573 | 13,237,865

(B) Mutation rate (θ) for SNPs (×10⁻²).

Functional element | Total SNPs | Domesticated group | Wild group
Gene region, CDS | 0.53 | 0.48 | 0.62
Gene region, intron | 1.20 | 1.12 | 1.36
ncRNA, miRNA | 0.32 | 0.31 | 0.36
ncRNA, rRNA | 1.69 | 1.60 | 1.78
ncRNA, tRNA | 0.45 | 0.43 | 0.49
Transposable elements | 1.39 | 1.32 | 1.55
Whole genome | 1.15 | 1.08 | 1.30

Table S4. Indel summary.

(A) Indel numbers in different functional elements.

Functional element | Total indels | Domesticated group | Wild group
Gene region, CDS, frameshift | 1,014 | 953 | 872
Gene region, CDS, non-frameshift | 419 | 370 | 334
Gene region, intron | 65,936 | 59,051 | 53,112
ncRNA, miRNA | 4 | 3 | 4
ncRNA, rRNA | 1 | 1 | 0
ncRNA, tRNA | 6 | 6 | 5
Transposable elements | 85,259 | 77,871 | 63,107
Whole genome | 311,608 | 281,185 | 251,453

(B) Mutation rate (θ) for indels (×10⁻⁴).

Functional element | Total indels | Domesticated group | Wild group
Gene region, CDS | 0.17 | 0.17 | 0.19
Gene region, intron | 1.72 | 1.65 | 1.88
ncRNA, miRNA | 1.67 | 1.34 | 2.27
ncRNA, rRNA | 0.15 | 0.16 | 0.00
ncRNA, tRNA | 0.35 | 0.37 | 0.40
Transposable elements | 1.01 | 0.98 | 1.01
Whole genome | 1.54 | 1.48 | 1.68

Table S5. Structural variations (SV) summary.

SV type | Total sites | Overlapping with TEs (No.) | Overlapping with TEs (% in total sites)
Duplication | 327 | 28 | 9
Deletion | 34677 | 26663 | 77
Insertion | 80 | 21 | 26
Other complex SVs | 9 | 0 | 0
Total | 35093 | 26712 | 76

Table S6. Tracy-Widom (TW) statistics and p-values for the first six eigenvalues. The significant p-values are in bold.

Number Eigenvalues TW p-values

1 8.009 21.05 7.33E-30

2 3.93 1.421 0.02627

3 3.885 2.819 0.002409

4 3.697 2.271 0.006531

5 3.246 -3.375 0.9661

6 3.168 -3.76 0.9859

Table S7. Kendall’s τ statistics (p-values) of the correlations between phenotypes and eigenvectors. The significant p-values are in bold.

Eigenvectors | Wild vs domesticated | Voltinism | # Molts | Latitudes | Longitudes
1 | -0.640 (1.36E-06) | 0.446 (0.003) | 0.327 (0.032) | 0.084 (0.483) | 0.213 (0.076)
2 | 0.259 (0.051) | 0.533 (4.45E-04) | 0.051 (0.741) | -0.259 (0.031) | 0.094 (0.433)
3 | 0.162 (0.220) | 0.393 (0.010) | 0.401 (0.009) | 0.120 (0.316) | 0.018 (0.880)
4 | 0.347 (0.009) | 0.546 (3.19E-04) | 0.152 (0.321) | -0.246 (0.041) | 0.028 (0.815)
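For readers unfamiliar with the statistic, the following is a minimal sketch of a rank correlation of the kind reported in this table. It computes the tie-corrected variant (τ-b), which is an assumption on our part: the table does not state which variant was used, and tied values are common when a phenotype such as wild versus domesticated is binary. The function name and the toy vectors are illustrative only, and p-values are not computed.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length numeric sequences.

    tau_b = (C - D) / sqrt((n0 - n1) * (n0 - n2)), where C and D are the
    numbers of concordant and discordant pairs, n0 is the total number of
    pairs, and n1, n2 are the numbers of pairs tied in x and in y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - tied_x) * (n0 - tied_y))
    if denom == 0:
        return float("nan")  # one variable is constant
    return (concordant - discordant) / denom


# Toy example: a binary phenotype against a hypothetical eigenvector score.
phenotype = [0, 0, 1, 1]
score = [0.1, 0.2, 0.3, 0.4]
print(kendall_tau_b(phenotype, score))
```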

Table S8. The domestication-associated (da) SNPs and indels.

Category | Whole genome | # in GROSS | % in GROSS
Gene number | 14,470 | 354 | 2.45%
da-SNP, gene region (CDS region) | 120 | 51 | 42.5%
da-SNP, gene region (sub-total) | 410 | 198 | 48.3%
da-SNP, TE region | 231 | 103 | 44.6%
da-SNP, total | 1,347 | 617 | 45.8%
da-indel, gene region (CDS region) | 1 | 1 | 100.0%
da-indel, gene region (sub-total) | 5 | 2 | 40.0%
da-indel, TE region | 12 | 3 | 25.0%
da-indel, total | 34 | 12 | 35.3%

Table S9. Genes found in GROSS.

Gene ID Scaffold Start End Strand GROSS ID

BGIBMGA002068 nscaf2210 92099 92326 - SWGROSS0002

BGIBMGA002041 nscaf2210 1264029 1265086 - SWGROSS0005

BGIBMGA002089 nscaf2210 1269704 1270372 + SWGROSS0005

BGIBMGA002015 nscaf2210 3050656 3054860 - SWGROSS0008

BGIBMGA000616 nscaf1690 519948 520533 + SWGROSS0025

BGIBMGA000590 nscaf1690 1025643 1027490 - SWGROSS0030

BGIBMGA000588 nscaf1690 1088029 1091916 - SWGROSS0035

BGIBMGA000582 nscaf1690 1325859 1329432 - SWGROSS0040

BGIBMGA000630 nscaf1690 1451333 1453695 + SWGROSS0044

BGIBMGA000634 nscaf1690 1723950 1729697 + SWGROSS0046

BGIBMGA000644 nscaf1690 2802592 2804303 + SWGROSS0056

BGIBMGA000649 nscaf1690 2947882 2948292 + SWGROSS0061

BGIBMGA000655 nscaf1690 3171591 3176144 + SWGROSS0064

BGIBMGA000678 nscaf1690 4126607 4131294 + SWGROSS0067

BGIBMGA000543 nscaf1690 4284455 4284811 - SWGROSS0075

BGIBMGA000681 nscaf1690 4308301 4309950 + SWGROSS0076

BGIBMGA000688 nscaf1690 4635598 4643996 + SWGROSS0078

BGIBMGA000699 nscaf1690 5115125 5131329 + SWGROSS0080

BGIBMGA000700 nscaf1690 5142220 5143295 + SWGROSS0081

BGIBMGA000521 nscaf1690 5467593 5474024 - SWGROSS0086

BGIBMGA013330 nscaf3068 216549 217556 + SWGROSS0099

BGIBMGA012283 nscaf3040 927204 930161 - SWGROSS0101

BGIBMGA012277 nscaf3040 1297856 1300111 - SWGROSS0107

BGIBMGA012262 nscaf3040 1875559 1878041 - SWGROSS0110

BGIBMGA012328 nscaf3040 1989681 1994671 + SWGROSS0111

BGIBMGA012253 nscaf3040 2339503 2339977 - SWGROSS0113

BGIBMGA012354 nscaf3040 3712802 3716591 + SWGROSS0121

BGIBMGA006633 nscaf2855 2906740 2908938 - SWGROSS0130

BGIBMGA006714 nscaf2855 3287598 3298729 + SWGROSS0132

BGIBMGA006611 nscaf2855 4118078 4118311 - SWGROSS0135

BGIBMGA006733 nscaf2855 4122115 4123658 + SWGROSS0135

BGIBMGA006609 nscaf2855 4129479 4160859 - SWGROSS0135

BGIBMGA006608 nscaf2855 4175032 4176630 - SWGROSS0136

BGIBMGA006861 nscaf2859 938908 944863 + SWGROSS0142

BGIBMGA006862 nscaf2859 946507 946662 + SWGROSS0142

BGIBMGA006870 nscaf2859 1236319 1240989 + SWGROSS0143

BGIBMGA006883 nscaf2859 1644217 1645271 + SWGROSS0147

BGIBMGA006917 nscaf2860 582388 582579 - SWGROSS0152

BGIBMGA006905 nscaf2860 1912760 1918814 - SWGROSS0156

BGIBMGA006956 nscaf2860 2242429 2248402 + SWGROSS0160

BGIBMGA002904 nscaf2575 2096721 2102318 + SWGROSS0165

BGIBMGA001791 nscaf2176 2370370 2372851 + SWGROSS0167

BGIBMGA001792 nscaf2176 2399280 2399705 + SWGROSS0169

BGIBMGA001613 nscaf2176 3893564 3899643 - SWGROSS0170

BGIBMGA001826 nscaf2176 3905459 3906227 + SWGROSS0170

BGIBMGA011963 nscaf3032 412007 413842 + SWGROSS0175

BGIBMGA011789 nscaf3031 1482091 1483992 + SWGROSS0183

BGIBMGA012068 nscaf3034 828516 837320 - SWGROSS0197

BGIBMGA014358 scaffold316 381740 383507 - SWGROSS0215

BGIBMGA010447 nscaf2993 1921700 1922150 - SWGROSS0218

BGIBMGA010530 nscaf2993 3073511 3079145 + SWGROSS0222

BGIBMGA010531 nscaf2993 3085807 3092577 + SWGROSS0222

BGIBMGA010366 nscaf2993 7885518 7894362 - SWGROSS0234

BGIBMGA010585 nscaf2993 7899699 7911006 + SWGROSS0235

BGIBMGA010659 nscaf2998 523733 525679 + SWGROSS0249

BGIBMGA010622 nscaf2998 887531 890438 - SWGROSS0253

BGIBMGA010602 nscaf2998 1592883 1606361 - SWGROSS0258

BGIBMGA005897 nscaf2842 1372074 1374771 + SWGROSS0263

BGIBMGA005849 nscaf2839 94626 95364 - SWGROSS0264

BGIBMGA005846 nscaf2839 120437 123051 - SWGROSS0265

BGIBMGA005845 nscaf2839 123579 128616 - SWGROSS0266

BGIBMGA005860 nscaf2839 138076 139012 + SWGROSS0267

BGIBMGA001080 nscaf1898 98247 98654 - SWGROSS0270

BGIBMGA001078 nscaf1898 134459 135294 - SWGROSS0272

BGIBMGA001085 nscaf1898 181315 197148 + SWGROSS0275

BGIBMGA001086 nscaf1898 210046 210724 + SWGROSS0276

BGIBMGA000972 nscaf1898 7685637 7687901 - SWGROSS0290

BGIBMGA001188 nscaf1898 8498417 8509123 + SWGROSS0294

BGIBMGA000953 nscaf1898 8655094 8655273 - SWGROSS0295

BGIBMGA001213 nscaf1898 9580865 9583162 + SWGROSS0297

BGIBMGA001286 nscaf1898 13822734 13830264 + SWGROSS0302

BGIBMGA000839 nscaf1898 15736619 15750274 - SWGROSS0308

BGIBMGA009486 nscaf2953 512395 515181 + SWGROSS0311

BGIBMGA009463 nscaf2953 1352157 1354673 - SWGROSS0313

BGIBMGA007789 nscaf2888 841705 848249 + SWGROSS0324

BGIBMGA007741 nscaf2888 862144 862530 - SWGROSS0325

BGIBMGA007737 nscaf2888 958825 959052 - SWGROSS0328

BGIBMGA007791 nscaf2888 970587 982244 + SWGROSS0329

BGIBMGA007792 nscaf2888 984948 994967 + SWGROSS0329

BGIBMGA007736 nscaf2888 998735 999185 - SWGROSS0329

BGIBMGA007793 nscaf2888 999476 1002597 + SWGROSS0329

BGIBMGA007849 nscaf2888 3516420 3517553 + SWGROSS0341

BGIBMGA007704 nscaf2888 3518629 3518988 - SWGROSS0341

BGIBMGA007863 nscaf2888 3917256 3921794 + SWGROSS0342

BGIBMGA007699 nscaf2888 4326103 4326579 - SWGROSS0346

BGIBMGA007876 nscaf2888 4328954 4339266 + SWGROSS0346

BGIBMGA007698 nscaf2888 4348687 4348911 - SWGROSS0346

BGIBMGA007877 nscaf2888 4350120 4353948 + SWGROSS0346

BGIBMGA007697 nscaf2888 4376566 4377372 - SWGROSS0348

BGIBMGA007696 nscaf2888 4416895 4417656 - SWGROSS0349

BGIBMGA007882 nscaf2888 4525756 4527943 + SWGROSS0351

BGIBMGA007883 nscaf2888 4607888 4608715 + SWGROSS0354

BGIBMGA007498 nscaf2887 1527342 1529161 - SWGROSS0362

BGIBMGA007497 nscaf2887 1536409 1551387 - SWGROSS0362

BGIBMGA007587 nscaf2887 1553956 1561252 + SWGROSS0362

BGIBMGA003320 nscaf2655 2615446 2623537 - SWGROSS0369

BGIBMGA003306 nscaf2655 3224043 3224327 - SWGROSS0371

BGIBMGA002165 nscaf2216 1847585 1848325 - SWGROSS0372

BGIBMGA002164 nscaf2216 1880672 1881072 - SWGROSS0373

BGIBMGA002163 nscaf2216 1892393 1895808 - SWGROSS0374

BGIBMGA002189 nscaf2216 1987667 1995877 + SWGROSS0375

BGIBMGA002190 nscaf2216 1999801 2001268 + SWGROSS0375

BGIBMGA002191 nscaf2216 2005344 2005520 + SWGROSS0375

BGIBMGA002159 nscaf2216 2009367 2020300 - SWGROSS0375

BGIBMGA002192 nscaf2216 2023151 2023396 + SWGROSS0375

BGIBMGA002158 nscaf2216 2093850 2094494 - SWGROSS0377

BGIBMGA002157 nscaf2216 2094861 2096909 - SWGROSS0377

BGIBMGA002193 nscaf2216 2118245 2118403 + SWGROSS0378

BGIBMGA002195 nscaf2216 2172998 2174578 + SWGROSS0380

BGIBMGA002196 nscaf2216 2175146 2175379 + SWGROSS0380

BGIBMGA002197 nscaf2216 2178396 2188564 + SWGROSS0380

BGIBMGA012984 nscaf3058 3531502 3534730 + SWGROSS0386

BGIBMGA012985 nscaf3058 3551699 3552496 + SWGROSS0386

BGIBMGA012850 nscaf3058 3553591 3554842 - SWGROSS0386

BGIBMGA013039 nscaf3058 5904882 5906264 + SWGROSS0389

BGIBMGA012797 nscaf3058 7163761 7166927 - SWGROSS0391

BGIBMGA013063 nscaf3058 7168258 7175427 + SWGROSS0391

BGIBMGA013150 nscaf3062 641988 643304 + SWGROSS0394

BGIBMGA013156 nscaf3063 3472875 3475227 - SWGROSS0398

BGIBMGA005591 nscaf2829 792981 795590 - SWGROSS0402

BGIBMGA005662 nscaf2829 2808783 2811645 + SWGROSS0412

BGIBMGA005548 nscaf2829 2821640 2828107 - SWGROSS0414

BGIBMGA005670 nscaf2829 2954764 2958794 + SWGROSS0415

BGIBMGA000181 nscaf125 283447 290781 + SWGROSS0421

BGIBMGA007030 nscaf2865 1786906 1792175 - SWGROSS0429

BGIBMGA007075 nscaf2865 1934487 1942700 + SWGROSS0435

BGIBMGA007026 nscaf2865 1947302 1949128 - SWGROSS0435

BGIBMGA007025 nscaf2865 1955523 1960187 - SWGROSS0436

BGIBMGA007076 nscaf2865 1966386 1976576 + SWGROSS0436

BGIBMGA007022 nscaf2865 2193403 2196647 - SWGROSS0438

BGIBMGA007021 nscaf2865 2203719 2207035 - SWGROSS0438

BGIBMGA003946 nscaf2766 229368 229922 - SWGROSS0442

BGIBMGA008477 nscaf2902 4348713 4349923 + SWGROSS0449

BGIBMGA008478 nscaf2902 4361725 4362048 + SWGROSS0450

BGIBMGA008480 nscaf2902 4409834 4410312 + SWGROSS0452

BGIBMGA008345 nscaf2902 4421504 4421851 - SWGROSS0452

BGIBMGA008481 nscaf2902 4426870 4427472 + SWGROSS0452

BGIBMGA008482 nscaf2902 4448851 4453538 + SWGROSS0452

BGIBMGA008336 nscaf2902 6319391 6319705 - SWGROSS0457

BGIBMGA008299 nscaf2902 10742813 10744010 - SWGROSS0463

BGIBMGA008229 nscaf2899 320031 320315 + SWGROSS0467

BGIBMGA008599 nscaf2903 494356 495249 + SWGROSS0470

BGIBMGA012578 nscaf3052 1330315 1336544 - SWGROSS0475

BGIBMGA001909 nscaf2204 294768 295203 - SWGROSS0481

BGIBMGA001943 nscaf2204 2412039 2415069 + SWGROSS0487

BGIBMGA001877 nscaf2204 2416638 2423617 - SWGROSS0488

BGIBMGA001946 nscaf2204 2543067 2543568 + SWGROSS0491

BGIBMGA004058 nscaf2767 24541 28208 + SWGROSS0494

BGIBMGA004072 nscaf2767 437273 437623 + SWGROSS0497

BGIBMGA004073 nscaf2767 439723 439992 + SWGROSS0497

BGIBMGA004089 nscaf2767 1084199 1086233 + SWGROSS0500

BGIBMGA004095 nscaf2767 1703802 1707929 + SWGROSS0502

BGIBMGA004011 nscaf2767 2501868 2502656 - SWGROSS0506

BGIBMGA004133 nscaf2767 3511558 3513051 + SWGROSS0507

BGIBMGA000192 nscaf1299 68753 72629 - SWGROSS0509

BGIBMGA000191 nscaf1299 193256 194224 - SWGROSS0514

BGIBMGA000197 nscaf1299 195757 196503 + SWGROSS0514

BGIBMGA000190 nscaf1299 197533 199716 - SWGROSS0514

BGIBMGA004423 nscaf2795 549940 561185 - SWGROSS0522

BGIBMGA004422 nscaf2795 568650 574653 - SWGROSS0523

BGIBMGA004437 nscaf2795 583598 586819 + SWGROSS0524

BGIBMGA004441 nscaf2795 730395 732192 + SWGROSS0525

BGIBMGA004491 nscaf2795 3101891 3102913 + SWGROSS0527

BGIBMGA004294 nscaf2789 712177 714786 - SWGROSS0531

BGIBMGA004332 nscaf2789 927496 930196 + SWGROSS0532

BGIBMGA004281 nscaf2789 942561 948331 - SWGROSS0533

BGIBMGA014154 nscaf481 451441 452100 - SWGROSS0535

BGIBMGA009164 nscaf2937 1396803 1401488 + SWGROSS0537

BGIBMGA012390 nscaf3041 938874 939662 + SWGROSS0539

BGIBMGA012396 nscaf3041 1618943 1626831 + SWGROSS0544

BGIBMGA012441 nscaf3044 249001 255814 + SWGROSS0546

BGIBMGA012442 nscaf3044 272261 272561 + SWGROSS0546

BGIBMGA012443 nscaf3044 288683 291823 + SWGROSS0547

BGIBMGA001474 nscaf2136 334507 335517 - SWGROSS0564

BGIBMGA001424 nscaf2136 4964679 4965017 - SWGROSS0572

BGIBMGA001413 nscaf2136 5615362 5615526 - SWGROSS0573

BGIBMGA007183 nscaf2868 62080 68901 - SWGROSS0579

BGIBMGA007147 nscaf2868 1185017 1185952 - SWGROSS0586

BGIBMGA007214 nscaf2868 1698453 1699621 + SWGROSS0588

BGIBMGA012706 nscaf3055 1319699 1327487 + SWGROSS0603

BGIBMGA012707 nscaf3055 1338655 1341022 + SWGROSS0603

BGIBMGA012672 nscaf3055 1388503 1390148 - SWGROSS0605

BGIBMGA000455 nscaf1681 4443505 4450565 + SWGROSS0614

BGIBMGA000458 nscaf1681 4706693 4706896 + SWGROSS0615

BGIBMGA000459 nscaf1681 4708125 4709333 + SWGROSS0615

BGIBMGA000235 nscaf1681 4724495 4729130 - SWGROSS0616

BGIBMGA009976 nscaf2980 108111 113882 + SWGROSS0618

BGIBMGA009977 nscaf2980 117140 127672 + SWGROSS0618

BGIBMGA010860 nscaf3005 1466030 1466929 - SWGROSS0624

BGIBMGA011520 nscaf3027 604573 605113 + SWGROSS0632

BGIBMGA011522 nscaf3027 634072 637945 + SWGROSS0634

BGIBMGA011491 nscaf3027 639807 641952 - SWGROSS0634

BGIBMGA011575 nscaf3027 3628029 3629330 + SWGROSS0642

BGIBMGA011447 nscaf3027 3629492 3630736 - SWGROSS0642

BGIBMGA011302 nscaf3026 267721 267891 + SWGROSS0652

BGIBMGA011263 nscaf3026 2522455 2529501 - SWGROSS0654

BGIBMGA011150 nscaf3022 980539 983376 + SWGROSS0664

BGIBMGA011111 nscaf3022 1103245 1106694 - SWGROSS0666

BGIBMGA011108 nscaf3022 1153083 1158676 - SWGROSS0668

BGIBMGA011106 nscaf3022 1173109 1182099 - SWGROSS0669

BGIBMGA011105 nscaf3022 1188435 1190338 - SWGROSS0670

BGIBMGA013304 nscaf3066 536514 537681 + SWGROSS0674

BGIBMGA000074 nscaf1108 2370811 2374956 - SWGROSS0675

BGIBMGA000158 nscaf1108 2693518 2696851 + SWGROSS0677

BGIBMGA000068 nscaf1108 2724200 2728350 - SWGROSS0678

BGIBMGA009573 nscaf2962 998093 999370 - SWGROSS0686

BGIBMGA009621 nscaf2962 1000811 1002755 + SWGROSS0686

BGIBMGA003814 nscaf2686 727389 735981 - SWGROSS0688

BGIBMGA003813 nscaf2686 745692 748956 - SWGROSS0688

BGIBMGA012143 nscaf3035 1333943 1337447 - SWGROSS0690

BGIBMGA012209 nscaf3035 1620720 1621383 + SWGROSS0691

BGIBMGA000803 nscaf1705 723956 726243 + SWGROSS0692

BGIBMGA000811 nscaf1705 929222 929401 + SWGROSS0693

BGIBMGA000762 nscaf1705 931265 942452 - SWGROSS0693

BGIBMGA004930 nscaf2822 1120808 1125596 - SWGROSS0698

BGIBMGA005126 nscaf2823 1711000 1715397 + SWGROSS0708

BGIBMGA005127 nscaf2823 1716842 1719277 + SWGROSS0708

BGIBMGA005073 nscaf2823 1719576 1735077 - SWGROSS0708

BGIBMGA005054 nscaf2823 2614741 2614938 - SWGROSS0712

BGIBMGA005037 nscaf2823 3014996 3026850 - SWGROSS0713

BGIBMGA005036 nscaf2823 3032818 3038922 - SWGROSS0714

BGIBMGA005035 nscaf2823 3039980 3042193 - SWGROSS0714

BGIBMGA004889 nscaf2819 70070 74975 - SWGROSS0717

BGIBMGA014016 nscaf3099 4534081 4535149 + SWGROSS0731

BGIBMGA013766 nscaf3097 122693 125957 - SWGROSS0732

BGIBMGA013774 nscaf3097 143615 152483 + SWGROSS0733

BGIBMGA007420 nscaf2883 1643269 1647812 + SWGROSS0742

BGIBMGA007348 nscaf2883 1650835 1652450 - SWGROSS0742

BGIBMGA008883 nscaf2930 3679024 3682236 - SWGROSS0750

BGIBMGA009099 nscaf2931 1109065 1114933 + SWGROSS0762

BGIBMGA009063 nscaf2931 1181113 1182838 - SWGROSS0763

BGIBMGA009100 nscaf2931 1207086 1208209 + SWGROSS0763

BGIBMGA006126 nscaf2847 2235509 2236204 + SWGROSS0770

BGIBMGA006024 nscaf2847 2249481 2250042 - SWGROSS0770

BGIBMGA006196 nscaf2847 7317610 7318551 + SWGROSS0774

BGIBMGA005947 nscaf2847 7324067 7326058 - SWGROSS0774

BGIBMGA003117 nscaf2589 2788848 2792405 + SWGROSS0784

BGIBMGA003120 nscaf2589 2982040 2984233 + SWGROSS0787

BGIBMGA003017 nscaf2589 3970067 3974199 - SWGROSS0790

BGIBMGA003001 nscaf2589 4430467 4434674 - SWGROSS0792

BGIBMGA002986 nscaf2589 5011367 5014764 - SWGROSS0793

BGIBMGA003182 nscaf2589 5020574 5025999 + SWGROSS0793

BGIBMGA003183 nscaf2589 5035050 5035904 + SWGROSS0794

BGIBMGA002984 nscaf2589 5038429 5043860 - SWGROSS0794

BGIBMGA003184 nscaf2589 5045637 5048623 + SWGROSS0794

BGIBMGA003210 nscaf2589 6340171 6342603 + SWGROSS0803

BGIBMGA003775 nscaf2681 23335 26088 - SWGROSS0804

BGIBMGA003774 nscaf2681 40830 43373 - SWGROSS0804

BGIBMGA003796 nscaf2681 1375483 1380402 + SWGROSS0808

BGIBMGA013534 nscaf3075 927568 931683 + SWGROSS0812

BGIBMGA013484 nscaf3075 947924 949209 - SWGROSS0813

BGIBMGA013479 nscaf3075 1074746 1083885 - SWGROSS0815

BGIBMGA002650 nscaf2529 20030 23473 + SWGROSS0817

BGIBMGA002659 nscaf2529 394849 397565 + SWGROSS0818

BGIBMGA002638 nscaf2529 401169 409347 - SWGROSS0818

BGIBMGA002637 nscaf2529 410415 411703 - SWGROSS0818

BGIBMGA002665 nscaf2529 1368318 1369074 + SWGROSS0820

BGIBMGA003662 nscaf2674 3201995 3205732 + SWGROSS0838

BGIBMGA003527 nscaf2674 3336473 3338854 - SWGROSS0839

BGIBMGA003522 nscaf2674 3651420 3652209 - SWGROSS0841

BGIBMGA003520 nscaf2674 3700656 3701981 - SWGROSS0843

BGIBMGA003692 nscaf2674 4914435 4915076 + SWGROSS0845

BGIBMGA003722 nscaf2674 6172780 6173906 + SWGROSS0850

BGIBMGA003745 nscaf2674 7348048 7348551 + SWGROSS0855

BGIBMGA003751 nscaf2674 8028229 8031659 + SWGROSS0856

BGIBMGA013449 nscaf3074 177476 178865 - SWGROSS0859

BGIBMGA013438 nscaf3074 536518 540692 - SWGROSS0863

BGIBMGA006506 nscaf2853 4782060 4790575 + SWGROSS0880

BGIBMGA006279 nscaf2852 1273614 1282516 - SWGROSS0887

BGIBMGA002750 nscaf2556 706207 711285 + SWGROSS0895

BGIBMGA010243 nscaf2986 3210189 3211716 + SWGROSS0901

BGIBMGA005258 nscaf2827 703485 704826 - SWGROSS0916

BGIBMGA005257 nscaf2827 743864 745688 - SWGROSS0917

BGIBMGA005389 nscaf2828 532676 541292 - SWGROSS0920

BGIBMGA005388 nscaf2828 547118 547979 - SWGROSS0920

BGIBMGA005387 nscaf2828 612009 614540 - SWGROSS0923

BGIBMGA005361 nscaf2828 2200028 2200666 - SWGROSS0926

BGIBMGA005291 nscaf2828 5832182 5832448 - SWGROSS0934

BGIBMGA005290 nscaf2828 5839332 5842013 - SWGROSS0935

BGIBMGA009855 nscaf2970 937175 938230 - SWGROSS0937

BGIBMGA008026 nscaf2889 475035 479870 - SWGROSS0948

BGIBMGA008065 nscaf2889 1123114 1125760 + SWGROSS0956

BGIBMGA008005 nscaf2889 1770015 1770661 - SWGROSS0958

BGIBMGA008077 nscaf2889 1947243 1948562 + SWGROSS0960

BGIBMGA008088 nscaf2890 1358624 1359112 - SWGROSS0961

BGIBMGA012547 nscaf3048 195141 202355 - SWGROSS0962

BGIBMGA012546 nscaf3048 207623 210390 - SWGROSS0962

BGIBMGA012536 nscaf3048 934812 939078 - SWGROSS0967

BGIBMGA012568 nscaf3048 942176 945749 + SWGROSS0968

BGIBMGA002437 nscaf2511 1155942 1158533 - SWGROSS0977

BGIBMGA002436 nscaf2511 1160826 1165937 - SWGROSS0978

BGIBMGA002473 nscaf2511 1419729 1425032 + SWGROSS0981

BGIBMGA002503 nscaf2511 2703098 2705442 + SWGROSS0986

BGIBMGA007227 nscaf2874 277365 279865 - SWGROSS0997

BGIBMGA007316 nscaf2879 27970 29131 - SWGROSS1003

BGIBMGA009198 nscaf2940 21340 21832 + SWGROSS1005

BGIBMGA009199 nscaf2940 35696 38513 + SWGROSS1006

BGIBMGA009200 nscaf2940 59303 62354 + SWGROSS1007

BGIBMGA009195 nscaf2940 144091 146890 - SWGROSS1009

BGIBMGA012222 nscaf3038 207968 208852 + SWGROSS1016

BGIBMGA012642 nscaf3053 88478 90119 - SWGROSS1018

BGIBMGA014460 scaffold697 46872 47867 - SWGROSS1036

BGIBMGA014530 scaffold773 15075 20974 - SWGROSS1039

BK006600 nscaf2983 2855561 2858301 + SWGROSS0898

BMOBMSQD2 nscaf3074 225809 230471 + SWGROSS0861

DQ443151 nscaf2855 4124966 4125468 - SWGROSS0135

NM_001043430 nscaf2819 77516 87377 + SWGROSS0718

NM_001043469 nscaf3031 4519785 4521943 + SWGROSS0193

NM_001043506 nscaf2993 7930909 7934254 - SWGROSS0236

NM_001043536 nscaf1681 710443 712032 - SWGROSS0608

NM_001043640 nscaf2986 3400971 3408606 - SWGROSS0902

NM_001043670 nscaf2795 3112010 3114954 - SWGROSS0527

NM_001043790 nscaf3026 6536157 6541507 + SWGROSS0661

NM_001043818 nscaf2998 524831 525679 + SWGROSS0249

NM_001043858 nscaf3074 225809 228153 + SWGROSS0861

NM_001043864 nscaf2823 112743 113792 + SWGROSS0702

NM_001043925 nscaf3058 3535759 3539189 - SWGROSS0386

NM_001043949 nscaf3027 614772 616427 + SWGROSS0633

NM_001044079 nscaf2589 4436428 4440264 + SWGROSS0792

NM_001044193 nscaf3058 3544760 3551068 - SWGROSS0386

NM_001046707 nscaf2674 3651631 3652224 - SWGROSS0841

NM_001046773 nscaf3055 1358891 1361101 - SWGROSS0604

NM_001046846 nscaf3055 1379740 1382298 + SWGROSS0605

NM_001046888 nscaf2855 4124416 4125468 - SWGROSS0135

NM_001046906 nscaf2888 4533439 4534773 - SWGROSS0352

NM_001046908 nscaf2398 18613 21555 - SWGROSS0554

NM_001046914 nscaf3055 1334191 1337290 - SWGROSS0603

NM_001046956 nscaf3074 181646 188437 - SWGROSS0859

NM_001047050 nscaf2589 5028991 5030021 - SWGROSS0793

NM_001047081 nscaf2556 707639 711751 + SWGROSS0895

NM_001048240 nscaf3048 204154 205695 + SWGROSS0962

NM_001098281 nscaf1898 8495777 8497304 - SWGROSS0294

NM_001098283 nscaf2589 5090413 5090727 - SWGROSS0795

NM_001098292 nscaf1898 15756226 15759415 + SWGROSS0308

NM_001098355 nscaf2970 937175 938024 - SWGROSS0937

NM_001098362 nscaf3055 1328064 1331664 - SWGROSS0603

NM_001099614 nscaf2795 737598 739792 - SWGROSS0525

NM_001099617 nscaf2828 5846111 5846618 - SWGROSS0935

NM_001099621 nscaf2993 7962631 7964876 + SWGROSS0238

NM_001099812 nscaf3097 865247 866299 - SWGROSS0735

NM_001099812 nscaf2983 1786366 1787409 - SWGROSS0896

NM_001102461 nscaf2674 3705843 3715524 + SWGROSS0843

NM_001105232 nscaf2828 595874 600459 + SWGROSS0922

NM_001109933 nscaf2993 7891332 7894362 - SWGROSS0234

NM_001110008 nscaf2681 27230 31947 + SWGROSS0804

NM_001113276 nscaf3026 2396448 2401216 - SWGROSS0653

NM_001114935 nscaf2912 405528 407195 + SWGROSS0908

NM_001123339 nscaf2993 8045520 8054363 - SWGROSS0244

NM_001130876 nscaf2902 4364773 4366233 - SWGROSS0450

NM_001130897 nscaf2962 336752 339454 + SWGROSS0684

NM_001130902 nscaf1690 386396 400958 + SWGROSS0020

NM_001134916 nscaf2829 796907 801316 - SWGROSS0402

NM_001142487 nscaf2888 4426682 4428780 + SWGROSS0350

S74376 nscaf2852 1255833 1256608 - SWGROSS0886

Table S10. Number of genes per GROSS.

Gene # per GROSS | GROSS #
1 | 212
2 | 42
3 | 9
4 | 4
5 | 3
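As a quick consistency check, the distribution in Table S10 can be tallied and compared with the gene count reported in Table S8 (354 genes in GROSS out of 14,470). The sketch below uses only numbers from those two tables; the variable names are illustrative.

```python
# Table S10: number of genes per GROSS -> number of GROSS with that many genes.
genes_per_gross = {1: 212, 2: 42, 3: 9, 4: 4, 5: 3}

# Number of GROSS that contain at least one gene.
total_gross_with_genes = sum(genes_per_gross.values())

# Total genes implied by the distribution.
total_genes_in_gross = sum(k * count for k, count in genes_per_gross.items())

# Share of all annotated genes (Table S8 gives 14,470 genes genome-wide).
percent_of_genes = round(100.0 * total_genes_in_gross / 14470, 2)

print(total_gross_with_genes, total_genes_in_gross, percent_of_genes)
```

The implied total of 354 genes agrees with the "# in GROSS" entry for gene number in Table S8, and the share matches the 2.45% listed there.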

Supporting References and Notes

S1. http://www.illumina.com/.
S2. J. Wang et al., Nature 456, 60 (2008).
S3. J. Wang et al., Nucleic Acids Res. 33, D399 (2005).
S4. Silkworm Genome Database (http://silkworm.swu.edu.cn/silkdb/ or http://silkworm.genomics.org.cn/).
S5. The International Silkworm Genome Consortium, Insect Biochem. Mol. Biol. 38, 1036 (2008).
S6. BmMDB (http://silkworm.swu.edu.cn/microarray/).
S7. R. Li, Y. Li, K. Kristiansen, J. Wang, Bioinformatics 24, 713 (2008).
S8. R. Li et al., Genome Res. 19, 1124 (2009).
S9. http://www.sequenom.com/.
S10. J. C. Barrett, B. Fry, J. Maller, M. J. Daly, Bioinformatics 21, 263 (2005).
S11. J. Felsenstein, (2005).
S12. N. Patterson, A. L. Price, D. Reich, PLoS Genet. 2, e190 (2006).
S13. M. Kendall, Biometrika 30, 81 (1938).
S14. S. Purcell et al., Am. J. Hum. Genet. 81, 559 (2007).
S15. http://pngu.mgh.harvard.edu/~purcell/plink/.
S16. H. Tang, J. Peng, P. Wang, N. J. Risch, Genet. Epidemiol. 28, 289 (2005).
S17. http://med.stanford.edu/tanglab/software/frappe.html.
S18. R. R. Hudson, Bioinformatics 18, 337 (2002).
S19. F. Tajima, Genetics 105, 437 (1983).
S20. F. Tajima, Genetics 123, 585 (1989).
S21. M. Nei, Molecular evolutionary genetics. (Columbia University Press, New York, 1987).
S22. http://rana.stanford.edu/software/.
S23. G. A. Watterson, Theor. Popul. Biol. 7, 256 (1975).
S24. G. K. Wong et al., Nature 432, 717 (2004).
S25. D. J. Begun et al., PLoS Biol. 5, e310 (2007).
S26. W. G. Hill, A. Robertson, Theor. Appl. Genet. 31, 881 (1968).
S27. S. J. Macdonald, T. Pastinen, A. D. Long, Genetics 171, 1741 (2005).
S28. K. Yamamoto et al., Genome Biol. 9, R21 (2008).
S29. M. Beye et al., Genome Res. 16, 1339 (2006).
S30. P. Gepts, R. Papa, Evolution during Domestication. In: Encyclopedia of Life Sciences. (John Wiley & Sons Ltd, Chichester, 2002).
S31. Q. Zhu, Mol. Biol. Evol. 24, 875 (2007).
S32. A. Haudry, Mol. Biol. Evol. 24, 1506 (2007).
S33. M. I. McCarthy et al., Nat. Rev. Genet. 9, 356 (2008).
S34. S. K. Wright, R. E. Viola, J. Biol. Chem. 276, 31151 (2001).
S35. G. K. Balendiran et al., J. Biol. Chem. 275, 27045 (2000).
S36. T. Ito, Nutrition and artificial diets of the silkworm, Bombyx mori. (Nihon-Sanshi-Shinbun Press, Tokyo, 1983).
S37. G. Liti et al., Nature 458, 337 (2009).
S38. K. E. Holt et al., Nat. Genet. 40, 987 (2008).
S39. J. Mu et al., Nat. Genet. 39, 126 (2007).
S40. A. C. Palmenberg et al., Science 324, 55 (2009).
S41. K. A. Frazer et al., Nature 449, 851 (2007).
S42. K. A. Frazer et al., Nature 448, 1050 (2007).
S43. K. Lindblad-Toh et al., Nature 438, 803 (2005).
S44. J. W. Kijas et al., PLoS ONE 4, e4668 (2009).
S45. R. A. Gibbs et al., Science 324, 528 (2009).
S46. M. R. Goldsmith, T. Shimada, H. Abe, Annu. Rev. Entomol. 50, 71 (2005).
S47. K. P. Arunkumar, M. Metta, J. Nagaraju, Mol. Phylogenet. Evol. 40, 419 (2006).
S48. C. A. Driscoll, D. W. Macdonald, S. J. O’Brien, Proc. Natl. Acad. Sci. USA 106, 9971 (2009).