
Supplementary Information for

Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars

Mingzhou Li¹,²,¹³, Shilin Tian³,¹³, Long Jin¹,¹³, Guangyu Zhou³,¹³, Ying Li¹,¹³, Yuan Zhang³,¹³, Tao Wang¹, Carol KL Yeung³, Lei Chen⁴, Jideng Ma¹, Jinbo Zhang³, Anan Jiang¹, Ji Li³, Chaowei Zhou¹, Jie Zhang¹, Yingkai Liu¹, Xiaoqing Sun³, Hongwei Zhao³, Zexiong Niu³, Pinger Lou¹, Linjin Xian¹, Xiaoyong Shen³, Shaoqing Liu³, Shunhua Zhang¹, Mingwang Zhang¹, Li Zhu¹, Surong Shuai¹, Lin Bai¹, Guoqing Tang¹, Haifeng Liu¹, Yanzhi Jiang¹, Miaomiao Mai¹, Jian Xiao¹, Xun Wang¹, Qi Zhou⁵, Zhiquan Wang⁶, Paul Stothard⁶, Ming Xue⁷, Xiaolian Gao⁸, Zonggang Luo⁹, Yiren Gu¹⁰, Hongmei Zhu³, Xiaoxiang Hu¹¹, Yaofeng Zhao¹¹, Graham S. Plastow⁶, Jinyong Wang⁴, Zhi Jiang³, Kui Li¹², Ning Li¹¹, Xuewei Li¹ & Ruiqiang Li²,³

1 Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Ya’an, China.
2 Biodynamic Optical Imaging Center (BIOPIC), Peking-Tsinghua Center for Life Sciences, and School of Life Sciences, Peking University, Beijing, China.
3 Novogene Bioinformatics Institute, Beijing, China.
4 Chongqing Academy of Animal Science, Chongqing, China.
5 Ya’an Vocational College, Ya’an, China.
6 Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Canada.
7 National Animal Husbandry Service, Ministry of Agriculture of China, Beijing, China.
8 Department of Biology and Biochemistry, University of Houston, Houston, USA.
9 Department of Animal Science, Southwest University at Rongchang, Chongqing, China.
10 Sichuan Animal Science Academy, Chengdu, China.
11 State Key Laboratory for Agrobiotechnology, College of Biological Sciences, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, China.
12 Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China.
13 These authors contributed equally to this work.

Correspondence should be addressed to X.L. (email: [email protected]) or to R.L. (email: [email protected]).

Nature Genetics: doi:10.1038/ng.2811

Table of contents

Supplementary Figs. 1-36
Supplementary Fig. 1. The distribution areas of the original Tibetan wild boar in China.
Supplementary Fig. 2. Comparison of Tibetan wild boar and domestic Duroc pig.
Supplementary Fig. 3. Synteny between the Tibetan wild boar and Duroc pig genomes.
Supplementary Fig. 4. Distribution of 19-mer frequency.
Supplementary Fig. 5. The GC content and CpG frequency for 10 kb non-overlapping sliding windows across the Tibetan wild boar genome and five other mammalian genomes.
Supplementary Fig. 6. GC content against the sequencing depth of the Tibetan wild boar genome.
Supplementary Fig. 7. Depth distribution of the fraction of bases.
Supplementary Fig. 8. Distribution of heterozygosity density in the Tibetan wild boar diploid genome.
Supplementary Fig. 9. Comparison of gene parameters among the Tibetan wild boar and five other mammalian genomes.
Supplementary Fig. 10. Divergence distribution of classified families of transposable elements.
Supplementary Fig. 11. Length distribution of InDels in the Tibetan wild boar whole genome and in coding sequence (CDS) regions.
Supplementary Fig. 12. Orthology assignment of the Tibetan wild boar, Duroc pig and human genomes.
Supplementary Fig. 13. Sequence depth distribution between single- and multi-copy genes in the Tibetan wild boar genome.
Supplementary Fig. 14. Orthology delineation among the protein-coding gene family repertoires of the Tibetan wild boar and five other mammals.
Supplementary Fig. 15. Venn diagrams showing the distribution of shared and unique gene families.
Supplementary Fig. 16. Distribution of pairwise amino acid identity of orthologs between the Tibetan wild boar and five other mammals.
Supplementary Fig. 17. Venn diagram showing the distribution of olfactory-related gene repertoires among six mammals.
Supplementary Fig. 18. Identification and comparison of olfactory receptor genes among six mammals using conserved olfactory receptor-specific motifs.
Supplementary Fig. 19. Phylogenetic analysis of the olfactory-related gene repertoires.
Supplementary Fig. 20. Amino acid identity of olfactory-related genes between Duroc pig, Tibetan wild boar and four other mammals.
Supplementary Fig. 21. Average protein similarity of olfactory-related genes and total genes between Duroc pig, Tibetan wild boar and four other mammals.
Supplementary Fig. 22. Comparison of ω values between PSGs in Tibetan wild boar and Duroc pig.
Supplementary Fig. 23. Tibetan wild boar and Duroc pig KA/KS (ω) in functional gene categories.
Supplementary Fig. 24. PSGs in Tibetan wild boar involved in the pathway ‘mTOR signaling’ and ‘vascular smooth muscle contraction’.
Supplementary Fig. 25. Comparison of the proportions of PSGs in Tibetan wild boar and Duroc pig.
Supplementary Fig. 26. PSGs in Duroc pig involved in the pathway of ‘extracellular matrix (ECM)-receptor interaction’.
Supplementary Fig. 27. Inactivation events of six identified pseudogenes related to ‘response to drug’ in the Tibetan wild boar genome.
Supplementary Fig. 28. Genetic structure analysis for 103 sequenced individuals using FRAPPE with K = 2 to 9.
Supplementary Fig. 29. Genome-wide distribution of SNPs.
Supplementary Fig. 30. Box plot of θπ ratio (θπ,domestic / θπ,Tibetan) and FST values for regions of Tibetan wild boars and Chinese domestic pigs that have undergone positive selection versus the whole genome.
Supplementary Fig. 31. Distribution of selection statistics (Tajima’s D).
Supplementary Fig. 32. LD patterns between the selected regions and whole genome of Tibetan wild boars and Chinese domestic pigs.
Supplementary Fig. 33. Analysis of the phylogenetic relationship of Tibetan wild boars (n = 30) and neighboring domestic pigs (n = 15) using SNPs in regions with strong selective sweep signals.
Supplementary Fig. 34. Genes embedded in naturally selected regions in Tibetan wild boars related to ‘vitamin B6 binding’ and ‘response to hypoxia’.
Supplementary Fig. 35. Genes examined in the ‘saliva secretion’ functional category (GO-BP: 0046541) showed signatures of selective sweeps in Chinese domestic pigs.
Supplementary Fig. 36. Vacuum chewing (Domestic Duroc pig).

Supplementary Tables 1-8, 10-16, 18-22, 24-27 and 29-36
Supplementary Table 1. Genome sequencing strategy for the Tibetan wild boar.
Supplementary Table 2. Estimation of the Tibetan wild boar genome size using K-mer analysis.
Supplementary Table 3. Summary of the Tibetan wild boar genome assembly.
Supplementary Table 4. Summary of mapping and coverage depth.
Supplementary Table 5. Transposon element families in the Tibetan wild boar genome based on various methods.
Supplementary Table 6. Transposon element families in the Tibetan wild boar genome based on homolog alignment.
Supplementary Table 7. Summary of InDels in the Tibetan wild boar genome.
Supplementary Table 8. Summary of syntenic regions between the Tibetan wild boar and Duroc pig genomes.
Supplementary Table 10. Summary of non-coding RNA distribution and annotation in the Tibetan wild boar genome.
Supplementary Table 11. Characteristics of the Tibetan wild boar and Duroc pig genome assemblies.
Supplementary Table 12. Summary of RNA-seq mapping results.
Supplementary Table 13. Summary of evidence for the EVidenceModeler (EVM) gene models in the Tibetan wild boar genome.
Supplementary Table 14. Assessment of sequence coverage of the Tibetan wild boar genome assembly using the CDS regions of the Duroc pig genome.
Supplementary Table 15. Summary of predicted protein-coding genes in the Tibetan wild boar genome compared with other representative mammalian genomes.
Supplementary Table 16. Number of Tibetan wild boar genes with functional classification by various methods.
Supplementary Table 18. Functional gene categories enriched for the Tibetan wild boar- and Duroc pig-specific families.
Supplementary Table 19. Summary of gene families in six mammals.
Supplementary Table 20. Functional gene categories enriched for the Tibetan wild boar- and Duroc pig-specific expansion families.
Supplementary Table 21. Positively selected genes (PSGs) identified in the Tibetan wild boar and Duroc pig genomes.
Supplementary Table 22. Functional gene categories enriched for the 215 PSGs in the Tibetan wild boar and 182 PSGs in the Duroc pig.
Supplementary Table 24. List of a priori functional candidate genes related to ‘response to hypoxia’, ‘response to UV’ and ‘energy metabolism’.
Supplementary Table 25. Functional candidate genes related to ‘response to hypoxia’ under positive selection in the Tibetan wild boar (21 PSGs) and Duroc pig (1 PSG).
Supplementary Table 26. Functional candidate genes related to ‘response to UV’ under positive selection in the Tibetan wild boar (6 PSGs).
Supplementary Table 27. Functional candidate genes related to ‘energy metabolism’ under positive selection in the Tibetan wild boar (17 PSGs) and Duroc pig (21 PSGs).
Supplementary Table 29. Functional gene categories enriched for Tibetan wild boar pseudogenes.
Supplementary Table 30. Drug response genes that appear inactive in the Tibetan wild boar genome.
Supplementary Table 31. Summary and mapping statistics of sampled pig populations/breeds.
Supplementary Table 32. Summary and mapping statistics of the downloaded pig genome re-sequencing data.
Supplementary Table 33. Summary of SNP calling on a population-scale.
Supplementary Table 34. Tracy-Widom (TW) statistics for the first ten eigenvalues from PCA analysis of pig breeds.
Supplementary Table 35. Summary of SNPs in Tibetan wild boars and Chinese domestic pigs.
Supplementary Table 36. Functional gene categories enriched for genes affected by natural and artificial selection.

Supplementary Note
1 De novo sequencing, assembly and annotation of Tibetan wild boar genome
1.1 Sequencing strategy and data generation
1.2 Sequence quality checking and filtering
1.3 Estimation of genome size using K-mer method
1.4 De novo assembly
1.5 Detections of heterozygous SNPs and deletion or insertion polymorphisms (InDels)
1.6 Repeat annotation
1.7 Structural annotation of genes
1.8 Functional annotation of genes
1.9 non-coding RNA (ncRNA) annotations
2 Lineage-specific genes
2.1 Gene family cluster and orthology relationships
2.2 Evidence of transcription for the Tibetan wild boar-specific genes
3 Functional enrichment analyses for genes
4 Identification of pseudogenes
5 Population-based re-sequencing and SNP calling
5.1 Re-sequencing strategy and read mapping
5.2 SNP calling
6 Demographic history reconstruction
7 Linkage-disequilibrium (LD) analysis

Supplementary URLs

Supplementary References


Supplementary Figs. 1-36

Supplementary Fig. 1. The distribution areas of the original Tibetan wild boar in China. Tibetan wild boars are primarily distributed in the mountainous grassland, low bulrush meadows and valley zone of a large high-altitude area in Southwest China (yellow regions). These areas mainly include: (a) southeastern Tibet Autonomous Region: Milin (3,700 m altitude), Nyingchi (3,000 m), Gongbujiangda (3,600 m), Langxian (3,200 m), Bomi (2,700 m), Mangkang (3,870 m), Zuogong (3,750 m), Bianba (3,500 m), Chaya (3,500 m), Jiangda (3,650 m), Gongjue (3,640 m), and Jiali (4,400 m); (b) northwestern Sichuan province: Heishui (3,544 m), Barkam (2,633 m), Xiaojin (2,367 m), Litang (4,014 m), Xiangcheng (2,856 m), Daocheng (3,750 m), Xinlong (3,500 m), and Dege (3,500 m); (c) northwestern Yunnan province: Shangri-La (3,280 m), Diqing (4,270 m), and Weixi (2,340 m); and (d) southwestern Gansu province: Hezuo (3,000 m), Luqu (3,500 m), and Zhuoni (2,500 m). Data are from the survey report ‘Area coverage planning of Chinese specific agricultural product, 2006–2015’, Chinese Ministry of Agriculture, 2007.


Supplementary Fig. 2. Comparison of Tibetan wild boar and domestic Duroc pig. Values are means ± s.d.

Appearance
Tibetan wild boar: [image]
Duroc pig: [image]

Breed history
Tibetan wild boar:
○ Indigenous to the Tibetan plateau of China (average altitude 4,268 m above sea level), living in the forest and valley zone.
○ Has not undergone artificial selection.
Duroc pig:
○ Originated in America as one of several red pig strains that developed around 1800 in New England.
○ Intensively artificially selected for fast growth and efficient accumulation of lean meat (muscle).

Characteristics
Tibetan wild boar:
○ Black color.
○ Small body size. Under plateau conditions, the average adult body weight is about 50 kg (female 46 kg, male 56 kg); at 13 months of age, body length is 71.37 ± 0.73 cm and body height is 45.75 ± 0.52 cm (n = 17).
○ Slow growth. From 2 to 6 months of age, average daily gain is less than 100 g (99.87 ± 12.11 g, n = 27).
○ High fat deposition. The lean percentage is 43.58 ± 5.39% at 6 months of age and 39.72 ± 2.75% at 12 months. The intramuscular fat content is 3.82 ± 0.21% at 6 months and 10.15 ± 0.15% at 12 months (n = 17).
○ Poor meat production. The loin eye area is 12.30 ± 2.18 cm² at 6 months and 15.15 ± 3.43 cm² at 12 months (n = 19); the dressing percentage is 51.00 ± 1.26% at 6 months and 74.19 ± 0.52% at 12 months (n = 17).
○ Adapted to the extremely harsh conditions of high altitude, such as hypoxia, low temperature, high solar radiation and scarce food resources.
○ Well-developed blood circulation system, strong limbs, long and rigid bristles, and down beneath the hair.
○ Large lungs and heart. Ratio of lung weight to body weight = 1.36 ± 0.18% (n = 17); ratio of heart weight to body weight = 0.48 ± 0.08% (n = 17).
○ High energy metabolism. The average feed:gain ratio is 4.89 ± 0.04 (n = 17).
Duroc pig:
○ Red color.
○ Large body size. The average adult body weight is more than 300 kg (female 350 kg, male 380 kg).
○ Fast growth. From 30 to 100 kg body weight, average daily gain is about 900 g (936 ± 33.4 g, n = 120).
○ High carcass production. At 6 months, the lean percentage is 63.50 ± 4.29%, the intramuscular fat content is 3.04 ± 0.33%, the loin eye area is 44.87 ± 1.92 cm², and the dressing percentage is 74.23 ± 0.88% (n = 121).
○ Poor maternal instincts.
○ Late-maturing type.
○ Ratio of lung weight to body weight = 0.83 ± 0.07% (n = 110); ratio of heart weight to body weight = 0.35 ± 0.04% (n = 110).
○ The average feed:gain ratio is 2.38 ± 0.02 (n = 131).

Reproduction
Tibetan wild boar:
○ The average litter size is 4 to 8. The total number born is 4.00 ± 0.20 for the first parity and 7.25 ± 0.98 for the 2nd to 3rd parities (n = 25).
○ Newborn piglets are relatively large. The average birth weight is 1.28 ± 0.12 kg (n = 15).
Duroc pig:
○ The average litter size is 8 to 10. The total number born is 8.42 ± 0.87 for the first parity and 10.74 ± 1.10 for the 2nd to 3rd parities (n = 171).
○ The average birth weight is 1.7 ± 0.23 kg (n = 142).

Current distribution
Tibetan wild boar: Mainly distributed in an important natural conservation zone of Southwest China; the breed is in danger of extinction.
Duroc pig: Internationally used breed (93 countries).


Supplementary Fig. 3. Synteny between the Tibetan wild boar and Duroc pig genomes. GC content, repeat density and gene density were calculated using a 1 Mb sliding window. The mitochondrial genome and Y chromosome were excluded. The number of contiguous syntenic blocks was determined by pairwise comparison between the Tibetan wild boar and Duroc pig genomes. A total of 2,458 regions of inverted orientation, covering more than 186.61 Mb, were identified using BreakDancer (parameter -q=20) (Supplementary URLs); this exceeds the 1,576 inversions covering more than 154 Mb identified between the human and chimpanzee genomes¹. A complete list of inversions is provided in Supplementary Table 9.
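The GC, repeat and gene density tracks in this figure rest on a simple windowing step: split each chromosome into consecutive 1 Mb windows and measure, per window, the fraction of bases covered by a feature set. The sketch below is a minimal Python illustration under assumed inputs: features are 0-based, half-open (start, end) intervals on one chromosome, overlapping features are merged so that no base is counted twice, and windows do not overlap (the caption does not state the step size). The function names are hypothetical and are not taken from the authors' pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def window_density(features: List[Interval], chrom_len: int,
                   window: int = 1_000_000) -> List[float]:
    """Fraction of each non-overlapping window covered by features.

    The last window may be shorter than `window`; its density is computed
    over its actual length.
    """
    n_windows = (chrom_len + window - 1) // window
    covered = [0] * n_windows
    for start, end in merge_intervals(features):
        # Split each merged interval across the windows it spans.
        pos = start
        while pos < end:
            w = pos // window
            w_end = min((w + 1) * window, end)
            covered[w] += w_end - pos
            pos = w_end
    densities = []
    for w in range(n_windows):
        w_len = min(window, chrom_len - w * window)
        densities.append(covered[w] / w_len)
    return densities
```

GC content uses the same windowing, with G/C base counts in place of covered intervals.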


Supplementary Fig. 4. Distribution of 19-mer frequency. In total, 130.05 Gb of high-quality short-insert (180 bp) reads were used to generate the 19-mer depth-frequency distribution curve.
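A k-mer depth distribution like this one is the kind of input used for K-mer-based genome size estimation (Supplementary Table 2). A common formulation of that estimate is genome size ≈ total k-mer count / depth of the main peak; whether the authors used exactly this formula is not stated here. The toy, in-memory sketch below assumes reads are plain strings, counts forward-strand k-mers only, skips k-mers containing N, and locates the main peak above a hypothetical `min_depth` cutoff that excludes the low-depth error k-mers. Data on the scale of 130 Gb would require a dedicated k-mer counter.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def kmer_depth_histogram(reads: Iterable[str], k: int = 19) -> Dict[int, int]:
    """Map depth -> number of distinct k-mers observed at that depth.

    Toy implementation: forward-strand k-mers only; k-mers with 'N' skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 2) -> Tuple[int, float]:
    """Return (peak depth, genome size) with size = total k-mers / peak depth.

    `min_depth` is a hypothetical cutoff that keeps the low-depth error
    peak from being chosen as the main peak.
    """
    total_kmers = sum(depth * n for depth, n in hist.items())
    peak = max((d for d in hist if d >= min_depth), key=lambda d: hist[d])
    return peak, total_kmers / peak
```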

Supplementary Fig. 5. The GC content (a) and CpG frequency (b) for 10 kb non-overlapping sliding windows across the Tibetan wild boar genome and five other mammalian genomes.
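Per-window GC content and CpG frequency can be computed in a single pass over each sequence. The minimal sketch below assumes that GC content is the fraction of G/C among unambiguous (A/C/G/T) bases and that CpG frequency is the number of CG dinucleotides divided by the number of dinucleotide positions in the window. The caption defines neither measure, and an observed/expected CpG ratio is a common alternative. Dinucleotides spanning window boundaries are ignored, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def gc_cpg_windows(seq: str, window: int = 10_000) -> List[Tuple[float, float]]:
    """(GC fraction, CpG frequency) for each non-overlapping window of seq."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), window):
        win = seq[start:start + window]
        acgt = sum(win.count(base) for base in "ACGT")
        if acgt == 0:
            # Window made entirely of gaps/ambiguous bases: nothing to measure.
            results.append((math.nan, math.nan))
            continue
        gc = win.count("G") + win.count("C")
        # 'CG' cannot overlap itself, so str.count gives the dinucleotide count.
        cpg = win.count("CG")
        n_dinuc = len(win) - 1  # dinucleotide positions inside this window
        results.append((gc / acgt, cpg / n_dinuc if n_dinuc > 0 else 0.0))
    return results
```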


Supplementary Fig. 6. GC content against the sequencing depth of the Tibetan wild boar genome. We used 100 kb non-overlapping sliding windows along the assembled sequence to calculate the GC content and the average short-read sequencing depth of each window.

Supplementary Fig. 7. Depth distribution of the fraction of bases. The x-axis represents the sequencing depth and the y-axis the fraction of bases. The high-quality short-insert reads (180 bp and 500 bp) were mapped to the Tibetan wild boar genome assembly, giving an average depth of 70.8×; ~94.8% of the genome was covered by more than 20 reads.
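Both summary numbers quoted here (the mean depth and the fraction of bases covered by more than 20 reads) follow from a per-base depth array, as does the plotted distribution. The minimal sketch below assumes that per-base depths are already available as a list of integers, for example from a pileup of the mapped reads. The strict "more than 20" threshold follows the caption; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def depth_summary(depths: List[int],
                  min_reads: int = 20) -> Tuple[float, float, Dict[int, float]]:
    """Return (mean depth, fraction of bases with depth > min_reads,
    depth -> fraction-of-bases distribution)."""
    n = len(depths)
    mean_depth = sum(depths) / n
    frac_above = sum(1 for d in depths if d > min_reads) / n
    # The x/y pairs of the depth distribution plot.
    distribution = {d: c / n for d, c in sorted(Counter(depths).items())}
    return mean_depth, frac_above, distribution
```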


Supplementary Fig. 8. Distribution of heterozygosity density in the Tibetan wild boar diploid genome. A total of 4.4 million heterozygous SNPs were identified between the two sets of chromosomes of the Tibetan wild boar diploid genome. The heterozygosity density was calculated for each non-overlapping 50 kb window.
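The per-window heterozygosity density can be sketched by binning heterozygous SNP positions into 50 kb windows. The minimal sketch below assumes that density means heterozygous SNPs per kb (the caption does not give the unit) and that positions are 0-based coordinates on a single chromosome. The function name is hypothetical.

```python
from typing import List


def het_density(het_positions: List[int], chrom_len: int,
                window: int = 50_000) -> List[float]:
    """Heterozygous SNPs per kb in each non-overlapping window.

    Positions are 0-based; the final, possibly shorter, window is normalized
    by its actual length.
    """
    n_windows = (chrom_len + window - 1) // window
    counts = [0] * n_windows
    for pos in het_positions:
        counts[pos // window] += 1
    return [counts[w] / (min(window, chrom_len - w * window) / 1000)
            for w in range(n_windows)]
```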

Supplementary Fig. 9. Comparison of gene parameters among the Tibetan wild boar and five other mammalian genomes. a, mRNA length; b, CDS length; c, exon length; d, exon number; and e, intron length. The similarity of these parameters between the Tibetan wild boar and the other mammals is consistent with a high-quality gene structure annotation of the Tibetan wild boar genome.
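All five gene parameters can be derived from the exon and CDS coordinates of each representative transcript. The minimal sketch below assumes that exons and CDS segments are non-overlapping, 0-based, half-open intervals on one strand, and that mRNA length means the summed exon length. The caption does not spell out these definitions, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def gene_parameters(exons: List[Interval], cds: List[Interval]) -> Dict[str, object]:
    """mRNA length, CDS length, exon lengths, exon number and intron
    lengths for one transcript."""
    exons = sorted(exons)
    exon_lengths = [end - start for start, end in exons]
    # Introns are the gaps between consecutive exons.
    intron_lengths = [exons[i + 1][0] - exons[i][1] for i in range(len(exons) - 1)]
    return {
        "mrna_length": sum(exon_lengths),
        "cds_length": sum(end - start for start, end in cds),
        "exon_lengths": exon_lengths,
        "exon_number": len(exons),
        "intron_lengths": intron_lengths,
    }
```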


Supplementary Fig. 10. Divergence distribution of classified families of transposable elements. The classified transposon families in the a, Tibetan wild boar, b, Duroc pig, c, human and d, cattle genomes were aligned to their consensus sequences in Repbase. The divergence rate was calculated from the alignment between each RepeatMasker-annotated repeat copy and the consensus sequence in the repeat library. Notably, although transposable elements make up a similar fraction of the Tibetan wild boar (~39.47%) and Duroc pig (40.55%) genomes, the total length of long interspersed elements (LINEs) with low divergence (≤ 10%) was much shorter in the Tibetan wild boar (~12.96 Mb) than in the Duroc pig (~34.89 Mb). This suggests that the Duroc pig genome has experienced considerable recent transposable element activity, a highly effective mechanism for generating genetic and epigenetic variation that may be acted on by selection.
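The LINE comparison in this caption comes down to summing the lengths of annotated repeat copies whose divergence from consensus is at or below a cutoff. The minimal sketch below assumes that the RepeatMasker annotation has already been parsed into (repeat class, copy length, percent divergence) records. The record type and function name are hypothetical, and parsing the RepeatMasker output file itself is not shown.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class RepeatCopy(NamedTuple):
    repeat_class: str   # e.g. "LINE/L1", "SINE/tRNA"
    length: int         # length of the annotated copy in the genome (bp)
    divergence: float   # percent divergence from the consensus


def low_divergence_length(copies: List[RepeatCopy],
                          max_div: float = 10.0) -> Dict[str, int]:
    """Total length (bp) per top-level class of copies with divergence <= max_div."""
    totals: Dict[str, int] = defaultdict(int)
    for copy in copies:
        if copy.divergence <= max_div:
            top_class = copy.repeat_class.split("/")[0]  # "LINE/L1" -> "LINE"
            totals[top_class] += copy.length
    return dict(totals)
```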


Supplementary Fig. 11. Length distribution of InDels in the Tibetan wild boar whole genome and in coding sequence (CDS) regions. Consistent with previous reports, short InDels are detected more frequently than long InDels, and CDS regions display an enrichment of InDels that are expected to preserve the reading frame²,³.
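An InDel preserves the reading frame only when its length is a multiple of three. The CDS enrichment described above can therefore be checked by comparing the in-frame fraction of CDS InDels with that of genome-wide InDels. The minimal sketch below assumes that InDel lengths (absolute numbers of inserted or deleted bases) have already been extracted for each set. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def indel_length_summary(lengths: List[int]) -> Tuple[Dict[int, int], float]:
    """Return (length -> count, fraction of InDels whose length is a
    multiple of 3, i.e. expected to keep the reading frame)."""
    dist = dict(sorted(Counter(lengths).items()))
    in_frame = sum(1 for n in lengths if n % 3 == 0) / len(lengths)
    return dist, in_frame
```

Running it separately on CDS and genome-wide InDels gives the two length distributions and their in-frame fractions side by side.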


Supplementary Fig. 12. Orthology assignment of the Tibetan wild boar, Duroc pig and human genomes. Bars are subdivided to represent different types of orthology relationships. ‘1:1:1’ indicates single-copy orthologs in each genome. ‘N:N:N’, ‘N in 1’ and ‘N in 2’ indicate multi-copy orthologs in all three genomes, in one genome and in two genomes, respectively. ‘X:X:0’, ‘X:0:X’ and ‘0:X:X’ indicate single- or multi-copy groups with genes in only two of the three genomes. Lineage-specific genes have no orthologs in the other two genomes. For genes with alternative splicing variants, we chose the longest transcript (≥ 30 amino acids) to represent the gene. Mitochondrial genes and unclustered genes are excluded. Most of the 21,806 predicted protein-coding genes in the Tibetan wild boar genome have a homologue in either the Duroc pig (14,427; 66.16%) or human (12,133; 55.64%) genome, with a core set of 10,190 (46.73%) shared by all three mammals. There are 7,917 single-copy genes with reciprocal best-match orthologs (1:1:1) among the three genomes. Of the 3,074 Tibetan wild boar-specific genes (1,178 families), 1,752 Duroc pig-specific genes (1,343 families) and 3,832 human-specific genes (2,333 families), 1,979 (64.38%), 1,365 (77.91%) and 2,610 (68.11%), respectively, have known InterPro domain annotations.
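The bar categories can be read as a rule on each orthogroup's gene copy numbers across the three genomes. The minimal sketch below encodes one interpretation of the caption. It assumes the genome order (Tibetan wild boar, Duroc pig, human), matching the order in the figure title, and it reads ‘N in 1’ and ‘N in 2’ as groups present in all three genomes with multi-copy members in exactly one or two of them. Both readings are assumptions, and the function name is hypothetical.

```python
from typing import Tuple


def classify_orthogroup(copies: Tuple[int, int, int]) -> str:
    """Classify an orthogroup by gene copy numbers in
    (Tibetan wild boar, Duroc pig, human); this order is assumed."""
    present = [c > 0 for c in copies]
    n_present = sum(present)
    if n_present == 3:
        n_multi = sum(1 for c in copies if c > 1)
        if n_multi == 0:
            return "1:1:1"
        if n_multi == 3:
            return "N:N:N"
        return f"N in {n_multi}"  # multi-copy in one or two genomes
    if n_present == 2:
        # 'X' marks a genome with genes in the group, '0' one without.
        return ":".join("X" if p else "0" for p in present)
    if n_present == 1:
        return "lineage-specific"
    raise ValueError("orthogroup has no genes")
```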


Supplementary Fig. 13. Sequence depth distribution between single- and multi-copy genes in the Tibetan wild boar genome. a, Orthologous genes shared with the Duroc pig and human genomes. b, Orthologous genes shared among the six mammalian genomes. Boxes denote the interquartile range (IQR) between the first and third quartiles (25th and 75th percentiles, respectively), and the line inside each box denotes the median. Whiskers denote the lowest and highest values within 1.5 times the IQR from the first and third quartiles, respectively. Outliers beyond the whiskers are shown as black dots. The sequence depth of multi-copy genes was in the same range as that of single-copy orthologs, suggesting that gene copy numbers were estimated accurately.
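The box-plot convention described in this caption (Tukey whiskers at 1.5 × IQR) can be reproduced from a list of per-gene depths. The minimal sketch below uses `statistics.quantiles` with `method="inclusive"` for the quartiles. That quartile method is an assumption, since plotting packages differ slightly in how they interpolate quartiles. The function name is hypothetical.

```python
import statistics
from typing import Dict, List


def tukey_box(values: List[float]) -> Dict[str, object]:
    """Box-plot statistics: quartiles, whiskers at the most extreme data
    points within 1.5 x IQR of the box, and outliers beyond the whiskers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }
```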

Supplementary Fig. 14. Orthology delineation among the protein-coding gene family repertoires of the Tibetan wild boar and five other mammals. The red dashed horizontal line represents the 1,141 single-copy orthologous genes shared among the six mammalian genomes. For genes with alternative splicing variants, we chose the longest transcript (≥ 30 amino acids) to represent the gene.
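The representative-transcript rule used here and in Supplementary Fig. 12 can be sketched as follows: keep the longest protein per gene and drop any gene whose longest protein is shorter than 30 amino acids. The input layout of (gene ID, transcript ID, protein sequence) tuples and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def representative_transcripts(transcripts: List[Tuple[str, str, str]],
                               min_aa: int = 30) -> Dict[str, Tuple[str, str]]:
    """Pick the longest protein per gene, dropping genes whose longest
    protein is shorter than min_aa residues.

    transcripts: (gene_id, transcript_id, protein_sequence) tuples.
    Returns gene_id -> (transcript_id, protein_sequence).
    """
    longest: Dict[str, Tuple[str, str]] = {}
    for gene_id, tx_id, protein in transcripts:
        if gene_id not in longest or len(protein) > len(longest[gene_id][1]):
            longest[gene_id] = (tx_id, protein)
    return {g: rec for g, rec in longest.items() if len(rec[1]) >= min_aa}
```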


Supplementary Fig. 15. Venn diagrams showing the distribution of shared and unique gene families. a, Among Tibetan wild boar, cattle, dog, human and mouse. b, Among Duroc pig, cattle, dog, human and mouse. c, Between Tibetan wild boar and Duroc pig. The Venn diagrams were created with web tools provided by the Bioinformatics and Systems Biology of Gent (Supplementary URLs). For genes with multiple alternative transcripts, the transcript with the best alignment was selected. InParanoid (Supplementary URLs) was used to identify orthologous gene pairs, and MultiParanoid (Supplementary URLs) was then used to merge them into multi-species orthologous groups. The mouse has more lineage-specific families than any of the five other mammals.

Supplementary Fig. 16. Distribution of pairwise amino acid identity of orthologs

between the Tibetan wild boar and five other mammals. The Tibetan wild boar exhibited

the highest protein identity with Duroc pigs (mean protein similarity: 94.19%; diverged 6.9

Mya), compared with cattle (88.85%, 63.6 Mya), dog (87.05%, 90.8 Mya), human (86.83%,

99.3 Mya) and mouse (82.94%, 99.3 Mya).


Supplementary Fig. 17. Venn diagram showing the distribution of olfactory-related gene

repertoires among six mammals. Sequences with more than 60% amino acid sequence

identity were clustered together.


Supplementary Fig. 18. Identification and comparison of olfactory receptor genes

among six mammals using conserved olfactory receptor-specific motifs. a, Schematic of the three olfactory receptor-specific motifs in mammals. The numbers indicate amino acid positions. TM, transmembrane domain. b, Distribution of the olfactory-related genes according to the olfactory receptor motifs they contain. Motifs within parentheses were absent. A TBLASTN search was performed to identify genes containing the conserved motifs MAYDRYAIC (TMIII), KAFSTCASH (TMVI) and PMLNPFIY (TMVII)4,5, or variants with less than 50% sequence difference from the conserved motif, within a predicted protein of at least 300 amino acids. The Duroc pig has the highest proportion (79.09%) of sequences containing all three conserved mammalian olfactory receptor motifs; such sequences can be regarded as bona fide functional olfactory receptors. c, Variable amino acids in the three conserved motifs. The amino acid sequences of all olfactory-related genes containing all three conserved motifs were aligned to determine the level of variability at each motif. The Duroc pig has the highest level of divergence (1.35 variable amino acids per motif).
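The motif criterion can be illustrated with a simplified, ungapped scan. This is not TBLASTN: it is a hypothetical stand-in that slides each motif along a predicted protein, counts mismatched residues in every window, and accepts the best window if fewer than 50% of its positions differ; proteins shorter than 300 amino acids are skipped.

```python
MOTIFS = {
    "TMIII": "MAYDRYAIC",
    "TMVI": "KAFSTCASH",
    "TMVII": "PMLNPFIY",
}


def best_window_difference(protein, motif):
    """Smallest fraction of mismatched residues over all ungapped placements."""
    best = None
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        fraction = mismatches / len(motif)
        if best is None or fraction < best:
            best = fraction
    return best


def motif_pattern(protein, max_difference=0.5, min_length=300):
    """Names of the motifs found, or None if the protein is too short."""
    if len(protein) < min_length:
        return None
    found = []
    for name, motif in MOTIFS.items():
        diff = best_window_difference(protein, motif)
        if diff is not None and diff < max_difference:
            found.append(name)
    return found


# Hypothetical 326-aa protein: glycine filler around the three exact motifs.
filler = "G" * 100
protein = filler + "MAYDRYAIC" + filler + "KAFSTCASH" + filler + "PMLNPFIY"
print(motif_pattern(protein))  # ['TMIII', 'TMVI', 'TMVII']
```

An ungapped Hamming scan keeps the example short; a homology search such as TBLASTN additionally handles gaps and substitution scores.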


Supplementary Fig. 19. Phylogenetic analysis of the olfactory-related gene repertoires.

a, Six mammalian genomes; b, Duroc pig and Tibetan wild boar genomes. The

neighbor-joining phylogenetic tree was generated using MEGA 5.15 (Supplementary URLs).

Bootstrap values are based on 1,000 replicates.

Supplementary Fig. 20. Amino acid identity of olfactory-related genes between Duroc

pig, Tibetan wild boar and four other mammals.


Supplementary Fig. 21. Average protein similarity of olfactory-related genes and total

genes between Duroc pig, Tibetan wild boar and four other mammals.


Supplementary Fig. 22. Comparison of ω values between PSGs in Tibetan wild boar (a)

and Duroc pig (b). Orthologous genes with KS > 3 or ω > 5 were filtered out6,7, resulting in 5,398 orthologs shared between Tibetan wild boar and Duroc pig. Top panels: Boxes denote the interquartile range (IQR) between the first and third quartiles (25th and 75th percentiles, respectively) and the line inside denotes the median. Whiskers denote the lowest and highest values within 1.5 times IQR from the first and third quartiles, respectively. Outliers beyond the whiskers are shown as black dots. The PSGs (P < 0.05, likelihood ratio test) in Tibetan wild boar (or Duroc pig) have significantly higher ω values than those in Duroc pig (or Tibetan wild boar) and in the genome background (Mann-Whitney U test, P < 10^-16). Lower panels: Bootstrapping

was performed by randomly resampling 105 genes from the 5,398 orthologs and PSGs.

Distribution of genes in the different ω bins confirms the elevated ω values of PSGs.
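The filtering and bootstrap steps can be sketched as below. This assumes each ortholog is a record with hypothetical `ks` and `omega` fields and that NumPy is available; the number of draws and the ω bin edges are parameters of the sketch, not values taken from the analysis.

```python
import numpy as np


def filter_orthologs(records, max_ks=3.0, max_omega=5.0):
    """Drop orthologs with KS > max_ks or omega > max_omega."""
    return [r for r in records if r["ks"] <= max_ks and r["omega"] <= max_omega]


def bootstrap_bin_fractions(omegas, bin_edges, n_draws, seed=0):
    """Resample omega values with replacement; return the fraction per bin."""
    rng = np.random.default_rng(seed)
    sample = rng.choice(np.asarray(omegas, dtype=float), size=n_draws, replace=True)
    counts, _ = np.histogram(sample, bins=bin_edges)
    return counts / n_draws


orthologs = [
    {"id": "a", "ks": 0.1, "omega": 0.2},
    {"id": "b", "ks": 3.5, "omega": 0.4},  # removed: KS > 3
    {"id": "c", "ks": 0.2, "omega": 6.0},  # removed: omega > 5
]
kept = filter_orthologs(orthologs)
print([r["id"] for r in kept])
print(bootstrap_bin_fractions([r["omega"] for r in kept], [0.0, 0.5, 1.0, 5.0], 1000))
```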


Supplementary Fig. 23. Tibetan wild boar and Duroc pig KA/KS (ω) in functional gene

categories. Points represent pairs of mean ω in Tibetan wild boar and Duroc pig of genes

significantly enriched (P < 0.05) in various KEGG-pathway, Gene Ontology (GO) biological

process (BP) and molecular function (MF) categories. Dashed lines mark a fold change in mean ω (Tibetan wild boar versus Duroc pig) of 2 (lower line) and 0.5 (upper line). A complete list of categories is provided in Supplementary Table 23.


Supplementary Fig. 24. PSGs in Tibetan wild boar involved in the pathway ‘mTOR

signaling’ (a) and ‘vascular smooth muscle contraction’ (b). Solid lines represent direct

relationships between PSGs (grey boxes) and metabolites (circular nodes), dashed lines

represent indirect relationships, and arrowheads denote directionality (adapted from KEGG

pathway: map04150 and map04270). The ω values of PSGs are also shown.


Supplementary Fig. 25. Comparison of the proportions of PSGs in Tibetan wild boar

and Duroc pig. The numbers of PSGs are given in parentheses. Dashed horizontal lines represent the proportion of a priori functional candidate genes in the genome (i.e. among the 7,917 single-copy orthologs shared by Tibetan wild boar, Duroc pig and human). UV, ultraviolet.

Supplementary Fig. 26. PSGs in Duroc pig involved in the pathway of ‘extracellular

matrix (ECM)-receptor interaction’. Lines represent direct relationships between PSGs (light

yellow boxes), the downstream signaling effectors of PSGs (blue boxes) and metabolites

(circular nodes) (adapted from KEGG pathway: map04512). The ω values of 11 PSGs in

Duroc pig (red bar) and their orthologs in Tibetan wild boar (green bar) and human (white bar)

are also shown.


Supplementary Fig. 27. Inactivation events of six identified pseudogenes related to

‘response to drug’ in the Tibetan wild boar genome. Boxes and lines indicate exons and

introns, respectively. Red arrows show inactivation events and are labeled with the nature of

the change.


Supplementary Fig. 28. Genetic structure analysis for 103 sequenced individuals using

FRAPPE with K = 2 to 9. In total 55 individuals were added from the EMBL-EBI database7-9

(shown in blue). The different symbols correspond to the different geographic locations in Fig.

2a. Each individual is represented by a stacked column, which is partitioned into 2 to 9 colored

segments with the length of each segment representing the proportion of the individual’s

genome from K = 2 to 9 ancestral populations. Samples were sorted by region/population only after the analysis. The population names and geographic locations are shown at the top of the figure. The first level of clustering (K = 2) reflects the primary geographical isolation between Asia-Africa (most samples are from China) and Europe. At K = 3, four other species of the genus Sus from islands of Southeast Asia and an African warthog species separate from the Asian-African individuals. At K = 4, the Tibetan wild boars separate from the other Asian wild boars.


Supplementary Fig. 29. Genome-wide distribution of SNPs. Out of 252,121 windows of

100 kb in length sliding in 10 kb steps across the Tibetan wild boar genome, 73,197 windows

contain < 100 SNPs (red bars) and cover 29.03% of the genome (dashed lines). 178,924

windows contain ≥ 100 SNPs (blue bars) and cover 70.97% of the genome, and these were

used to detect signatures of selective sweeps. The cumulative % in whole genome length

(black line) is also charted.
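A minimal sketch of the window scan, assuming SNP coordinates for one scaffold and 0-based, half-open windows. How partial windows at scaffold ends were handled is not stated, so this version simply skips them; the function names are illustrative.

```python
from bisect import bisect_left


def window_snp_counts(snp_positions, seq_length, size=100_000, step=10_000):
    """Count SNPs in windows [start, start + size) sliding by `step`.

    Windows that would run past the end of the sequence are skipped.
    """
    positions = sorted(snp_positions)
    counts = []
    start = 0
    while start + size <= seq_length:
        end = start + size
        n = bisect_left(positions, end) - bisect_left(positions, start)
        counts.append((start, end, n))
        start += step
    return counts


def informative_windows(counts, min_snps=100):
    """Keep windows with at least `min_snps` SNPs for the sweep scan."""
    return [w for w in counts if w[2] >= min_snps]


# Hypothetical 120-kb scaffold with 150 SNPs packed into its first 150 bp.
counts = window_snp_counts(range(150), 120_000)
print(counts)
print(informative_windows(counts))  # only the first window passes the cut-off
```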

Supplementary Fig. 30. Box plot of θπ ratio (θπ,domestic/θπ,Tibetan) (a) and FST values (b)

for regions of Tibetan wild boars and Chinese domestic pigs that have undergone

positive selection versus the whole genome. Boxes denote the interquartile range (IQR)

between the first and third quartiles (25th and 75th percentiles, respectively) and the line inside

denotes the median. Whiskers denote the lowest and highest values within 1.5 times IQR from

the first and third quartiles, respectively. Outliers beyond the whiskers are shown as black dots.

The statistical significance was calculated by the Mann-Whitney U test.
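For reference, θπ in a window is the mean number of pairwise differences per site. The sketch below assumes biallelic SNPs summarized as alternative-allele counts among n sampled chromosomes (a hypothetical input format) and uses the unbiased per-site term 2p(1 − p)·n/(n − 1); the ratio then follows the definition θπ,domestic/θπ,Tibetan given in the caption. A window with no Tibetan variation would make the ratio undefined, which the sketch does not handle.

```python
def theta_pi(alt_counts, n_chromosomes, window_length):
    """Nucleotide diversity (per site) of one window.

    alt_counts: alternative-allele count at each biallelic SNP in the window.
    n_chromosomes: number of sampled chromosomes (2 x diploid individuals).
    """
    n = n_chromosomes
    total = 0.0
    for k in alt_counts:
        p = k / n
        # Expected proportion of differing pairs of chromosomes at this site.
        total += 2.0 * p * (1.0 - p) * n / (n - 1)
    return total / window_length


def theta_pi_ratio(domestic_counts, n_domestic, tibetan_counts, n_tibetan, window_length):
    """theta_pi,domestic / theta_pi,Tibetan for the same window."""
    return (theta_pi(domestic_counts, n_domestic, window_length)
            / theta_pi(tibetan_counts, n_tibetan, window_length))


# Hypothetical 100-bp window: two SNPs at 50% frequency in domestic pigs, one in Tibetan boars.
print(theta_pi_ratio([2, 2], 4, [2], 4, 100))
```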


Supplementary Fig. 31. Distribution of selection statistics (Tajima’s D). a, |Tajima’s D_domestic – Tajima’s D_Tibetan| against θπ ratio (θπ,domestic/θπ,Tibetan). b, |Tajima’s D_domestic – Tajima’s D_Tibetan| against FST value. Of the 178,924 100-kb windows across the Tibetan wild boar genome, 2,802 and 1,076 windows were identified as regions with strong selective sweep signals for Tibetan wild boars (green points) and Chinese domestic pigs (blue points), respectively. c, Box plot of |Tajima’s D_domestic – Tajima’s D_Tibetan| in genomic regions with strong selective sweep signals for Tibetan wild boars and Chinese domestic pigs versus the whole genome. Boxes denote the interquartile range (IQR) between the first and third quartiles (25th and 75th percentiles, respectively) and the line inside denotes the median. Whiskers denote the lowest and highest values within 1.5 times IQR from the first and third quartiles, respectively. Outliers beyond the whiskers are shown as black dots. The statistical significance was calculated by the Mann-Whitney U test.
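Tajima's D compares the pairwise diversity π with Watterson's estimate S/a1. A compact sketch of the standard formula is given below; it takes the number of sampled chromosomes n, the number of segregating sites S and the window's total (not per-site) π. Returning NaN when S = 0 is an assumption of this sketch. The plotted statistic is then the absolute difference between the two populations' D values for the same window.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for one window.

    n: sampled chromosomes; seg_sites: segregating sites S;
    pi: mean number of pairwise differences summed over the window.
    """
    if seg_sites == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = seg_sites
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def abs_d_difference(d_domestic, d_tibetan):
    """|Tajima's D_domestic - Tajima's D_Tibetan| for one window."""
    return abs(d_domestic - d_tibetan)


# Hypothetical window values for the two populations.
print(abs_d_difference(tajimas_d(30, 40, 6.0), tajimas_d(60, 40, 12.0)))
```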


Supplementary Fig. 32. LD patterns between the selected regions and whole genome of

Tibetan wild boars and Chinese domestic pigs. Selected regions had significantly higher

LD than the whole genome background across the range of distances separating loci for

Tibetan wild boars and Chinese domestic pigs (P < 10^-16, Mann-Whitney U test). LD decays much more slowly in selected regions than in the whole genome. The LD decay distance was measured as the distance at which the average squared correlation coefficient (r²) dropped to half of its maximum value. For Tibetan wild boars, the LD decay distances of selected regions (black line) and the whole genome (gray line) were estimated at ~11.4 kb and ~5.9 kb, respectively, where r² drops to 0.18. For Chinese domestic pigs, the LD decay distances of selected regions (red line) and the whole genome (purple line) were estimated at ~17.8 kb and ~8.1 kb, respectively, where r² drops to 0.20.
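The half-decay distance defined above can be read off a binned decay curve. This sketch assumes a list of distance bins, sorted by distance, with the mean r² of each bin, and interpolates linearly between the two bins that bracket the half-maximum. The original analysis may have used a fitted curve instead.

```python
def half_decay_distance(distances, mean_r2):
    """Distance at which mean r^2 first drops to half of its maximum.

    distances: increasing bin midpoints; mean_r2: mean r^2 per bin.
    Returns None if the curve never reaches the half-maximum.
    """
    peak = max(mean_r2)
    target = peak / 2.0
    start = mean_r2.index(peak)
    for i in range(start + 1, len(mean_r2)):
        if mean_r2[i] <= target:
            d0, d1 = distances[i - 1], distances[i]
            r0, r1 = mean_r2[i - 1], mean_r2[i]
            # Linear interpolation between the two bracketing bins.
            return d0 + (r0 - target) / (r0 - r1) * (d1 - d0)
    return None


# Hypothetical decay curve (distances in kb): half of 0.5 is reached between 10 and 20 kb.
print(half_decay_distance([0, 10, 20], [0.5, 0.3, 0.2]))
```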


Supplementary Fig. 33. Analysis of the phylogenetic relationship of Tibetan wild boars

(n = 30) and neighboring domestic pigs (n = 15) using SNPs in regions with strong

selective sweep signals. a, A neighbor-joining phylogenetic tree. The scale bar represents p-distance. b, Two-way PCA plot. The fraction of the variance explained is 18.21% for eigenvector 1 (P = 7.08 × 10^-4, Tracy-Widom test) and 8.57% for eigenvector 2 (P = 1.95 × 10^-5, Tracy-Widom test). Only the 0.81 M SNPs (8.59% of the 9.49 M genome-wide SNPs) located in the selected regions of Tibetan wild boars and Chinese domestic pigs were used.


Supplementary Fig. 34. Genes embedded in naturally selected regions in Tibetan wild

boars related to ‘vitamin B6 binding’ and ‘response to hypoxia’. The ratio of sequence diversity (θπ ratio, black line), genetic differentiation between the two populations (FST, red line) and selection statistics (Tajima’s D; blue and green lines for Chinese domestic pigs and Tibetan wild boars, respectively) are plotted using a 10 kb sliding window. Genomic regions located above the horizontal dashed line (corresponding to the 5% significance level of the θπ ratio, θπ ratio = 1.10, and of FST, FST = 0.361) were defined as regions with strong selective sweep signals for Tibetan wild boars (gray regions). Genome annotations are shown at the bottom (black bar: coding sequence; blue bar: gene). Three genes (ALB, GLDC and SPTLC2) related to ‘vitamin B6 binding’ and four genes (ALB, ECE1, GNG2 and PIK3C2G) related to ‘response to hypoxia’ are marked in red.
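Selecting windows by the two empirical 5% cut-offs can be sketched as follows. The sketch assumes per-window arrays of θπ ratio and FST and NumPy's default percentile interpolation; under that reading, the quoted cut-offs (1.10 and 0.361) would be the 95th percentiles of the genome-wide distributions, and windows under selection in Tibetan wild boars must exceed both.

```python
import numpy as np


def sweep_windows(theta_pi_ratio, fst, top_fraction=0.05):
    """Indices of windows in the top `top_fraction` of BOTH statistics."""
    ratio = np.asarray(theta_pi_ratio, dtype=float)
    f = np.asarray(fst, dtype=float)
    q = 100.0 * (1.0 - top_fraction)
    ratio_cut = np.percentile(ratio, q)
    fst_cut = np.percentile(f, q)
    selected = np.flatnonzero((ratio >= ratio_cut) & (f >= fst_cut))
    return selected.tolist(), float(ratio_cut), float(fst_cut)


# Hypothetical genome with 100 windows whose two statistics rise together.
windows, ratio_cut, fst_cut = sweep_windows(np.arange(100), np.arange(100))
print(windows, ratio_cut, fst_cut)
```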


Supplementary Fig. 35. Genes examined in the ‘saliva secretion’ functional category

(GO:0046541) showed signatures of selective sweeps in Chinese domestic pigs. Nine genes exhibited a lower θπ ratio and higher FST and |Tajima’s D_domestic – Tajima’s D_Tibetan| than the genome background. a, Two genes (KCNMA1 and TRPC1) embedded in regions with significant signatures of selective sweeps are marked in red. KCNMA1 (also known as KCa1.1) encodes the maxi-K channel in the acinar cells of the parotid and submandibular exocrine glands10. TRPC1, a critical component of the store-operated Ca²⁺ channel in acinar cells, is essential for neurotransmitter-regulated fluid secretion11. If a gene spanned multiple windows, its θπ ratio, FST and |Tajima’s D_domestic – Tajima’s D_Tibetan| values were averaged over the overlapping windows. b, Box plot of θπ ratio, FST and |Tajima’s D_domestic – Tajima’s D_Tibetan| values for the 9 genes in the ‘saliva secretion’ category of Chinese domestic pigs versus the whole genome. Bootstrapping was performed by randomly resampling the 9 genes to obtain 178,924 values (the number of genome-wide windows). The statistical significance was calculated by the Mann-Whitney U test.


Supplementary Fig. 36. Vacuum chewing (domestic Duroc pig). Vacuum chewing is defined as oral activity with saliva but no food in the mouth, accompanied by copious production of saliva seen as ‘froth’ around the mouth; it is one of the most frequently observed stereotypies in housed pigs in the pig industry.


Supplementary Tables 1-8, 10-16, 18-22, 24-27 and 29-36

Supplementary Table 1. Genome sequencing strategy for the Tibetan wild boar.

Paired-end library   Insert size   Raw data (Gb)   High-quality data (Gb)   Proportion of Q20 (%)   Proportion of Q30 (%)   Proportion of GC (%)   Read length (bp)
Illumina reads       180 bp        136.57          130.05                   96.80                   91.42                   39.45                  101
                     500 bp        88.64           86.19                    96.20                   91.01                   39.56                  101
                     2 kb          27.13           20.84                    94.44                   88.06                   44.14                  51/101
                     5 kb          33.72           13.08                    95.58                   90.62                   43.78                  101
                     10 kb         33.23           28.07                    96.71                   91.16                   45.84                  75

In total 319.29 Gb of sequence data were obtained for de novo assembly. After filtering reads

based on quality, 278.23 Gb of high-quality data were retained for subsequent analysis.
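The Q20 and Q30 columns are the percentages of bases whose Phred quality is at least 20 or 30. A small sketch, assuming Sanger/Illumina 1.8+ FASTQ quality strings (ASCII offset 33); the function name is illustrative.

```python
def quality_fractions(quality_strings, thresholds=(20, 30), offset=33):
    """Percentage of bases at or above each Phred threshold."""
    total = 0
    passing = {t: 0 for t in thresholds}
    for qual in quality_strings:
        for ch in qual:
            q = ord(ch) - offset  # Phred score encoded as an ASCII character
            total += 1
            for t in thresholds:
                if q >= t:
                    passing[t] += 1
    return {t: 100.0 * passing[t] / total for t in thresholds}


# In Phred+33, 'I' encodes Q40, '5' encodes Q20 and '#' encodes Q2.
print(quality_fractions(["II5#"]))  # {20: 75.0, 30: 50.0}
```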


Supplementary Table 2. Estimation of the Tibetan wild boar genome size using K-mer analysis.

K-mer   K-mer number   K-mer depth   Genome size (Mb)   Revised genome size* (Mb)   Heterozygous rate (%)   Repetition rate (%)†   Used bases (Gb)   Sequence depth (×)
19      1.02E+11       41.94         2,427.87           2,379.31                    0.85                    38.86                  128.4             53.97

The estimated size of the Tibetan wild boar genome is ~2.38 Gb.

* ‘Revised genome size’ is the estimate after excluding erroneous K-mers.

† ‘Repetition rate’ is the proportion of repeated K-mers among all K-mers.
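The K-mer estimate rests on the usual relation genome size ≈ total K-mer count / K-mer depth at the main peak, and the sequence depth column is the used bases divided by the revised genome size. Plugging in the rounded table values reproduces the reported figures to within the rounding of the K-mer count (1.02E+11); the helper names are illustrative.

```python
def kmer_genome_size(kmer_count, kmer_depth):
    """Genome size (bp) estimated as total K-mers / peak K-mer depth."""
    return kmer_count / kmer_depth


def sequence_depth(used_bases, genome_size):
    """Average base coverage implied by the data volume."""
    return used_bases / genome_size


size_bp = kmer_genome_size(1.02e11, 41.94)  # K-mer count is rounded in the table
print(round(size_bp / 1e6, 1))              # Mb, close to the reported 2,427.87
print(round(sequence_depth(128.4e9, 2379.31e6), 2))
```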

Supplementary Table 3. Summary of the Tibetan wild boar genome assembly.

Category                  Contigs (> 100 bp)   Scaffolds (> 100 bp)   Contigs (> 500 bp)   Scaffolds (> 500 bp)
Total length (bp)         2,426,282,217        2,501,667,227          2,400,295,503        2,475,602,644
Max length (bp)           278,361              6,123,902              278,361              6,123,902
Average length (bp)       6,490                15,321                 10,177               87,980
N50 length (bp) | Number  20,411 | 32,634       1,049,950 | 714        20,688 | 32,002       1,062,107 | 701
N60 length (bp) | Number  15,751 | 46,177       817,959 | 984          16,022 | 45,196       826,816 | 965
N70 length (bp) | Number  11,775 | 63,968       616,452 | 1,334        12,059 | 62,441       634,339 | 1,305
N80 length (bp) | Number  8,062 | 88,736        421,873 | 1,815        8,368 | 86,205        442,560 | 1,767
N90 length (bp) | Number  4,605 | 128,040       227,167 | 2,599        4,942 | 123,139       247,789 | 2,501
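The Nx rows follow the usual definition: sort fragments from longest to shortest and accumulate length until x% of the total is reached; the ‘Number’ is how many fragments that takes. A short sketch of that definition (the fragment lengths in the example are made up):

```python
def nx(lengths, x=50):
    """Return (Nx length, number of fragments needed) for a list of lengths."""
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("empty input")


def filtered_nx(lengths, min_length, x=50):
    """Nx using only fragments longer than `min_length` (e.g. 100 or 500 bp)."""
    return nx([n for n in lengths if n > min_length], x)


contigs = [2, 3, 4, 5, 6]
print(nx(contigs, 50))  # (5, 2): 6 + 5 = 11 reaches half of the total 20
print(nx(contigs, 90))  # (3, 4): 6 + 5 + 4 + 3 = 18 reaches 90% of 20
```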


Supplementary Table 4. Summary of mapping and coverage depth.

Category                        Value
Average sequencing depth (×)    70.8
Mismatch rate (%)               0.5
Mapping rate (%)                90.3
Coverage (%)                    98.7
Coverage at least 4× (%)        98.0
Coverage at least 10× (%)       97.0
Coverage at least 20× (%)       94.8

To evaluate the single-base accuracy of the assembled Tibetan wild boar genome, the

high-quality short-insert reads (180 bp and 500 bp) were realigned onto the assembly

scaffolds. An average depth of 70.8× was obtained, and approximately 94.8% of the genome was covered by 20 or more reads.

Supplementary Table 5. Transposable element families in the Tibetan wild boar genome based on various methods.

Type Repeat size (bp) % of genome

Proteinmask 202,408,765 8.25

Repeatmasker 903,922,135 36.85

Trf 37,346,250 1.52

De novo 605,241,890 24.68

Total 968,058,934 39.47

Transposable elements comprised ~39.47% of the Tibetan wild boar genome, which is

similar to the value obtained for the Duroc pig genome (40.55%).


Supplementary Table 6. Transposable element families in the Tibetan wild boar genome based on homolog alignment.

Repeat type           Repbase TEs        TE proteins       RepeatModeler      Combined TEs*
(each cell: length (kb) | % in genome)
DNA transposon        62,355 | 2.54      4,350 | 0.18      23,551 | 0.96      63,921 | 2.61
LINE                  416,309 | 16.97    190,852 | 7.78    202,588 | 8.26     442,644 | 18.05
LTR retrotransposon   110,510 | 4.51     7,227 | 0.29      66,794 | 2.72      120,730 | 4.92
SINE                  320,011 | 13.05    0 | 0.00          310,469 | 12.66    336,061 | 13.70
Other†                5 | 0.00           0 | 0.00          0 | 0.00           5 | 0.00
Unknown‡              880 | 0.04         0 | 0.00          0 | 0.00           880 | 0.04
Total                 903,922 | 36.85    202,408 | 8.25    602,302 | 24.56    949,776 | 38.72

*Combined: the non-redundant consensus of all repeat prediction/classification methods employed.

†Other: repeats classified by RepeatMasker that are not included in the other groups.

‡Unknown: predicted repeats that cannot be classified by RepeatMasker.

LINE, long interspersed nuclear elements; LTR, long terminal repeat; SINE, short interspersed nuclear elements.


Supplementary Table 7. Summary of InDels in the Tibetan wild boar genome.

Category               Number of InDels
Upstream               6,571
CDS                    982
Intron                 291,414
Splicing               20
Downstream             6,790
Upstream/Downstream    82
Intergenic             678,425
Total                  984,284

‘Upstream’ refers to a variant that overlaps with the 1 kb region upstream of the gene start

site. ‘Downstream’ refers to a variant that overlaps with the 1 kb region downstream of the

gene end site. ‘Upstream/Downstream’ indicates that a variant lies in both an upstream and a downstream region (possibly of two different genes). ‘Splicing’ refers to a variant that is

within 2 bp of a splice junction.

Supplementary Table 8. Summary of syntenic regions between the Tibetan wild

boar and Duroc pig genomes.

Breed               Scaffold/genome size*        Aligned nucleotides          Syntenic proportion (%)   Number of blocks†
Tibetan wild boar   2,501,667,227 bp (2.50 Gb)   2,336,696,950 bp (2.34 Gb)   93.41                     37,544
Duroc pig‡          2,806,871,662 bp (2.81 Gb)   2,715,263,667 bp (2.72 Gb)   96.74

To detect synteny blocks between Tibetan wild boar and Duroc pig genomes, after repeat

masking, pairwise whole-genome alignment was performed using LASTZ with the

parameters T = 2 (no transition), Y (ydrop) = 15,000, L (gappedthresh) = 3,000 and K

(hspthresh) = 4,500 (Supplementary URLs). The raw alignments were combined into

larger blocks using the ChainNet algorithm. *Scaffold/genome size includes gaps, i.e. ‘N’ (unidentified nucleotides), whose content is lower in the Tibetan wild boar genome (3.01%) than in the Duroc pig genome (10.31%). †Number of contiguous

syntenic blocks determined by pairwise comparisons between Tibetan wild boar and

Duroc pig genomes. ‡Excludes mitochondrial genome and Y chromosome.

Supplementary Table 9. List of inversion regions between the Tibetan wild boar and

Duroc pig genomes. (see Excel file ‘Supplementary Table 9.xls’)


Supplementary Table 10. Summary of non-coding RNA distribution and annotation

in the Tibetan wild boar genome.

Type        Number   Average length (bp)   Total length (bp)   % of genome
miRNA       381      88                    33,339              0.00136
tRNA        531      75                    39,594              0.00161
rRNA        304      114                   34,507              0.00141
  18S       26       226                   5,886               0.00024
  28S       118      139                   16,418              0.00067
  5.8S      4        96                    383                 0.00002
  5S        156      76                    11,820              0.00048
snRNA       890      113                   100,406             0.00409
  CD-box    221      93                    20,568              0.00084
  HACA-box  189      138                   26,107              0.00106
  splicing  458      111                   50,865              0.00207

microRNA (miRNA), small nuclear RNA (snRNA) and tRNA located in repeat or gap

regions were filtered out. rRNA hits shorter than 50 bp with identity below 85% were also filtered out. The

average length and total length were calculated using the integrated data.


Supplementary Table 11. Characteristics of the Tibetan wild boar and Duroc pig

genome assemblies.

Genomic features                           Tibetan wild boar   Duroc pig*
Assembled genome size (Gb)†                2.43                2.52
Number of N (unidentified nucleotides)     75,385,010          289,538,800
N content of whole genome (%)              3.01                10.31
Number of contigs                          370,587             73,524 (placed) | 168,358 (unplaced)
Contig N50 (bp)‡                           20,688              69,669
Average contig length (bp)                 10,177              11,611
Largest contig length (bp)                 278,361             1,598,650
Number of scaffolds                        163,276             5,343 (placed) | 4,562 (unplaced)
Scaffold N50 (bp)‡                         1,062,107           576,008
Average scaffold length (bp)               87,980              283,544
Largest scaffold length (bp)               6,123,902           3,862,550
GC content (%)                             41.82               41.70
Number of base A                           705,040,222         733,853,103
% of genome base A                         29.06               29.13
Number of base T                           706,487,877         734,661,583
% of genome base T                         29.12               29.16
Number of base C                           507,683,217         525,183,301
% of genome base C                         20.92               20.85
Number of base G                           507,070,901         525,289,361
% of genome base G                         20.90               20.85
Repeat rate (%)                            39.47               40.55
Number of putative coding genes            21,806              21,640
Number of exons                            188,336             197,675
Average gene model length (bp)             32,117              26,781
Average CDS length (bp)                    1,582               1,370
Average exon length (bp)                   183                 162
Average exon number per gene               8.64                8.44
Average intron length (bp)                 3,998               3,444
Number of miRNAs                           381                 374
Number of tRNAs                            531                 819
Number of rRNAs                            304                 185
Number of snRNAs                           890                 1,030

* From Groenen et al. (2012)7.

† The fragments of the ungapped genome assembly.

‡ N50 (50% of the genome is in fragments of this length or longer) of genome assembly

was calculated using the fragments longer than 500 bp.
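The base-composition rows can be recomputed directly from the base counts: each percentage is that base's count divided by the total of A, T, C and G (which equals the ungapped assembled length), and GC content is (C + G)/total. A quick check using the Tibetan wild boar counts listed in the table:

```python
# Tibetan wild boar base counts from the table above.
counts = {
    "A": 705_040_222,
    "T": 706_487_877,
    "C": 507_683_217,
    "G": 507_070_901,
}
total = sum(counts.values())  # ungapped assembled length in bp
percent = {base: round(100 * n / total, 2) for base, n in counts.items()}
gc_content = round(100 * (counts["C"] + counts["G"]) / total, 2)

print(total)
print(percent)
print(gc_content)
```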


Supplementary Table 12. Summary of RNA-seq mapping results

Tissue   Read type                          Mapping to the Tibetan wild boar genome        Mapping to the Duroc pig genome
                                            Number of reads           % of reads           Number of reads           % of reads
Heart    Total reads                        104,723,266                                    104,723,266
         Mapped reads                       83,979,755                80.19                74,893,632                71.52
         Multiple- | Uniquely-mapped reads  3,937,595 | 80,042,160   3.76 | 76.43         6,220,562 | 68,673,070    5.94 | 65.58
         Read-1 | Read-2                    39,047,371 | 37,853,776  37.29 | 36.15        36,532,082 | 35,287,352   34.88 | 33.70
         Reads mapped to '+' | to '-'       38,711,826 | 38,189,321  36.97 | 36.47        35,852,834 | 35,966,600   34.24 | 34.34
         Non-splice reads | Splice reads    58,162,158 | 18,738,989  55.54 | 17.89        49,640,490 | 22,178,944   47.40 | 21.18
Kidney   Total reads                        30,460,082                                     30,460,082
         Mapped reads                       22,830,732                74.95                22,669,607                74.42
         Multiple- | Uniquely-mapped reads  763,398 | 22,067,334     2.51 | 72.45         2,162,136 | 20,507,471    7.10 | 67.33
         Read-1 | Read-2                    11,134,500 | 10,932,834  36.55 | 35.89        10,346,021 | 10,161,450   33.97 | 33.36
         Reads mapped to '+' | to '-'       11,040,124 | 11,027,210  36.24 | 36.20        10,292,010 | 10,215,461   33.79 | 33.54
         Non-splice reads | Splice reads    15,959,027 | 6,108,307   52.39 | 20.05        15,390,368 | 5,117,103    50.53 | 16.80
Liver    Total reads                        20,257,918                                     20,257,918
         Mapped reads                       14,757,764                72.85                14,200,850                70.10
         Multiple- | Uniquely-mapped reads  523,069 | 14,234,695     2.58 | 70.27         1,811,792 | 12,389,058    8.94 | 61.16
         Read-1 | Read-2                    7,173,634 | 7,061,061    35.41 | 34.86        6,244,772 | 6,144,286     30.83 | 30.33
         Reads mapped to '+' | to '-'       7,132,602 | 7,102,093    35.21 | 35.06        6,202,752 | 6,186,306     30.62 | 30.54
         Non-splice reads | Splice reads    9,488,360 | 4,746,335    46.84 | 23.43        8,423,595 | 3,965,463     41.58 | 19.57
Lung     Total reads                        35,255,828                                     35,255,828
         Mapped reads                       25,001,818                70.92                22,684,760                64.34
         Multiple- | Uniquely-mapped reads  814,419 | 24,187,399     2.31 | 68.61         2,424,339 | 20,260,421    6.88 | 57.47
         Read-1 | Read-2                    12,301,199 | 11,886,200  34.89 | 33.71        10,311,043 | 9,949,378    29.25 | 28.22
         Reads mapped to '+' | to '-'       12,109,760 | 12,077,639  34.35 | 34.26        10,143,933 | 10,116,488   28.77 | 28.69
         Non-splice reads | Splice reads    16,876,361 | 7,311,038   47.87 | 20.74        14,324,210 | 5,936,211    40.63 | 16.84

RNA-seq reads were aligned to the Tibetan wild boar and Duroc pig genomes using TopHat (v2.0.7) with default parameters. ‘Splice reads’ refers to

reads where part of the read was not mapped contiguously to the reference genome. Averaged over the four Tibetan wild boar tissues, the mapping rate of RNA-seq reads against the Tibetan wild boar genome (74.73%) is higher than against the Duroc pig genome (70.10%). Out of 21,806 predicted protein-coding

genes in the Tibetan wild boar genome, 18,366 (84.23%) show evidence of transcription based on RNA-seq.
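The overall mapping rates quoted above match the unweighted mean of the four per-tissue rates rather than a rate pooled over all reads; from the same counts, the pooled rate for the Tibetan wild boar genome would be about 76.9%. The arithmetic, using the mapped and total read counts from the table:

```python
# (mapped reads, total reads) per tissue, taken from the table above.
tibetan = {
    "heart": (83_979_755, 104_723_266),
    "kidney": (22_830_732, 30_460_082),
    "liver": (14_757_764, 20_257_918),
    "lung": (25_001_818, 35_255_828),
}
duroc = {
    "heart": (74_893_632, 104_723_266),
    "kidney": (22_669_607, 30_460_082),
    "liver": (14_200_850, 20_257_918),
    "lung": (22_684_760, 35_255_828),
}


def mean_rate(table):
    """Unweighted mean of per-tissue mapping rates, in percent."""
    rates = [mapped / total for mapped, total in table.values()]
    return 100 * sum(rates) / len(rates)


def pooled_rate(table):
    """Mapping rate over all reads combined, in percent."""
    mapped = sum(m for m, _ in table.values())
    total = sum(t for _, t in table.values())
    return 100 * mapped / total


print(round(mean_rate(tibetan), 2), round(mean_rate(duroc), 2))
print(round(pooled_rate(tibetan), 1))
```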


Supplementary Table 13. Summary of evidence for the EVidenceModeler (EVM)

gene models in the Tibetan wild boar genome.

Category     ≥20% overlap (Number | % of total)   ≥50% overlap (Number | % of total)   ≥80% overlap (Number | % of total)
P (single)   34 | 0.14                            463 | 1.84                           2,439 | 9.69
P (more)     1,789 | 7.11                         2,328 | 9.25                         3,145 | 12.49
H (single)   18 | 0.07                            27 | 0.11                            101 | 0.40
H (more)     5 | 0.02                             58 | 0.23                            530 | 2.11
C (single)   1 | 0.00                             2 | 0.01                             80 | 0.32
C (more)     0 | 0.00                             4 | 0.02                             37 | 0.15
P + H        12 | 0.05                            136 | 0.54                           849 | 3.37
P + C        402 | 1.60                           888 | 3.53                           1,290 | 5.12
H + C        5,569 | 22.12                        6,584 | 26.15                        6,575 | 26.11
P + H + C    17,347 | 68.90                       14,677 | 58.29                       9,642 | 38.30

P, ab initio prediction; H, homology-based; C, cDNA/EST/transcript-expressed genes.

Genes were further separated into “single” and “more” categories based on the number of

sources supporting their existence.

Supplementary Table 14. Assessment of sequence coverage of the Tibetan wild

boar genome assembly using the CDS regions of the Duroc pig genome.

Length of unigene   Number   Total length (bp)   Covered by the draft genome (%)   With >90% sequence in one scaffold (Number | %)   With >50% sequence in one scaffold (Number | %)
All                 21,619   29,614,875          99.94                             19,567 | 90.51                                    21,277 | 98.42
>200 bp             21,276   29,558,865          99.95                             19,258 | 90.51                                    20,938 | 98.41
>500 bp             17,710   28,275,129          99.95                             15,927 | 89.93                                    17,394 | 98.22
>1,000 bp           10,926   23,033,892          99.96                             9,876 | 90.39                                     10,816 | 98.99

The CDS sequences of the Duroc pig genome were downloaded from Ensembl release 67 and mapped to the Tibetan wild boar genome assembly. Of the 21,619 Duroc pig CDSs (29,614,875 bp in total), 99.94% of the sequence length was covered by the Tibetan wild boar assembly.


Supplementary Table 15. Summary of predicted protein-coding genes in the Tibetan

wild boar genome compared with other representative mammalian genomes.

Gene set            Number   Average gene model length (bp)   Average CDS length (bp)   Average exon number per gene   Average exon length (bp)   Average intron length (bp)
Tibetan wild boar   21,806   32,117                           1,582                     8.64                           183                        3,998
Duroc pig           21,619   26,987                           1,370                     8.44                           162                        3,444
Human               20,207   49,011                           1,580                     9.31                           169                        5,708
Cattle              19,970   35,523                           1,598                     9.59                           167                        3,949
Dog                 19,281   30,994                           1,577                     9.90                           160                        3,305
Mouse               22,838   36,688                           1,516                     8.56                           177                        4,651

Genes with alternative splicing-induced premature termination and defective codon

events were not considered.

Supplementary Table 16. Number of Tibetan wild boar genes with functional

classification by various methods.

Category                            Number   Percent (%)
Total                               21,806   100
Annotated (20,157 genes, 92.44%)
    Swiss-Prot                      19,754   90.59
    TrEMBL                          20,128   92.30
    KEGG                            14,297   65.56
    InterPro                        16,137   74.00
    GO                              12,888   59.10
Unannotated                         1,649    7.56

Out of 21,806 predicted protein-coding genes in the Tibetan wild boar genome, 20,157

(92.44%) have protein homologues in the other mammalian genomes.

Supplementary Table 17. Tibetan wild boar-specific genes with evidence of

transcription. (see Excel file ‘Supplementary Table 17.xls’)


Supplementary Table 18. Functional gene categories enriched for the Tibetan wild

boar- and Duroc pig-specific families.

Functional category   Term ID   Term description   P value   Involved gene number

Tibetan wild boar

GO-MF GO:0003964 RNA-directed DNA polymerase activity 0.00E+00 507

GO-BP GO:0006278 RNA-dependent DNA replication 0.00E+00 507

GO-BP GO:0006260 DNA replication 0.00E+00 508

InterProScan IPR004244 Transposase, L1 0.00E+00 253

GO-MF GO:0016779 Nucleotidyltransferase activity 0.00E+00 509

InterProScan IPR005135 Endonuclease/exonuclease/phosphatase 3.18E-278 206

GO-BP GO:0090304 Nucleic acid metabolic process 8.81E-255 571

InterProScan IPR003036 Core shell protein Gag P30 4.44E-13 21

KEGG-pathway map05130 Pathogenic Escherichia coli infection 8.54E-11 17

KEGG-pathway map04270 Vascular smooth muscle contraction 2.07E-09 23

KEGG-pathway map04810 Regulation of actin cytoskeleton 2.93E-09 20

KEGG-pathway map04350 TGF-beta signaling pathway 4.52E-09 19

KEGG-pathway map04670 Leukocyte transendothelial migration 4.52E-09 19

KEGG-pathway map04062 Chemokine signaling pathway 7.15E-09 20

InterProScan IPR004875 DDE superfamily endonuclease, CENP-B-like 1.08E-04 13

InterProScan IPR001063 Ribosomal protein L22/L17 1.25E-02 6

InterProScan IPR003308 Integrase, N-terminal zinc-binding domain 1.25E-02 4

GO-BP GO:0015074 DNA integration 2.03E-02 4

GO-MF GO:0004523 Ribonuclease H activity 2.77E-02 3

KEGG-pathway map04150 mTOR signaling pathway 3.43E-02 6

KEGG-pathway map04010 MAPK signaling pathway 3.91E-02 14

KEGG-pathway map04914 Progesterone-mediated oocyte maturation 3.99E-02 8

Duroc pig

KEGG-pathway ssc04740 Olfactory transduction 1.53E-04 35

InterProScan IPR009311 Interferon-induced 6-16 6.78E-03 8

GO-BP GO:0006508 Proteolysis 3.08E-02 8

GO-BP GO:0051605 Protein maturation by peptide bond cleavage 4.27E-02 3

GO-BP GO:0016485 Protein processing 4.27E-02 3

GO-BP GO:0051604 Protein maturation 4.27E-02 3

GO-MF GO:0008233 Peptidase activity 4.38E-02 7

InterProScan IPR011360 Complement B/C2 4.68E-02 4

P values (i.e. EASE scores), indicating significance of the overlap between various gene

sets, were calculated using a Benjamini-corrected modified Fisher’s exact test. Only

GO-BP (biological process), GO-MF (molecular function), KEGG-pathway and InterPro

domain terms with a P value less than 0.05 were considered as significant and listed.
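For orientation, the core of such an enrichment test is a one-sided hypergeometric (Fisher) tail probability, followed by a multiple-testing adjustment across terms. This standard-library sketch assumes ‘Benjamini-corrected’ refers to the Benjamini-Hochberg procedure. DAVID's EASE score additionally subtracts one gene from the list-category overlap before testing; that modification is not implemented here.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical term: 5 of 10 background genes annotated, all 5 list genes hit.
print(hypergeom_upper_tail(5, 5, 5, 10))
print(benjamini_hochberg([0.01, 0.5]))
```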


Supplementary Table 19. Summary of gene families in six mammals.

                                           Tibetan wild boar   Duroc pig   Human    Cattle   Dog      Mouse
Number of genes*                           19,444              19,753      17,558   19,767   18,742   17,592
Number of gene families                    16,203              16,356      15,506   17,401   16,935   10,907
Number of genes per family                 1.20                1.21        1.13     1.14     1.11     1.61
Number of lineage-specific genes           1,264               271         536      39       49       3,473
Number of lineage-specific gene families   189                 124         191      9        18       1,036

* Excludes mitochondrial genes and unclustered genes.

Similar to the Duroc pig (number of genes per family: 1.21; lineage-specific gene families: 124) and human (1.13 and 191), the Tibetan wild boar (1.20 and 189) exhibited a moderate rate of gene family evolution relative to other mammals, higher than in cattle (1.14 and 9) and dog (1.11 and 18) but lower than in mouse (1.61 and 1,036).


Supplementary Table 20. Functional gene categories enriched for the Tibetan wild

boar- and Duroc pig-specific expansion families.

Functional category   Term ID   Term description   P value   Involved gene number

Tibetan wild boar

InterProScan IPR008331 Ferritin/DPS protein domain 8.64E-13 9

InterProScan IPR009040 Ferritin-like diiron domain 8.64E-13 9

GO-MF GO:0008199 Ferric iron binding 7.18E-12 9

KEGG-pathway map05130 Pathogenic Escherichia coli infection 8.48E-06 6

InterProScan IPR002190 MAGE protein 1.14E-05 6

GO-MF GO:0016705 Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen 1.47E-05 4

KEGG-pathway map04270 Vascular smooth muscle contraction 5.71E-05 6

KEGG-pathway map04350 TGF-beta signaling pathway 1.94E-04 6

KEGG-pathway map04670 Leukocyte transendothelial migration 1.94E-04 6

KEGG-pathway map00601 Glycosphingolipid biosynthesis - lacto and neolacto series 5.46E-04 4

KEGG-pathway map04310 Wnt signaling pathway 5.50E-04 6

KEGG-pathway map04810 Regulation of actin cytoskeleton 1.97E-03 6

KEGG-pathway map04062 Chemokine signaling pathway 2.46E-03 6

InterProScan IPR007087 Zinc finger, C2H2 3.89E-03 17

InterProScan IPR015880 Zinc finger, C2H2-like 1.05E-02 16

KEGG-pathway map00980 Metabolism of xenobiotics by cytochrome P450 1.16E-02 4

Duroc pig

KEGG-pathway ssc04740 Olfactory transduction 8.46E-23 30

InterProScan IPR001039 MHC class I, alpha chain, alpha1 and alpha2 8.50E-03 5

GO-MF GO:0046872 Metal ion binding 1.62E-02 6

GO-MF GO:0043169 Cation binding 1.73E-02 6

InterProScan IPR011161 MHC class I-like antigen recognition 1.73E-02 7

GO-MF GO:0043167 Ion binding 1.77E-02 5

InterProScan IPR003006 Immunoglobulin/major histocompatibility complex, conserved site 2.68E-02 5

InterProScan IPR003597 Immunoglobulin C1-set 3.03E-02 6

In total, 92 families (390 genes) and 232 families (950 genes) were substantially expanded in the Tibetan wild boar and Duroc pig, respectively, compared with other mammals.

Supplementary Table 21. Positively selected genes (PSGs) identified in the Tibetan wild boar and Duroc pig genomes.

ID Gene symbol Gene name P value

Tibetan wild boar

1 ABLIM1 Actin binding LIM protein 1 1.97E-05

2 ACR Acrosin 2.58E-14

3 ACTR5 ARP5 actin-related protein 5 homolog (yeast) 3.55E-14

4 ACVR1B Activin A receptor, type IB 0.00E+00

5 ADAMTS15 ADAM metallopeptidase with thrombospondin type 1 motif, 15 4.06E-14

6 ADAMTS9 ADAM metallopeptidase with thrombospondin type 1 motif, 9 5.46E-14

7 ADAMTSL3 ADAMTS-like 3 6.46E-14

8 ADCY1 Adenylate cyclase 1 (brain) 0.00E+00

9 ADCY2 Adenylate cyclase 2 (brain) 0.00E+00

10 ADCY4 Adenylate cyclase 4 1.33E-06

11 ADORA2B Adenosine A2b receptor 7.33E-09

12 ADRA1B Adrenergic, alpha-1B-, receptor 9.14E-14

13 AEBP1 AE binding protein 1 9.87E-14

14 AGA Aspartylglucosaminidase 1.11E-06

15 AKTIP AKT interacting protein; similar to AKT interacting protein 0.00E+00

16 ALDH2 Aldehyde dehydrogenase 2 family (mitochondrial) 1.42E-10

17 ALPK2 Alpha-kinase 2 1.41E-13

18 ANKAR Ankyrin and armadillo repeat containing 0.00E+00

19 ANKRD27 Ankyrin repeat domain 27 (VPS9 domain) 1.57E-13

20 ANO5 Anoctamin 5 1.67E-13

21 ANTXR2 Anthrax toxin receptor 2 1.97E-13

22 AP4E1 Adaptor-related protein complex 4, epsilon 1 subunit 2.13E-13

23 APIP APAF1 interacting protein; similar to APAF1 interacting protein 0.00E+00

24 APOBEC1 Apolipoprotein B mRNA editing enzyme, catalytic polypeptide 1 2.17E-13

25 APOE Hypothetical LOC100129500; apolipoprotein E 5.19E-07

26 ARAP3 ArfGAP with RhoGAP domain, ankyrin repeat and PH domain 3 2.49E-13

27 ARG2 Arginase, type II 3.51E-13

28 ARHGEF11 Rho guanine nucleotide exchange factor (GEF) 11 2.17E-05

29 ARHGEF12 Rho guanine nucleotide exchange factor (GEF) 12 0.00E+00

30 ARNT Aryl hydrocarbon receptor nuclear translocator 0.00E+00

31 ASNSD1 Asparagine synthetase domain containing 1 3.95E-13

32 ASTL Astacin-like metallo-endopeptidase (M12 family) 2.45E-07

33 ATAD2 ATPase family, AAA domain containing 2 0.00E+00

34 ATXN7 Ataxin 7 1.39E-10

35 BBS7 Bardet-Biedl syndrome 7 0.00E+00

36 BCL3 B-cell CLL/lymphoma 3 4.65E-11

37 BIRC2 Baculoviral IAP repeat-containing 2 4.09E-13

38 C8B Complement component 8, beta polypeptide 4.31E-13

39 C8ORF76 Chromosome 8 open reading frame 76 4.49E-13

40 CA6 Carbonic anhydrase VI 4.65E-13

41 CA9 Carbonic anhydrase IX 5.26E-13

42 CABLES2 Cdk5 and Abl enzyme substrate 2 5.28E-13

43 CALCRL Calcitonin receptor-like 5.48E-03

44 CAMK2G Calcium/calmodulin-dependent protein kinase II gamma 3.33E-16

45 CBL Cas-Br-M (murine) ecotropic retroviral transforming sequence 5.28E-13

46 CCHCR1 Coiled-coil alpha-helical rod protein 1 5.30E-13

47 CCNE2 Cyclin E2 5.70E-13

48 CDK12 Cdc2-related kinase, arginine/serine-rich 0.00E+00

49 CELF5 Bruno-like 5, RNA binding protein (Drosophila) 1.01E-10

50 CHD3 Chromodomain helicase DNA binding protein 3 1.06E-10

51 COL11A1 Collagen, type XI, alpha 1 0.00E+00

52 COL14A1 Collagen, type XIV, alpha 1 5.78E-13

53 COPZ2 Coatomer protein complex, subunit zeta 2 4.55E-10

54 CPEB4 Cytoplasmic polyadenylation element binding protein 4 0.00E+00

55 CPXM2 Carboxypeptidase X (M14 family), member 2 6.52E-13

56 CTSZ Cathepsin Z 1.48E-10

57 DGAT1 Diacylglycerol O-acyltransferase homolog 1 (mouse) 6.61E-08

58 DGUOK Deoxyguanosine kinase 1.31E-08

59 DNAJC7 DnaJ (Hsp40) homolog, subfamily C, member 7 6.20E-09

60 DPP4 Dipeptidyl-peptidase 4 7.47E-13

61 DPYSL4 Dihydropyrimidinase-like 4 7.75E-13

62 DPYSL5 Dihydropyrimidinase-like 5 8.99E-11

63 DUSP3 Dual specificity phosphatase 3 1.93E-08

64 EBPL Emopamil binding protein-like 7.92E-13

65 EEA1 Early endosome antigen 1 8.08E-13

66 EGLN2 Egl nine homolog 2 (C. elegans) 8.74E-13

67 EIF4E1B Eukaryotic translation initiation factor 4E family member 1B 1.99E-10

68 EIF4E2 Eukaryotic translation initiation factor 4E family member 2 2.69E-06

69 ERCC4 Excision repair cross-complementing rodent repair deficiency, complementation group 4 5.07E-07

70 ERCC6 Excision repair cross-complementing rodent repair deficiency, complementation group 6 1.01E-12

71 EREG Epiregulin 3.13E-09

72 ERGIC1 Endoplasmic reticulum-golgi intermediate compartment (ERGIC) 1 1.50E-07

73 ESCO1 Establishment of cohesion 1 homolog 1 (S. cerevisiae) 1.11E-16

74 ETFA Electron-transfer-flavoprotein, alpha polypeptide 2.12E-08

75 FABP2 Fatty acid binding protein 2, intestinal 4.19E-08

76 FBXL4 F-box and leucine-rich repeat protein 4 0.00E+00

77 FBXO30 F-box protein 30 5.55E-16

78 FGF10 Fibroblast growth factor 10 1.05E-12

79 FIGF C-fos induced growth factor (vascular endothelial growth factor D) 1.35E-12

80 FLAD1 FAD1 flavin adenine dinucleotide synthetase homolog (S. cerevisiae) 0.00E+00

81 FNBP1 Formin binding protein 1 2.49E-10

82 FNBP1L Formin binding protein 1-like 3.76E-10

83 FOXL2 Forkhead box L2 6.66E-16

84 GHRHR Growth hormone releasing hormone receptor 1.36E-12

85 GIN1 Gypsy retrotransposon integrase 1 5.65E-11

86 GPD2 Glycerol-3-phosphate dehydrogenase 2 (mitochondrial) 0.00E+00

87 GPR182 G protein-coupled receptor 182 1.56E-12

88 GRAMD1C GRAM domain containing 1C 1.74E-12

89 GRIA2 Glutamate receptor, ionotropic, AMPA 2 2.13E-12

90 GTPBP8 GTP-binding protein 8 (putative) 2.31E-12

91 GUF1 GUF1 GTPase homolog (S. cerevisiae) 2.67E-12

92 GUSB Glucuronidase, beta 3.53E-12

93 HELB Helicase (DNA) B 3.62E-12

94 HHAT Hedgehog acyltransferase 2.02E-06

95 HIF1A Hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor) 3.96E-12

96 HLTF Helicase-like transcription factor 2.22E-16

97 HMGCL 3-hydroxymethyl-3-methylglutaryl-Coenzyme A lyase 3.96E-12

98 HPS5 Hermansky-Pudlak syndrome 5 4.08E-12

99 HSF1 Heat shock transcription factor 1 4.09E-12

100 HSPA9 Heat shock 70kDa protein 9 (mortalin) 4.44E-16

101 ID2 Inhibitor of DNA binding 2, dominant negative helix-loop-helix protein 4.09E-12

102 IDH1 Isocitrate dehydrogenase 1 (NADP+), soluble 6.66E-16

103 IDH3G Isocitrate dehydrogenase 3 (NAD+) gamma 4.34E-12

104 IFIH1 Interferon induced with helicase C domain 1 4.58E-12

105 IFNG Interferon, gamma 4.86E-12

106 IGF1 Insulin-like growth factor 1 (somatomedin C) 0.00E+00

107 IGF2R Insulin-like growth factor 2 receptor 5.26E-12

108 IHH Indian hedgehog homolog (Drosophila) 5.97E-06

109 IL4I1 Interleukin 4 induced 1 5.11E-07

110 IL5RA Interleukin 5 receptor, alpha 7.07E-07

111 KCNA3 Potassium voltage-gated channel, shaker-related subfamily, member 3 6.61E-12

112 KCNH4 Potassium voltage-gated channel, subfamily H (eag-related), member 4 6.93E-12

113 KLHL2 Kelch-like 2, Mayven (Drosophila) 0.00E+00

114 LDLRAP1 Low density lipoprotein receptor adaptor protein 1 6.27E-08

115 LEF1 Lymphoid enhancer-binding factor 1 2.47E-10

116 LEPR Leptin receptor 2.68E-07

117 LHX2 LIM homeobox 2 5.56E-10

118 LMTK2 Lemur tyrosine kinase 2 1.12E-07

119 LPCAT4 Lysophosphatidylcholine acyltransferase 4 4.06E-10

120 MAP1LC3C Microtubule-associated protein 1 light chain 3 gamma 3.85E-11

121 MAP2K2 Mitogen-activated protein kinase kinase 2 pseudogene; mitogen-activated protein kinase kinase 2 9.45E-13

122 MAPK8IP3 Mitogen-activated protein kinase 8 interacting protein 3 0.00E+00

123 MAPKAPK2 Mitogen-activated protein kinase-activated protein kinase 2 7.04E-12

124 MAT2A Methionine adenosyltransferase II, alpha 2.78E-06

125 MINPP1 Multiple inositol polyphosphate histidine phosphatase, 1 1.85E-03

126 MIXL1 Mix1 homeobox-like 1 (Xenopus laevis) 9.57E-06

127 MMP11 Matrix metallopeptidase 11 (stromelysin 3) 7.94E-12

128 MYO1H Myosin IH 2.19E-10

129 MYO5C Myosin VC 3.93E-07

130 MYT1L Myelin transcription factor 1-like 0.00E+00

131 NARS Asparaginyl-tRNA synthetase 0.00E+00

132 NDUFS2 NADH dehydrogenase (ubiquinone) Fe-S protein 2, 49kDa (NADH-coenzyme Q reductase) 5.22E-08

133 NPR1 Natriuretic peptide receptor A/guanylate cyclase A (atrionatriuretic peptide receptor A) 3.87E-13

134 NPY1R Neuropeptide Y receptor Y1 3.31E-06

135 ODAM Odontogenic, ameloblast associated 4.71E-08

136 PAFAH2 Platelet-activating factor acetylhydrolase 2, 40kDa 0.00E+00

137 PAICS Phosphoribosylaminoimidazole carboxylase, phosphoribosylaminoimidazole succinocarboxamide synthetase 8.34E-12

138 PAK7 P21 protein (Cdc42/Rac)-activated kinase 7 8.57E-12

139 PANK3 Pantothenate kinase 3 1.07E-11

140 PCSK7 Proprotein convertase subtilisin/kexin type 7 pseudogene; proprotein convertase subtilisin/kexin type 7 7.89E-07

141 PDGFRA Platelet-derived growth factor receptor, alpha polypeptide 0.00E+00

142 PEX3 Peroxisomal biogenesis factor 3 6.66E-16

143 PGF Placental growth factor 4.64E-08

144 PIK3C2G Phosphoinositide-3-kinase, class 2, gamma polypeptide 1.20E-11

145 PIP5K1C Phosphatidylinositol-4-phosphate 5-kinase, type I, gamma 4.52E-07

146 PLA2G2A Phospholipase A2, group IIA (platelets, synovial fluid) 6.61E-03

147 PLAU Plasminogen activator, urokinase 3.33E-16

148 PLCB3 Phospholipase C, beta 3 (phosphatidylinositol-specific) 2.85E-05

149 PLCG1 Phospholipase C, gamma 1 0.00E+00

150 PLK3 Polo-like kinase 3 (Drosophila) 2.58E-07

151 PLOD2 Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2 0.00E+00

152 PMCH Pro-melanin-concentrating hormone 7.07E-11

153 PPA1 Pyrophosphatase (inorganic) 1 2.68E-08

154 PPID Peptidylprolyl isomerase D 0.00E+00

155 PPP1R12B Protein phosphatase 1, regulatory (inhibitor) subunit 12B 9.03E-08

156 PPP1R15B Protein phosphatase 1, regulatory (inhibitor) subunit 15B 8.61E-03

157 PRKAA2 Protein kinase, AMP-activated, alpha 2 catalytic subunit 2.63E-06

158 PRKACA Protein kinase, cAMP-dependent, catalytic, alpha 0.00E+00

159 PSMB6 Proteasome (prosome, macropain) subunit, beta type, 6 3.83E-09

160 PSMD9 Proteasome (prosome, macropain) 26S subunit, non-ATPase, 9 6.66E-16

161 PSME4 Proteasome (prosome, macropain) activator subunit 4 0.00E+00

162 PSPH Phosphoserine phosphatase-like; phosphoserine phosphatase 4.52E-11

163 PTGIR Prostaglandin I2 (prostacyclin) receptor (IP) 0.00E+00

164 PTPN1 Protein tyrosine phosphatase, non-receptor type 1 7.56E-10

165 PYGO1 Pygopus homolog 1 (Drosophila) 1.37E-10

166 RABEPK Rab9 effector protein with kelch motifs 1.90E-09

167 RAD51AP1 RAD51 associated protein 1 0.00E+00

168 RAMP1 Receptor (G protein-coupled) activity modifying protein 1 4.53E-09

169 RANBP3L RAN binding protein 3-like 0.00E+00

170 RAPGEF2 Rap guanine nucleotide exchange factor (GEF) 2; similar to RAPGEF2 protein 0.00E+00

171 RARS2 Arginyl-tRNA synthetase 2, mitochondrial 0.00E+00

172 REV1 REV1 homolog (S. cerevisiae) 0.00E+00

173 RICTOR RPTOR independent companion of MTOR, complex 2 1.78E-04

174 RIOK1 RIO kinase 1 (yeast) 0.00E+00

175 RNASET2 Ribonuclease T2 5.72E-08

176 RNF111 Ring finger protein 111 0.00E+00

177 RNF151 Ring finger protein 151 3.75E-06

178 RNF214 Ring finger protein 214 0.00E+00

179 RPS6KB2 Ribosomal protein S6 kinase, 70kDa, polypeptide 2 0.00E+00

180 RSPRY1 Ring finger and SPRY domain containing 1 6.10E-08

181 SDHAF2 Chromosome 11 open reading frame 79 5.17E-06

182 SEC14L5 SEC14-like 5 (S. cerevisiae) 7.11E-15

183 SERGEF Secretion regulating guanine nucleotide exchange factor 4.11E-11

184 SERPINE1 Serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 1.18E-05

185 SGTB Small glutamine-rich tetratricopeptide repeat (TPR)-containing, beta 9.88E-15

186 SHH Sonic hedgehog homolog (Drosophila) 1.33E-11

187 SP8 Sp8 transcription factor 9.99E-15

188 SPHK1 Sphingosine kinase 1 1.31E-07

189 SRGN Serglycin 3.33E-02

190 STX3 Syntaxin 3 4.27E-06

191 SYT13 Synaptotagmin XIII 7.91E-10

192 TBCD Tubulin folding cofactor D 2.30E-11

193 TDO2 Tryptophan 2,3-dioxygenase 2.45E-11

194 TDRD1 Tudor domain containing 1 2.49E-11

195 TGDS TDP-glucose 4,6-dehydratase 0.00E+00

196 TMED10 Transmembrane emp24-like trafficking protein 10 (yeast) 6.33E-09

197 TMTC4 Transmembrane and tetratricopeptide repeat containing 4 0.00E+00

198 TRIM37 Tripartite motif-containing 37 0.00E+00

199 TRIM44 Tripartite motif-containing 44 4.02E-11

200 TRNAU1AP tRNA selenocysteine 1 associated protein 1 3.83E-06

201 TRPM7 Transient receptor potential cation channel, subfamily M, member 7 0.00E+00

202 TTC13 Tetratricopeptide repeat domain 13 3.33E-16

203 TTC9 Tetratricopeptide repeat domain 9 9.46E-11

204 USF1 Upstream transcription factor 1 2.61E-11

205 VEGFC Vascular endothelial growth factor C 4.44E-16

206 WWP1 WW domain containing E3 ubiquitin protein ligase 1 2.61E-10

207 XRCC1 X-ray repair complementing defective repair in Chinese hamster cells 1 5.18E-10

208 ZC3H12D Zinc finger CCCH-type containing 12D 1.67E-14

209 ZNF451 Zinc finger protein 451 2.11E-08

210 ZNF558 Zinc finger protein 558 0.00E+00

211 ZNF567 Zinc finger protein 567 5.88E-09

212 ZNF606 Zinc finger protein 606 3.40E-06

213 ZNRF4 Zinc and ring finger 4 1.15E-06

214 ZPBP Zona pellucida binding protein 2.73E-11

215 ZRANB3 Zinc finger, RAN-binding domain containing 3 0.00E+00

Duroc pig

1 ABLIM1 Actin binding LIM protein 1 2.41E-03

2 ACVR1C Activin A receptor, type IC 0.00E+00

3 ADAMTS12 ADAM metallopeptidase with thrombospondin type 1 motif, 12 1.05E-12

4 ADCY1 Adenylate cyclase 1 (brain) 0.00E+00

5 ADCY4 Adenylate cyclase 4 0.00E+00

6 ADRB3 Adrenergic, beta-3-, receptor 2.83E-04

7 AGA Aspartylglucosaminidase 3.78E-02

8 AGPAT2 1-acylglycerol-3-phosphate O-acyltransferase 2 (lysophosphatidic acid acyltransferase, beta) 2.48E-03

9 ALOX5 Arachidonate 5-lipoxygenase 2.37E-06

10 ALS2CL ALS2 C-terminal like 1.42E-12

11 ANLN Anillin, actin binding protein 3.77E-13

12 APBA1 Amyloid beta (A4) precursor protein-binding, family A, member 1 0.00E+00

13 APBA2 Amyloid beta (A4) precursor protein-binding, family A, member 2 6.17E-14

14 APOO Apolipoprotein O 1.91E-03

15 ARHGAP11A Rho GTPase activating protein 11B; Rho GTPase activating protein 11A 1.83E-12

16 ARHGAP25 Rho GTPase activating protein 25 2.51E-12

17 B4GALNT1 Beta-1,4-N-acetyl-galactosaminyl transferase 1 4.78E-12

18 BARX2 BARX homeobox 2 2.33E-03

19 BTC Betacellulin 9.50E-12

20 BTG4 B-cell translocation gene 4 2.58E-03

21 BYSL Bystin-like 1.23E-11

22 C9ORF89 Chromosome 9 open reading frame 89 2.48E-03

23 CDC16 Cell division cycle 16 homolog (S. cerevisiae) 2.26E-03

24 CDC26 Cell division cycle 26 homolog (S. cerevisiae); cell division cycle 26 homolog (S. cerevisiae) pseudogene 7.06E-04

25 CDC45 CDC45 cell division cycle 45-like (S. cerevisiae) 2.42E-03

26 CDCA7L Cell division cycle associated 7-like 8.32E-04

27 CEP164 Centrosomal protein 164kDa 9.93E-06

28 CHKB Choline kinase beta; carnitine palmitoyltransferase 1B (muscle) 2.56E-11

29 CILP Cartilage intermediate layer protein, nucleotide pyrophosphohydrolase 2.37E-03

30 CLDN18 Claudin 18 2.38E-04

31 CNGA3 Cyclic nucleotide gated channel alpha 3 3.03E-11

32 CNTNAP5 Contactin associated protein-like 5 0.00E+00

33 COL11A1 Collagen, type XI, alpha 1 0.00E+00

34 COL17A1 Collagen, type XVII, alpha 1 4.38E-11

35 COL4A4 Collagen, type IV, alpha 4 8.77E-15

36 COL5A3 Collagen, type V, alpha 3 4.83E-03

37 COL6A2 Collagen, type VI, alpha 2 4.65E-11

38 CPT1B Choline kinase beta; carnitine palmitoyltransferase 1B (muscle) 7.97E-11

39 CRISPLD2 Cysteine-rich secretory protein LCCL domain containing 2 1.13E-10

40 CSF3R Colony stimulating factor 3 receptor (granulocyte) 1.18E-10

41 CXADR Coxsackie virus and adenovirus receptor pseudogene 2; coxsackie virus and adenovirus receptor 1.88E-10

42 DNAJB5 DnaJ (Hsp40) homolog, subfamily B, member 5 0.00E+00

43 DSCAM Down syndrome cell adhesion molecule 0.00E+00

44 ELF3 E74-like factor 3 (ets domain transcription factor, epithelial-specific) 3.96E-06

45 EML4 Echinoderm microtubule associated protein like 4 0.00E+00

46 EMX2 Empty spiracles homeobox 2 1.80E-05

47 ENO2 Enolase 2 (gamma, neuronal) 2.00E-04

48 EVI5L Ecotropic viral integration site 5-like 2.70E-10

49 FANCD2 Fanconi anemia, complementation group D2 3.22E-10

50 FNDC3A Fibronectin type III domain containing 3A 8.78E-06

51 FREM2 FRAS1 related extracellular matrix protein 2 6.02E-14

52 GDF3 Growth differentiation factor 3 6.21E-04

53 GHSR Growth hormone secretagogue receptor 7.05E-14

54 GPLD1 Glycosylphosphatidylinositol specific phospholipase D1 2.00E-04

55 GRHPR Glyoxylate reductase/hydroxypyruvate reductase 4.54E-10

56 HIATL1 Hippocampus abundant transcript-like 1 1.06E-05

57 IGF2BP2 Insulin-like growth factor 2 mRNA binding protein 2 1.55E-03

58 IGFALS Insulin-like growth factor binding protein, acid labile subunit 1.64E-04

59 IGFBP2 Insulin-like growth factor binding protein 2, 36kDa 2.22E-03

60 IL6R Interleukin 6 receptor 5.32E-10

61 ITGA3 Integrin, alpha 3 (antigen CD49C, alpha 3 subunit of VLA-3 receptor) 2.00E-15

62 ITGA8 Integrin, alpha 8 8.53E-04

63 ITGB6 Integrin, beta 6 8.60E-03

64 JMY Junction mediating and regulatory protein, p53 cofactor 7.17E-10

65 JUNB Jun B proto-oncogene 1.11E-15

66 KCNT2 Potassium channel, subfamily T, member 2 1.92E-03

67 KEL Kell blood group, metallo-endopeptidase 9.19E-10

68 KLC1 Kinesin light chain 1 0.00E+00

69 KLHL2 Kelch-like 2, Mayven (Drosophila) 1.96E-03

70 LAMA4 Laminin, alpha 4 3.45E-03

71 LAMB3 Laminin, beta 3 1.45E-09

72 LCAT Lecithin-cholesterol acyltransferase 9.03E-04

73 LEF1 Lymphoid enhancer-binding factor 1 7.19E-04

74 LIMK2 LIM domain kinase 2 3.77E-13

75 LYN V-yes-1 Yamaguchi sarcoma viral related oncogene homolog 1.47E-09

76 LYST Lysosomal trafficking regulator 1.67E-09

77 MAPK8IP3 Mitogen-activated protein kinase 8 interacting protein 3 4.69E-05

78 MBTPS1 Membrane-bound transcription factor peptidase, site 1 1.71E-09

79 MCF2L MCF.2 cell line derived transforming sequence-like 1.71E-09

80 MCM4 Minichromosome maintenance complex component 4 3.06E-09

81 MEF2B Myocyte enhancer factor 2B 0.00E+00

82 MEF2C Myocyte enhancer factor 2C 2.38E-04

83 MGRN1 Mahogunin, ring finger 1 2.15E-04

84 MINPP1 Multiple inositol polyphosphate histidine phosphatase, 1 1.85E-03

85 MYBPC1 Myosin binding protein C, slow type 4.15E-09

86 MYH13 Myosin, heavy chain 13, skeletal muscle 0.00E+00

87 MYO10 Myosin X 2.36E-04

88 MYO18B Myosin XVIIIB 2.43E-13

89 MYO1D Myosin ID 0.00E+00

90 MYO1F Myosin IF 2.58E-03

91 NARS Asparaginyl-tRNA synthetase 5.16E-06

92 NCAPD3 Non-SMC condensin II complex, subunit D3 5.52E-09

93 NDE1 NudE nuclear distribution gene E homolog 1 (A. nidulans) 7.28E-09

94 NDRG1 N-myc downstream regulated 1 5.64E-14

95 NDUFB7 NADH dehydrogenase (ubiquinone) 1 beta subcomplex, 7, 18kDa 1.39E-04

96 NFE2L2 Nuclear factor (erythroid-derived 2)-like 2 9.60E-09

97 NFKB2 Nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) 1.38E-08

98 NIPBL Nipped-B homolog (Drosophila) 1.45E-08

99 NMUR2 Neuromedin U receptor 2 2.18E-04

100 NNT Nicotinamide nucleotide transhydrogenase 5.66E-15

101 NOTCH2 Notch homolog 2 (Drosophila) 1.52E-08

102 OSBPL7 Oxysterol binding protein-like 7 1.52E-08

103 PAG1 Phosphoprotein associated with glycosphingolipid microdomains 1 1.99E-08

104 PANX1 Pannexin 1 5.84E-04

105 PARVA Parvin, alpha 2.01E-08

106 PDGFC Platelet derived growth factor C 2.63E-03

107 PEX11G Peroxisomal biogenesis factor 11 gamma 1.78E-04

108 PGF Placental growth factor 1.06E-04

109 PIP5K1C Phosphatidylinositol-4-phosphate 5-kinase, type I, gamma 0.00E+00

110 PKHD1 Polycystic kidney and hepatic disease 1 (autosomal recessive) 2.62E-08

111 PLSCR1 Phospholipid scramblase 1 3.60E-13

112 PLXNC1 Plexin C1 3.61E-08

113 PNPO Pyridoxamine 5'-phosphate oxidase 4.22E-08

114 POSTN Periostin, osteoblast specific factor 6.10E-04

115 PPAP2B Phosphatidic acid phosphatase type 2B 7.54E-04

116 PPARGC1A Peroxisome proliferator-activated receptor gamma, coactivator 1 alpha 1.16E-05

117 PPFIBP1 PTPRF interacting protein, binding protein 1 (liprin beta 1) 3.80E-14

118 PPP1R15B Protein phosphatase 1, regulatory (inhibitor) subunit 15B 3.61E-05

119 PSAT1 Chromosome 8 open reading frame 62; phosphoserine aminotransferase 1 0.00E+00

120 PSMD5 Proteasome (prosome, macropain) 26S subunit, non-ATPase, 5 5.77E-15

121 PSRC1 Proline/serine-rich coiled-coil 1 6.48E-13

122 PTPRR Protein tyrosine phosphatase, receptor type, R 4.97E-08

123 QKI Quaking homolog, KH domain RNA binding (mouse) 5.40E-08

124 RAD51AP1 RAD51 associated protein 1 9.72E-03

125 RAP1GAP RAP1 GTPase activating protein 5.57E-08

126 RBL1 Retinoblastoma-like 1 (p107) 1.67E-15

127 RCC2 Regulator of chromosome condensation 2 2.53E-03

128 RECK Reversion-inducing-cysteine-rich protein with kazal motifs 0.00E+00

129 RELB V-rel reticuloendotheliosis viral oncogene homolog B 7.87E-13

130 RTN4 Reticulon 4 1.28E-03

131 SBNO2 Strawberry notch homolog 2 (Drosophila) 5.94E-08

132 SCARB1 Scavenger receptor class B, member 1 6.37E-08

133 SEMA5A Sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 5A 9.25E-08

134 SERPINE1 Serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 3.35E-06

135 SERPINF1 Serpin peptidase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium derived factor), member 1 1.03E-07

136 SESN1 Sestrin 1 7.95E-04

137 SESN3 Sestrin 3 0.00E+00

138 SGSM1 Small G protein signaling modulator 1 1.08E-07

139 SH3PXD2A SH3 and PX domains 2A 2.22E-16

140 SIPA1L2 Signal-induced proliferation-associated 1 like 2 3.08E-04

141 SLC15A5 Solute carrier family 15, member 5 2.05E-03

142 SLC16A14 Solute carrier family 16, member 14 (monocarboxylic acid transporter 14) 1.90E-05

143 SLC16A6 Solute carrier family 16, member 6 (monocarboxylic acid transporter 7); similar to solute carrier family 16, member 6 2.71E-04

144 SLC1A7 Solute carrier family 1 (glutamate transporter), member 7 2.44E-05

145 SLC27A1 Solute carrier family 27 (fatty acid transporter), member 1 1.09E-05

146 SLC2A2 Solute carrier family 2 (facilitated glucose transporter), member 2 2.27E-04

147 SLC6A14 Solute carrier family 6 (amino acid transporter), member 14 1.08E-07

148 SLC6A3 Solute carrier family 6 (neurotransmitter transporter, dopamine), member 3 0.00E+00

149 SNX32 Sorting nexin 32 1.67E-07

150 SNX5 Sorting nexin 5 1.75E-07

151 SOS1 Son of sevenless homolog 1 (Drosophila) 2.05E-07

152 SPHK1 Sphingosine kinase 1 2.70E-03

153 SREBF2 Sterol regulatory element binding transcription factor 2 5.57E-07

154 SRGN Serglycin 1.02E-04

155 SYCE1L Hypothetical protein LOC100130958 1.03E-05

156 SYDE2 Synapse defective 1, Rho GTPase, homolog 2 (C. elegans) 0.00E+00

157 TBC1D13 TBC1 domain family, member 13 7.07E-07

158 TBC1D15 TBC1 domain family, member 15 2.17E-05

159 TBC1D2 TBC1 domain family, member 2 7.33E-07

160 TCF21 Transcription factor 21 7.79E-14

161 TFAP2A Transcription factor AP-2 alpha (activating enhancer binding protein 2 alpha) 8.40E-04

162 TFDP1 Transcription factor Dp-1 5.95E-14

163 TGFBI Transforming growth factor, beta-induced, 68kDa 9.00E-07

164 TGFBR3 Transforming growth factor, beta receptor III 1.36E-03

165 THBS4 Thrombospondin 4 2.54E-14

166 TNFRSF1B Tumor necrosis factor receptor superfamily, member 1B 1.19E-06

167 TNN Tenascin N 5.53E-14

168 TOM1L1 Target of myb1 (chicken)-like 1 1.23E-06

169 TRHDE Thyrotropin-releasing hormone degrading enzyme 0.00E+00

170 TRHR Thyrotropin-releasing hormone receptor 8.92E-13

171 TRPV1 Transient receptor potential cation channel, subfamily V, member 1 5.58E-05

172 TSTA3 Tissue specific transplantation antigen P35B 1.24E-06

173 TYR Tyrosinase-like (pseudogene); tyrosinase (oculocutaneous albinism IA) 2.79E-13

174 UBR1 Ubiquitin protein ligase E3 component n-recognin 1 0.00E+00

175 UGGT1 UDP-glucose ceramide glucosyltransferase-like 1 5.77E-15

176 UGGT2 UDP-glucose ceramide glucosyltransferase-like 2 2.11E-06

177 USH1C Usher syndrome 1C (autosomal recessive, severe) 7.50E-04

178 USHBP1 Usher syndrome 1C binding protein 1 2.23E-06

179 VPS16 Vacuolar protein sorting 16 homolog A (S. cerevisiae) 1.01E-04

180 WWP2 WW domain containing E3 ubiquitin protein ligase 2 9.10E-15

181 ZBTB40 Zinc finger and BTB domain containing 40 0.00E+00

182 ZWILCH Zwilch, kinetochore associated, homolog (Drosophila) 0.00E+00

In total, 215 and 182 PSGs were identified for the Tibetan wild boar and Duroc pig, respectively, using the likelihood ratio test (LRT) based on the branch-site model (P < 0.05).
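The branch-site LRT compares twice the log-likelihood difference between an alternative model, which allows ω > 1 on the foreground branch, and a null model in which that ω is fixed at 1. The sketch below assumes the common setup of PAML codeml's branch-site test 2, with the statistic compared against a χ² distribution with one degree of freedom; the supplementary text does not state the exact null distribution used, and the log-likelihood values are hypothetical. Evaluating the tail probability directly with `math.erfc` avoids computing 1 - CDF, which in double precision can only return 0 or multiples of about 1.1E-16. The P values of 0.00E+00, 1.11E-16, 2.22E-16 and so on in the table above are consistent with that kind of rounding, in which case they are best read as "too small to resolve in double precision" rather than exactly zero.

```python
import math


def lrt_statistic(lnl_alternative, lnl_null):
    """Likelihood ratio statistic 2 * (lnL1 - lnL0)."""
    return 2.0 * (lnl_alternative - lnl_null)


def chi2_sf_df1(x):
    """P(X > x) for X ~ chi-square with 1 degree of freedom.

    With one degree of freedom X = Z**2 for standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_test(lnl_alternative, lnl_null, alpha=0.05):
    """Return (statistic, P value, is_psg) under the df = 1 assumption."""
    stat = lrt_statistic(lnl_alternative, lnl_null)
    p = chi2_sf_df1(stat)
    return stat, p, p < alpha


# Hypothetical log-likelihoods for one gene: alternative model A vs. null (omega2 = 1).
stat, p, is_psg = branch_site_test(-10234.6, -10241.9)
print(f"2*dlnL = {stat:.2f}, P = {p:.3g}, PSG at P < 0.05: {is_psg}")

# 1 - CDF collapses to 0 for large statistics; erfc keeps the tiny tail.
naive = 1.0 - math.erf(math.sqrt(100.0 / 2.0))
print(naive, chi2_sf_df1(100.0))
```

Some analyses instead use a 50:50 mixture of a point mass at 0 and χ² with one degree of freedom, which halves these P values; the choice should follow whatever the main Methods specify.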

Supplementary Table 22. Functional gene categories enriched for the 215 PSGs in the Tibetan wild boar and 182 PSGs in the Duroc pig.

Functional category Term ID Term description Involved gene number P values

Tibetan wild boar

KEGG-pathway hsa04270 Vascular smooth muscle contraction 16 9.66E-07

GO-BP GO:0070482 Response to oxygen levels 15 1.85E-05

KEGG-pathway hsa04150 mTOR signaling pathway 10 6.39E-05

GO-BP GO:0001666 Response to hypoxia 13 3.40E-04

GO-MF GO:0030554 Adenyl nucleotide binding 42 1.25E-03

GO-BP GO:0032870 Cellular response to hormone stimulus 10 1.27E-03

GO-MF GO:0032559 Adenyl ribonucleotide binding 41 1.28E-03

GO-BP GO:0031331 Positive regulation of cellular catabolic process 6 1.40E-03

GO-BP GO:0048514 Blood vessel morphogenesis 12 1.49E-03

GO-BP GO:0031329 Regulation of cellular catabolic process 7 1.51E-03

GO-BP GO:0001525 Angiogenesis 10 1.53E-03

GO-BP GO:0009725 Response to hormone stimulus 19 1.75E-03

GO-BP GO:0045761 Regulation of adenylate cyclase activity 8 2.48E-03

GO-BP GO:0009894 Regulation of catabolic process 8 2.48E-03

GO-BP GO:0051240 Positive regulation of multicellular organismal process 15 2.53E-03

GO-BP GO:0030817 Regulation of cAMP biosynthetic process 8 3.15E-03

GO-BP GO:0051339 Regulation of lyase activity 8 3.15E-03

KEGG-pathway hsa04020 Calcium signaling pathway 12 3.38E-03

GO-BP GO:0001568 Blood vessel development 12 3.65E-03

GO-BP GO:0030808 Regulation of nucleotide biosynthetic process 10 5.35E-03

GO-BP GO:0030802 Regulation of cyclic nucleotide biosynthetic process 10 5.35E-03

GO-BP GO:0006140 Regulation of nucleotide metabolic process 10 5.95E-03

GO-BP GO:0001944 Vasculature development 12 1.98E-02

GO-MF GO:0032555 Purine ribonucleotide binding 44 2.09E-02

GO-MF GO:0003684 Damaged DNA binding 4 2.42E-02

InterProScan IPR001126 DNA-repair protein, UmuC-like 2 4.00E-02

GO-BP GO:0045740 Positive regulation of DNA replication 3 4.28E-02

GO-BP GO:0043085 Positive regulation of catalytic activity 18 4.70E-02

GO-BP GO:0006468 Protein amino acid phosphorylation 21 4.80E-02

Duroc pig

GO-BP GO:0022610 Biological adhesion 33 2.09E-07

GO-BP GO:0007155 Cell adhesion 33 4.04E-07

KEGG-pathway hsa04512 ECM-receptor interaction 11 2.17E-05

KEGG-pathway hsa04510 Focal adhesion 16 2.53E-05

GO-BP GO:0002021 Response to dietary excess 5 1.76E-04

GO-BP GO:0022402 Cell cycle process 19 3.33E-04

GO-BP GO:0010033 Response to organic substance 22 3.54E-04

GO-MF GO:0008047 Enzyme activator activity 13 4.25E-04

GO-MF GO:0005099 Ras GTPase activator activity 7 5.75E-04

InterProScan IPR001609 Myosin head, motor region 5 5.89E-04

GO-BP GO:0048285 Organelle fission 11 7.26E-04

GO-BP GO:0010876 Lipid localization 9 9.24E-04

GO-MF GO:0003779 Actin binding 12 1.20E-03

GO-BP GO:0040008 Regulation of growth 13 1.46E-03

GO-MF GO:0030695 GTPase regulator activity 13 2.14E-03

GO-BP GO:0002274 Myeloid leukocyte activation 5 2.77E-03

GO-BP GO:0032483 Regulation of Rab protein signal transduction 5 3.00E-03

GO-BP GO:0050873 Brown fat cell differentiation 4 3.41E-03

GO-BP GO:0030198 Extracellular matrix organization 10 3.85E-03

GO-BP GO:0042493 Response to drug 9 6.63E-03

GO-BP GO:0043567 Regulation of insulin-like growth factor receptor signaling pathway 3 6.84E-03

GO-BP GO:0002263 Cell activation during immune response 4 1.08E-02

GO-BP GO:0002366 Leukocyte activation during immune response 4 1.08E-02

GO-BP GO:0006869 Lipid transport 7 1.08E-02

GO-MF GO:0005096 GTPase activator activity 12 1.58E-02

GO-BP GO:0045444 Fat cell differentiation 4 3.02E-02

GO-BP GO:0007049 Cell cycle 24 3.71E-02

GO-BP GO:0040014 Regulation of multicellular organism growth 7 3.92E-02

Supplementary Table 23. List of KA/KS (ω) values for functional gene categories in the Tibetan wild boar and Duroc pig. Mean ω values in the Tibetan wild boar and Duroc pig are provided for significantly enriched GO-MF terms, GO-BP terms and KEGG pathways (P < 0.05, Benjamini-corrected modified Fisher’s exact test). Fold changes in mean ω (Tibetan wild boar versus Duroc pig) that are > 2 or < 0.5 are marked in bold.

(see Excel file ‘Supplementary Table 23.xls’)
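A minimal sketch of the summary in Supplementary Table 23, assuming that ω is computed per gene as KA/KS, averaged with the arithmetic mean within each enriched category, and compared as the ratio of the Tibetan wild boar mean to the Duroc pig mean. Skipping genes with KS = 0 (where ω is undefined) is an assumption of this sketch, and the per-gene (KA, KS) values below are hypothetical.

```python
from statistics import mean


def omega(ka, ks):
    """Ratio of nonsynonymous to synonymous substitution rates, KA/KS."""
    return ka / ks


def mean_omega(rates):
    """Mean omega over (KA, KS) pairs; pairs with KS == 0 are skipped (assumption)."""
    values = [omega(ka, ks) for ka, ks in rates if ks > 0]
    if not values:
        raise ValueError("no genes with KS > 0 in this category")
    return mean(values)


def fold_change(mean_tibetan, mean_duroc, upper=2.0, lower=0.5):
    """Return (fold change, flagged); flagged marks > upper or < lower (bold in the table)."""
    fc = mean_tibetan / mean_duroc
    return fc, fc > upper or fc < lower


# Hypothetical (KA, KS) pairs for the genes of one enriched category.
tibetan = [(0.012, 0.040), (0.020, 0.050), (0.009, 0.030)]
duroc = [(0.004, 0.040), (0.006, 0.050), (0.003, 0.030), (0.002, 0.0)]

m_t, m_d = mean_omega(tibetan), mean_omega(duroc)
fc, flagged = fold_change(m_t, m_d)
print(f"mean omega: Tibetan {m_t:.3f}, Duroc {m_d:.3f}; fold change {fc:.2f}; bold: {flagged}")
```

Using a ratio of means rather than a mean of per-gene ratios keeps a single gene with a very small Duroc ω from dominating the comparison.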

Supplementary Table 24. List of a priori functional candidate genes related to ‘response to hypoxia’, ‘response to UV’ and ‘energy metabolism’.

Response to hypoxia (122 genes)*

ABAT ATP1B1 CXCR4 ENG HSD11B2 L1CAM PDGFA PLOD1 SOCS5 UBQLN1

ACVR1B BCL2 CYB5R4 EP300 HSP90B1 LATS1 PDGFB PLOD2 SOD1 UCP3

ADM BIRC2 CYP17A1 EPAS1 IFNG LRRC3B PDGFRA PML SOD3 USF1

ADORA1 BNIP3 CYP1A2 EPHX2 IL10 MMP2 PDIA2 PSME2 TDO2 VAV3

ADORA2A C1QTNF7 CYP2E1 ERCC3 INSR NAGLU PDLIM1 PYGM TGFB1 XRCC1

ADORA2B CA9 CYP2F1 FANCA ITGA1 NARFL PGF RORC TGFB2

AGTR1 CAMK2D CYP2U1 FLT1 ITGA2 NPR1 PIK3C2A RPS6KA1 TGFB3

ALDH2 CAPN2 DDAH1 FRMD6 ITPR1 OR6Y1 PIK3C2B RYR1 TICAM1

ALG12 CENPM DISC1 GPR182 JAG2 OTX1 PIK3C2G RYR2 TMEM206

ANGPT1 CFTR DPP4 GUCY1A3 JAK2 OXTR PIK3CB SCNN1G TNF

APOE CHMP4B EGFR HBE1 KATNA1 P2RX3 PIK3R1 SHH TRH

ARG2 CHRNB2 EGLN1 HIF1A KCNA5 P2RX4 PIK3R2 SMAD4 TXN

ARNT CLDN3 EGLN2 HMOX2 KCNJ8 PDE5A PLAU SOCS3 TXN2

Response to UV (38 genes)†

AURKB BRCA2 CDKN2D ERCC5 IL12A MME POLD1 TIPIN USP28 XPC

BAK1 CASP9 EGFR ERCC6 IL12B MYC REV1 TP73 USP47 ZRANB3

BCL2 CAT ERCC3 FEN1 MC1R PIK3R1 RUVBL2 USF1 WRN

BCL3 CCND1 ERCC4 HUS1 MEN1 PML SPRTN USP1 XPA

Energy metabolism (151 genes)‡

ABCA7 APOA4 CHM FAIM2 GYS1 LEPR NHLH2 PPARG SERPINE1 TXNIP

ABCC8 APOA5 CPE FANCL HEXB LIPE NMUR2 PPARGC1A SFRP1 UBR1

ACACB APOC3 CPEB4 FASN HSD11B1 LMNA NPY PPARGC1B SLC2A2 UCP2

ACP1 APOE CPT1A FGF21 HSD11B2 LRPAP1 NPY1R PPP1R3A SLC6A1 UCP3

ACVR1C AQP7 CRH FOXA2 HTR1B MAGEL2 NPY2R PPY SLC6A14 VSX1

ADAMTS9 ARID5B CYB5R4 GAD2 IDE MAOA NPY5R PRKAA2 SLC6A3 WT1

ADRA1B ATP1B1 DBH GAMT IDH1 MC3R NR0B2 PRKAR1A SNRPN ZNF608

ADRA2A BBS2 DGAT1 GDF3 IFRD1 MC4R PCSK1 PROX1 SOAT2

ADRA2B BBS4 DHCR24 GHRHR IGF1 MC5R PCSK1N PTPN1 SOCS3

ADRB3 BBS7 DLK1 GHSR IL15 MED12 PGD PTTG1 SREBF1

AEBP1 BRS3 DPT GIPR IL6R MEN1 PHF6 RASGRF1 TBX3

AGPAT2 BSCL2 EIF4EBP1 GNPDA2 INSR MEST PIK3R1 RETN TGFB1

AGRP CBL ENPP1 GPAM IRS1 MKKS PLA2G1B RSC1A1 TMEM160

AMACR CCKAR EREG GPC4 KCNA3 MMP11 PLSCR1 RSPO3 TNF

ANGPTL6 CEBPA FABP1 GPD2 KEL MYC PMCH SCARB1 TNFRSF1B

APOA2 CEBPD FABP2 GSK3B LEP NCOA3 PNMT SDC3 TRPV1

* A total of 122 functional candidate genes related to ‘response to hypoxia’ were compiled from the reports of Beall et al. (2010)12, Bigham et al. (2010)13, Simonson et al. (2010)14, Yi et al. (2010)15, Peng et al. (2011)16, Xu et al. (2011)17, Ji et al. (2012)18 and Scheinfeldt et al. (2012)19.

† A total of 38 functional candidate genes related to ‘response to UV’ were taken from the GO Biological Process term ‘response to UV’ (GO:0009411), defined as any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of an ultraviolet radiation (UV light) stimulus.

‡ A total of 151 functional candidate genes related to ‘energy metabolism’ were compiled from the reports of Rankinen et al. (2006)20, MacDougald et al. (2007)21, Heid et al. (2010)22, Speliotes et al. (2010)23 and Li et al. (2012)24. These genes are mainly involved in energy homeostasis, muscle growth and adipose deposition, or encode adipokines, myokines, neurokines and hormones that regulate food intake.

Only functional candidate genes that are also among the 7,917 single-copy orthologs shared by Tibetan wild boar, Duroc pig and human are listed.
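The footnotes describe two set operations: candidate genes from several published reports are pooled, and only those present among the single-copy orthologs are kept. A minimal sketch of that logic follows; the report labels, the placeholder symbol GENE_X and the small ortholog set are hypothetical stand-ins, not the actual source lists or the 7,917 orthologs.

```python
def candidate_genes(reports, single_copy_orthologs):
    """Pool gene symbols from several reports, keep only single-copy orthologs."""
    # set().union(...) counts a gene cited by several reports only once.
    pooled = set().union(*reports.values())
    return sorted(pooled & set(single_copy_orthologs))


# Hypothetical inputs for illustration only.
reports = {
    "report_A": ["EPAS1", "EGLN1"],
    "report_B": ["EGLN1", "HIF1A", "GENE_X"],
}
orthologs = {"EPAS1", "EGLN1", "HIF1A", "TXN"}
print(candidate_genes(reports, orthologs))  # ['EGLN1', 'EPAS1', 'HIF1A']
```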

Supplementary Table 25. Functional candidate genes related to ‘response to hypoxia’ under positive selection in the Tibetan wild boar (21 PSGs) and Duroc pig (1 PSG).

Gene symbol | Gene name | ω (Tibetan) | P value (Tibetan) | ω (Duroc) | P value (Duroc)

ACVR1B Activin A receptor, type IB 0.385 0.00E+00 0.000 6.87E-01

ALDH2 Aldehyde dehydrogenase 2 family (mitochondrial) 0.627 1.42E-10 0.219 9.98E-01

APOE Apolipoprotein E 0.296 5.19E-07 0.216 9.99E-01

ARG2 Arginase, type II 0.593 3.51E-13 0.107 9.81E-01

ARNT Aryl hydrocarbon receptor nuclear translocator 0.852 0.00E+00 0.033 6.27E-01

BIRC2 Baculoviral IAP repeat-containing 2 0.383 4.09E-13 0.326 9.83E-01

CA9 Carbonic anhydrase IX 0.685 5.26E-13 0.091 9.88E-01

DPP4 Dipeptidyl-peptidase 4 0.093 7.47E-13 0.065 9.91E-01

EGLN2 Egl nine homolog 2 0.537 8.74E-13 0.100 9.91E-01

GPR182 G protein-coupled receptor 182 0.554 1.56E-12 0.218 9.93E-01

HIF1A Hypoxia inducible factor 1, alpha subunit 0.636 3.96E-12 0.313 9.94E-01

IFNG Interferon, gamma 0.768 4.86E-12 0.115 9.95E-01

PDGFRA Platelet-derived growth factor receptor, alpha polypeptide 0.422 0.00E+00 0.569 7.52E-02

PGF Placental growth factor 0.813 4.64E-08 0.778 1.06E-04

PIK3C2G Phosphoinositide-3-kinase, class 2, gamma polypeptide 1.006 1.20E-11 0.026 9.96E-01

PLAU Plasminogen activator, urokinase 0.612 3.33E-16 0.143 7.30E-01

PLOD2 Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 2 0.703 0.00E+00 0.085 5.95E-02

SHH Sonic hedgehog homolog 0.366 1.33E-11 0.047 9.96E-01

TDO2 Tryptophan 2,3-dioxygenase 0.935 2.45E-11 0.029 9.97E-01

USF1 Upstream transcription factor 1 1.105 2.61E-11 0.150 9.97E-01

XRCC1 X-ray repair complementing defective repair in Chinese hamster cells 1 0.708 5.18E-10 0.260 9.98E-01

The ω ratio of non-synonymous to synonymous substitutions (i.e. KA/KS) was calculated with the PAML package25 for the Tibetan wild boar and Duroc pig, taking the human ortholog as an outgroup. P values were determined using the likelihood ratio test (LRT) based on the branch-site model. P values less than 0.05 are shown in bold.
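The likelihood ratio test compares the branch-site alternative model, which in the usual branch-site test allows ω > 1 on the foreground (here Tibetan wild boar or Duroc pig) branch, with a null model in which that ω is fixed at 1. The sketch below assumes the common practice of comparing 2ΔlnL with a χ² distribution with one degree of freedom (some studies instead use a 50:50 mixture of zero and χ² with one degree of freedom, which halves the P value); the log-likelihood values are hypothetical and would in practice be read from codeml output.

```python
import math


def branch_site_lrt(lnl_alternative, lnl_null):
    """Return (2*delta lnL, P value) against a chi-square distribution with 1 df."""
    # Clamp small negative values that can arise from optimizer noise.
    statistic = max(0.0, 2.0 * (lnl_alternative - lnl_null))
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one gene (alternative vs null model).
statistic, p_value = branch_site_lrt(-2345.10, -2347.02)
print(f"2dlnL = {statistic:.2f}, P = {p_value:.3f}")  # 2dlnL = 3.84, P = 0.050
```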

Supplementary Table 26. Functional candidate genes related to ‘response to UV’ under positive selection in the Tibetan wild boar (6 PSGs).

Gene symbol | Gene name | ω (Tibetan) | P value (Tibetan) | ω (Duroc) | P value (Duroc) | Functional description

BCL3 B-cell CLL/lymphoma 3 0.584 4.65E-11 0.110 9.98E-01 UV-induced BCL3 activation directly suppresses the activity of the epigenetic factor CTCF, a master regulator of global chromatin structure26,27.

ERCC4 Excision repair cross complementing rodent repair deficiency, complementation group 4 0.521 5.07E-07 0.000 9.99E-01 ERCC4 is a structure-specific endonuclease involved in DNA cross-link repair; hypomorphic mutations in ERCC4 cause the UV-sensitive disorder xeroderma pigmentosum28,29.

ERCC6 Excision repair cross complementing rodent repair deficiency, complementation group 6 0.764 1.01E-12 0.149 9.93E-01 ERCC6 is a DNA-binding protein that is important in transcription-coupled excision repair and is involved in the preferential repair of active genes30.

REV1 REV1 homolog 1.104 0.00E+00 0.150 5.00E-01 REV1 is essential for mutagenesis through translesion synthesis, in which the damaged DNA template is copied directly during DNA replication31,32.

USF1 Upstream transcription factor 1 1.105 2.61E-11 0.150 9.97E-01 UV-activated USF-1 can directly upregulate a variety of pigmentation genes implicated in protection from UV radiation33,34 (particularly MC1R, a major determinant of coat color variation in mammals35, including pigs36).

ZRANB3 Zinc finger, RAN-binding domain containing 3 0.870 0.00E+00 0.000 4.13E-01 ZRANB3 maintains genomic stability by facilitating fork restart and limiting inappropriate recombination37,38.

The ω ratio of non-synonymous to synonymous substitutions (i.e. KA/KS) was calculated with the PAML package25 for the Tibetan wild boar and Duroc pig, taking the human ortholog as an outgroup. P values were determined using the likelihood ratio test (LRT) based on the branch-site model. P values less than 0.05 are shown in bold.

Supplementary Table 27. Functional candidate genes related to ‘energy metabolism’ under positive selection in the Tibetan wild boar (21 PSGs) and Duroc pig (17 PSGs).

Gene symbol | Gene name | ω (Tibetan) | P value (Tibetan) | ω (Duroc) | P value (Duroc) | Functional description

ACVR1C Activin A receptor, type IC 0.221 6.83E-01 0.627 0.00E+00

ACVR1C (also known as ALK7) is a type I receptor for the TGFB family of signaling molecules. Growth/differentiation factor 3 regulates adipose-tissue homeostasis and energy balance under nutrient overload in part by signaling through the ALK7 receptor39.

ADRB3 Adrenergic, beta-3-, receptor 0.091 1.00E+00 0.361 2.83E-04

ADRB3 is a member of the adrenergic receptor group of G-protein-coupled receptors and is involved in the regulation of lipolysis and thermogenesis40,41.

AGPAT2 1-acylglycerol-3-phosphate O-acyltransferase 2 0.000 1.00E+00 0.133 2.48E-03

AGPAT2 is a key enzyme in the biosynthesis of triacylglycerol and glycerophospholipids that catalyzes the acylation of lysophosphatidic acid to form phosphatidic acid42,43.

GDF3 Growth differentiation factor 3 0.286 7.18E-01 0.534 6.21E-04

GDF3 is a member of the TGFβ superfamily that regulates adipose-tissue homeostasis and energy balance under nutrient overload, in part by signaling through the ALK7 receptor39,44.

GHSR Growth hormone secretagogue receptor 0.096 8.47E-01 0.408 7.05E-14

GHSR is a component of the ghrelin signaling pathway and mediates the pleiotropic effects of ghrelin, which plays a role in energy homeostasis and the regulation of body weight45,46.

IL6R Interleukin 6 receptor 0.392 9.90E-01 0.799 5.32E-10

IL6R is a key mediator of the inflammatory response and is also involved in the modulation of metabolic traits and the etiology of metabolic syndrome47,48.

KEL Kell blood group, metallo-endopeptidase 0.623 9.91E-01 1.175 9.19E-10

KEL is a type II transmembrane glycoprotein that carries the highly polymorphic Kell blood group antigens49.

NMUR2 Neuromedin U receptor 2 0.247 1.00E+00 0.377 2.18E-04

NMUR2 is a receptor for neuromedin U, which is widely distributed in the gut and central nervous system and plays an important role in the regulation of food intake and body weight50,51.

PLSCR1 Phospholipid scramblase 1 0.170 9.69E-01 0.773 3.60E-13

PLSCR1 is a member of the PLSCR gene family; it plays a central role in receptor signaling and transactivation, contributes to cytokine-regulated cell proliferation and differentiation, and appears to influence lipid accumulation and the risk of acquiring the metabolic syndrome52.

PPARGC1A Peroxisome proliferator-activated receptor gamma, coactivator 1 alpha 0.001 6.24E-01 0.636 1.16E-05

PPARGC1A is a transcriptional coactivator that interacts with PPARγ and regulates muscle fiber type determination, cellular cholesterol homeostasis and the development of obesity53,54.

SCARB1 Scavenger receptor class B, member 1 0.219 9.95E-01 0.537 6.37E-08

SCARB1 is a plasma membrane receptor for high-density lipoprotein (HDL) cholesterol and is involved in the regulation of plasma HDL levels through reverse cholesterol transport, as well as in cardioprotection, steroidogenesis and reproduction55,56.

SLC2A2 Solute carrier family 2, member 2 0.702 6.04E-01 1.270 2.27E-04

SLC2A2 is an integral plasma membrane glycoprotein which mediates facilitated bidirectional glucose transport and influences serum HDL57.

SLC6A14 Solute carrier family 6, member 14 0.000 9.95E-01 0.867 1.08E-07

SLC6A14 is a member of the solute carrier family 6 which potentially regulates tryptophan availability for serotonin synthesis and thus possibly affects appetite control. Mutations in this gene may be associated with X-linked obesity58,59.

SLC6A3 Solute carrier family 6, member 3 0.215 4.28E-01 0.272 0.00E+00

SLC6A3 is a dopamine transporter. Variable number tandem repeat (VNTR) polymorphisms in the 3' UTR of SLC6A3 are associated with idiopathic epilepsy, alcohol and cocaine dependence, and obesity in smokers60,61.

TNFRSF1B Tumor necrosis factor receptor superfamily, member 1B 0.415 9.96E-01 0.478 1.19E-06

TNFRSF1B is a member of the TNF-receptor superfamily that is associated with obesity-induced peripheral neuropathy, hypertension and inflammation, and has been described as a major contributing factor to type 2 diabetes62,63.

TRPV1 Transient receptor potential cation channel, subfamily V, member 1 0.130 9.97E-01 0.434 5.58E-05

TRPV1 is an ion channel that is highly expressed on sensory nerve fibers innervating the pancreas and is involved in the regulation of energy and fat metabolism64-66.

UBR1 Ubiquitin protein ligase E3 component n-recognin 1 0.767 4.71E-01 0.686 0.00E+00

UBR1 is a component of the N-end rule pathway. UBR1-induced degradation of the low-density lipoprotein (LDL) receptor is essential for clearing circulating LDL67,68.

ADAMTS9 ADAM metallopeptidase with thrombospondin type 1 motif, 9 0.298 5.46E-14 0.400 9.72E-01

ADAMTS9, an endogenous angiogenesis inhibitor, controls organ shape during development69,70.

ADRA1B Adrenergic, alpha-1B-, receptor 0.279 9.14E-14 0.000 9.74E-01

ADRA1B, an α-adrenergic receptor, is required for normal postnatal growth of cardiac myocytes71.

AEBP1 AE binding protein 1 0.365 9.87E-14 0.257 9.75E-01

AEBP1, a transcriptional repressor, promotes adipocyte proliferation and reduces adipocyte differentiation72.

APOE Apolipoprotein E 0.296 5.19E-07 0.216 9.99E-01

APOE, a transport apolipoprotein, is essential for lipoprotein metabolism and is implicated in cardiovascular disease73,74.

BBS7 Bardet-Biedl syndrome 7 0.773 0.00E+00 0.000 7.21E-01

BBS7 is a member of the BBSome complex which is required for ciliogenesis. Mutations in this gene are associated with Bardet-Biedl syndrome75, which is characterized principally by obesity, retinitis pigmentosa, polydactyly, and hypogonadism76,77.

CBL Cas-Br-M ecotropic retroviral transforming sequence 0.288 5.28E-13 0.987 9.88E-01

CBL accepts ubiquitin from specific E2 ubiquitin-conjugating enzymes and transfers it to substrates, thereby regulating various cellular signaling events, including the insulin/insulin-like growth factor 1 and epidermal growth factor pathways78-80.

CPEB4 Cytoplasmic polyadenylation element binding protein 4 0.688 0.00E+00 0.000 6.40E-01

CPEB4 is a sequence-specific RNA-binding protein that promotes polyadenylation-induced translation in oocytes and neurons81 and is related to the modulation of body fat distribution22.

DGAT1 Diacylglycerol O-acyltransferase homolog 1 1.381 6.61E-08 0.253 9.99E-01

DGAT1 catalyzes the linkage of sn-1,2-diacylglycerol with a fatty acyl-CoA to form a triglyceride molecule82. Mice lacking DGAT1 have increased energy expenditure and insulin sensitivity and are protected against diet-induced obesity and glucose intolerance83.

EREG Epiregulin 0.688 3.13E-09 0.096 6.77E-01

EREG is a member of the epidermal growth factor family and has been linked to weight loss following dextran sulfate sodium exposure84.

FABP2 Fatty acid binding protein 2, intestinal 1.367 4.19E-08 0.075 9.99E-01

FABP2 is a lipid sensor in triglyceride-rich lipoprotein synthesis that maintains energy homeostasis85,86.

GHRHR Growth hormone releasing hormone receptor 0.636 1.36E-12 0.195 9.93E-01

GHRHR is a receptor for growth hormone-releasing hormone, which stimulates somatotroph cell growth, synthesis and release of growth hormone87,88.

GPD2 Glycerol-3-phosphate dehydrogenase 2 0.632 0.00E+00 0.542 4.34E-01

GPD2 catalyzes the conversion of glycerol-3-phosphate to dihydroxyacetone phosphate and is a key enzyme linking glycolysis, oxidative phosphorylation and fatty acid metabolism89.

IDH1 Isocitrate dehydrogenase 1 (NADP+), soluble 0.916 6.66E-16 0.000 8.33E-01

IDH1 catalyzes the oxidative decarboxylation of isocitrate to 2-oxoglutarate. The presence of IDH1 in peroxisomes suggests a role in the regeneration of NADPH for intraperoxisomal reductions90,91.

IGF1 Insulin-like growth factor 1 0.671 0.00E+00 0.385 6.86E-01

IGF1, a hormone similar to insulin, has been recognized as a major determinant of body size in mammals92,93.

KCNA3 Potassium voltage-gated channel, shaker-related subfamily, member 3 0.430 6.61E-12 0.162 9.95E-01

KCNA3 (also known as Kv1.3) is a subunit of a heteromeric potassium channel and is considered a therapeutic target for the treatment of obesity and for enhancing peripheral insulin sensitivity in patients with type 2 diabetes mellitus94,95.

LEPR Leptin receptor 1.177 2.68E-07 0.290 9.99E-01

LEPR is the receptor for the well-known adipocyte-specific hormone leptin96,97.

MMP11 Matrix metallopeptidase 11 0.449 7.94E-12 0.250 9.96E-01

MMP11 (also known as stromelysin 3) is a member of the matrix metalloproteinase family, which negatively regulates adipogenesis by reducing pre-adipocyte differentiation and reversing mature adipocyte differentiation65,66.

NPY1R Neuropeptide Y receptor Y1 0.000 3.31E-06 0.000 1.00E+00

NPY1R is a receptor for neuropeptide Y, one of the most abundant neuropeptides in the mammalian nervous system, and is associated with effects on food intake and the regulation of central endocrine secretion98,99.

PMCH Pro-melanin-concentrating hormone 0.406 7.07E-11 0.494 9.98E-01

PMCH encodes the precursor of melanin-concentrating hormone, a cyclic neuropeptide that plays an important role in energy homeostasis and in a number of neuronal functions such as food intake100,101.

PRKAA2 Protein kinase, AMP-activated, alpha 2 catalytic subunit 0.204 2.63E-06 0.074 1.00E+00

PRKAA2, a monitor of cellular energy status, is necessary for maintaining myocardial energy homeostasis during ischemia102,103.

PTPN1 Protein tyrosine phosphatase, non-receptor type 1 0.687 7.56E-10 0.117 9.98E-01

PTPN1 is a negative regulator of insulin and leptin signaling that modulates glucose homeostasis and energy expenditure104,105.

The ω ratio of non-synonymous to synonymous substitutions (i.e. KA/KS) was calculated with the PAML package25 for the Tibetan wild boar and Duroc pig, taking the human ortholog as an outgroup. P values were determined using the likelihood ratio test (LRT) based on the branch-site model. P values less than 0.05 are shown in bold.

Supplementary Table 28. Tibetan wild boar pseudogenes. A total of 188 pseudogenes, containing 137 frameshift and 60 premature termination events, were identified in the Tibetan wild boar genome using in silico filters followed by manual examination (see Excel file “Supplementary Table 28.xls”).
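The two classes of inactivating events can be illustrated with a simple check on a coding sequence: an insertion or deletion whose length is not a multiple of three shifts the reading frame, and an in-frame stop codon before the final codon truncates the protein. This is only a sketch of that idea under the assumption of a standard-code, in-frame coding sequence; it is not the in silico filtering pipeline used for the table, and the sequences are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def is_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def premature_stop_codon(cds):
    """0-based codon index of the first in-frame stop before the last codon, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):  # the final codon is the expected stop
        if codon in STOP_CODONS:
            return index
    return None


print(is_frameshift(2), is_frameshift(3))       # True False
print(premature_stop_codon("ATGAAATGAGGCTAA"))  # 2
print(premature_stop_codon("ATGAAAGGCTAA"))     # None
```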

Supplementary Table 29. Functional gene categories enriched for Tibetan wild boar pseudogenes.

Functional category | Term ID | Term description | Involved gene number | P value | Gene symbols

GO-BP GO:0042493 Response to drug 6 0.013 CAV2, BCHE, LCK, SMPD1, DDIT3, HTR2A

GO-MF GO:0042169 SH2 domain binding 3 0.027 SQSTM1, LCK, CRK

GO-MF GO:0019900 Kinase binding 5 0.042 CAV2, SQSTM1, LCK, AXIN2, RPS3

GO-BP GO:0008219 Cell death 11 0.045 TMEM85, SQSTM1, ARHGEF18, LCK, RYBP, CGB7, AXIN2, BCL2L12, C3ORF38, RPS3, HTR2A

GO-BP GO:0016265 Death 11 0.047 TMEM85, SQSTM1, ARHGEF18, LCK, RYBP, CGB7, AXIN2, BCL2L12, C3ORF38, RPS3, HTR2A

Supplementary Table 30. Drug response genes that appear inactive in the Tibetan wild boar genome.

Gene symbol | Gene name | Inactivation event | ω0 (average) | ω1 (other) | ω2 (Tibetan) | Functional description | Related disease

BCHE Butyrylcholinesterase Frameshift 0.208 0.208 1.048 BCHE encodes a non-specific cholinesterase enzyme that hydrolyses many different choline esters106-108.

Delayed metabolism of succinylcholine, mivacurium, procaine, and cocaine / Postanesthetic apnea / Organophosphate toxicity / Alzheimer's disease drug hypersensitivity / Post succinylcholine apnea / Dementia

CAV2 Caveolin 2 Premature stop codon 0.405 0.374 ∞

CAV2 is a major component of the inner surface of caveolae, small invaginations of the plasma membrane, and is involved in essential cellular functions, including signal transduction, lipid metabolism, cellular growth control and apoptosis109,110.

Disturbance of cholesterol binding drug / Prostate cancer / Breast cancer / Pulmonary dysfunction / Esophageal and bladder carcinomas

DDIT3 DNA damage inducible transcript 3 Frameshift 0.125 0.125 1.394

DDIT3 is a member of the C/EBP family of transcription factors, which are implicated in adipogenesis and erythropoiesis; it is activated by endoplasmic reticulum stress and promotes apoptosis111,112.

Myxoid liposarcoma / Ewing sarcoma / Myeloid leukemia

HTR2A 5-hydroxytryptamine (serotonin) receptor 2A Frameshift 0.181 0.139 1.791

HTR2A encodes one of the receptors for 5-hydroxytryptamine (serotonin), a biogenic hormone that functions as a neurotransmitter, a hormone, and a mitogen113,114.

Dependence on alcohol, nicotine, heroin and cotinine / Schizophrenia / Anorexia nervosa / Obsessive compulsive disorder / Citalopram-induced depressive disorder / Seasonal affective disorder / Weight gain, antipsychotic drug induced / Depression drug hypersensitivity / Antidepressant medication intolerance

LCK Lymphocyte-specific protein tyrosine kinase Frameshift 0.032 0.031 0.137

LCK is a member of the Src family of protein tyrosine kinases and plays an important role in the selection and maturation of developing T cells115,116.

Severe combined immunodeficiency / Type 1 diabetes / Alzheimer's disease

SMPD1 Sphingomyelin phosphodiesterase 1, acid lysosomal Frameshift 0.084 0.082 ∞

SMPD1 encodes a lysosomal acid sphingomyelinase that converts sphingomyelin to ceramide117,118.

Niemann-Pick disease type A and B (also known as acid sphingomyelinase deficiency)

Note: ‘∞’ indicates that no synonymous substitution was identified in the Tibetan wild boar branch of this gene (KS = 0). The nonsynonymous to synonymous substitution ratio (KA/KS, i.e. ω) was estimated for Duroc pig, human and Tibetan wild boar sequences using the codeml program with the free-ratio model, as implemented in the PAML package25. ω0 is the average ratio across all branches, ω1 is the average ratio in the human and Duroc pig branches, and ω2 is the ratio in the Tibetan wild boar branch.
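The ‘∞’ entries follow from the definition ω = KA/KS: when no synonymous substitution is inferred on a branch, KS = 0 and the ratio has no finite value. A minimal sketch of that convention with made-up input values; returning NaN when both KA and KS are zero is an assumption, since the note does not say how such a case would be reported.

```python
import math


def omega(ka, ks):
    """KA/KS ratio; inf if only nonsynonymous change is seen, NaN if neither is."""
    if ks == 0:
        # No synonymous substitution on this branch: the ratio has no finite value.
        return math.inf if ka > 0 else math.nan
    return ka / ks


for ka, ks in [(0.0125, 0.0100), (0.0040, 0.0), (0.0, 0.0)]:
    print(ka, ks, omega(ka, ks))
```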

Supplementary Table 31. Summary and mapping statistics of sampled pig populations/breeds.

Pig population/breed | Location | Latitude, longitude, average altitude (m) | Individual | PE read length (bp) | Raw bases (Gb) | High-quality rate (%) | Mapping rate (%) | Depth (×) | Coverage at least 1× (%) | Coverage at least 4× (%)

Tibetan wild boar (female)

Ganzi Ganzi Tibetan autonomous prefecture, Sichuan province, China

30.05°N, 100.30°E, 3,774 m

1 101 12.18 98.47 91.76 4.41 94.4 60.1

2 100 12.18 99.8 91.14 4.41 95.63 61.04

3 100 10.66 99.8 91.75 3.88 93.91 52.06

4 100 14.32 99.81 91.82 5.22 96.26 71.49

5 100 14.26 99.77 91.51 5.18 96.43 70.91

Diqing Diqing Tibetan autonomous prefecture, Yunnan province, China

27.82°N, 99.70°E, 3,281 m

1 100 16.08 99.79 91.98 5.79 96.43 75.54

2 101 12.3 99.03 91.21 4.45 95.59 61.59

3 101 11.99 98.27 91.33 4.31 95.00 58.96

4 100 17.66 99.75 92.85 6.54 96.83 80.30

5 101 11.74 99.20 92.57 4.35 94.73 58.41

Nyingchi Nyingchi prefecture, Tibetan autonomous region, China

29.65°N, 93.98°E, 3,526 m

1 100 9.93 98.50 91.91 3.24 89.76 39.89

2 100 19.08 99.81 91.79 6.96 97.01 83.14

3 100 13.43 99.81 91.68 4.86 94.74 64.20

4 100 12.18 99.78 92.09 4.41 93.98 58.65

5 100 17.91 99.76 92.63 6.56 96.04 78.00

Shigatse Shigatse prefecture, Tibetan autonomous region, China

29.27°N, 89.60°E, 4,023 m

1 100 14.74 99.75 92.07 5.36 94.31 67.09

2 100 11.51 99.77 91.69 4.20 92.47 54.85

3 100 15.09 99.76 91.74 5.41 94.70 67.73

4 100 12.44 99.72 92.50 4.58 94.36 61.02

5 100 14.90 99.75 92.46 5.45 95.15 68.73

Gannan Gannan Tibetan autonomous prefecture, Gansu province, China

34.98°N, 102.91°E, 2,881 m

1 100 15.60 99.76 92.32 5.72 95.78 72.42

2 100 12.07 99.75 92.85 4.42 92.85 58.63

3 100 12.98 99.70 91.86 4.68 93.18 59.88

4 101 12.70 98.30 91.21 4.58 95.13 63.14

5 101 11.81 98.89 91.19 4.26 93.66 57.75

A'ba A'ba Tibetan autonomous prefecture, Sichuan province, China

31.54°N, 102.96°E, 3,441 m

1 100 11.50 99.73 92.28 4.19 93.57 56.90

2 100 18.63 99.75 92.86 6.84 96.47 81.10

3 100 14.49 99.74 92.16 5.29 95.15 69.36

4 100 18.58 99.69 92.48 6.38 95.45 76.79

5 100 15.14 99.65 92.26 5.36 94.43 68.25

Chinese domestic pig (female)

Penzhou Luzhou city, Sichuan province, China

30.65°N, 105.81°E, 515 m

1 101 12.05 98.17 93.27 4.42 94.15 60.24

2 101 12.02 98.46 93.29 4.41 92.32 57.68

3 100 14.10 99.74 91.33 5.08 95.75 68.91

Wujin Liangshan Yi autonomous prefecture, Sichuan province, China

27.88°N, 103.55°E, 541 m

1 100 15.94 99.65 90.73 5.60 95.37 72.13

2 100 14.27 99.66 92.88 5.12 93.9 67.00

3 100 12.11 99.23 92.59 4.38 93.94 59.24

Ya'nan Chengdu city, Sichuan province, China

30.65°N, 103.46°E, 504 m

1 100 12.15 99.71 91.6 4.37 93.92 58.27

2 101 11.18 99.16 91.39 4.11 94.15 56.39

3 101 13.30 98.36 92.99 4.92 94.93 66.92

Neijiang Neijiang city, Sichuan province, China

30.65°N, 105.06°E, 335 m

1 100 15.80 99.56 91.58 5.09 94.22 66.25

2 100 17.31 99.79 91.25 6.02 94.89 71.22

3 101 11.52 99.11 92.41 4.25 92.92 56.50

Jinhua Jinhua city, Zhejiang province, China

30.27°N, 119.65°E, 42 m

1 101 11.68 99.37 93.31 4.39 94.64 60.62

2 100 12.42 99.8 93.33 4.62 93.77 60.56

3 100 10.62 99.85 92.60 3.90 93.34 51.01

Wild boar (female)

Wild boar Southwest China

29.56°N, 109.87°E, 368 m

1 100 12.13 98.98 88.69 4.17 93.84 56.12

2 100 16.36 99.64 91.58 5.78 96.38 76.69

3 100 16.35 99.62 90.88 5.70 96.22 74.54
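The depth and coverage columns summarize per-base read depth over the reference genome. A minimal sketch, assuming ‘Depth’ is the mean depth over all reference positions and ‘Coverage at least N×’ is the percentage of positions with depth of at least N; in practice the per-base depths would come from the alignments, and the toy values below are made up.

```python
def coverage_summary(depths, thresholds=(1, 4)):
    """Mean depth and percentage of positions covered at >= each threshold."""
    n = len(depths)
    if n == 0:
        raise ValueError("no positions supplied")
    mean_depth = sum(depths) / n
    # Booleans sum as 0/1, so this counts positions meeting each threshold.
    breadth = {t: 100.0 * sum(d >= t for d in depths) / n for t in thresholds}
    return mean_depth, breadth


depth, breadth = coverage_summary([0, 2, 5, 4, 9, 0, 1, 3])
print(depth, breadth)  # 3.0 {1: 75.0, 4: 37.5}
```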

Supplementary Table 32. Summary and mapping statistics of the downloaded pig genome re-sequencing data.

Breed | Pig name | Land of origin | Individual | High-quality bases (Gb)* | Mapping rate (%) | Depth (×) | Coverage at least 1× (%) | Coverage at least 4× (%) | Accession No.

Domestic pig

Duroc Denmark, North America

1 21.01 97.41 5.95 81.79 68.52 ERS177302

2 22.69 97.95 6.96 81.34 69.86 ERS177303

3 11.74 97.96 4.56 80.21 59.00 ERS177304

4 14.76 98.04 5.77 80.68 64.31 ERS177305

Hampshire England, North America

1 22.51 98.00 6.77 81.88 71.31 ERS177306

2 19.72 97.54 6.09 81.42 66.08 ERS177307

Jiangquhai Jiangsu province, China

1 20.50 98.14 8.09 81.34 71.05 ERS177311

Landrace Denmark

1 18.34 98.21 7.21 81.24 69.40 ERS177312

2 27.01 97.59 7.99 82.12 74.29 ERS177313

3 17.56 97.56 5.32 80.99 63.21 ERS177314

4 14.48 98.07 5.64 81.12 66.54 ERS177315

5 14.87 98.03 5.86 81.25 68.16 ERS177316

Large White England

1 10.89 97.20 4.33 77.25 51.83 ERS177317

2 19.98 98.04 7.55 82.29 74.48 ERS177318

3 19.98 98.09 7.57 82.19 74.33 ERS177319

4 19.96 98.13 7.68 82.28 74.52 ERS177320

5 18.47 97.90 7.06 82.15 73.42 ERS177321

6 22.72 97.90 6.58 81.65 70.65 ERS177322

7 18.57 98.15 7.20 81.58 68.93 ERS177323

8 18.99 97.64 4.66 79.13 57.90 ERS177324

9 19.44 98.02 7.55 82.33 74.59 ERS177325

10 16.65 98.05 6.04 81.54 69.70 ERS177326

11 17.38 98.11 6.15 81.56 69.96 ERS177327

12 18.52 98.21 6.72 81.64 71.43 ERS177328

13 13.59 98.10 4.92 80.77 63.14 ERS177329

14 17.02 98.08 6.20 81.62 70.31 ERS177330

Meishan Jiangsu province, China

1 18.03 97.98 6.85 82.01 72.78 ERS177331

2 17.92 98.09 6.74 81.76 70.73 ERS177332

3 17.17 97.11 6.07 80.56 65.81 ERS177333

4 19.76 98.12 7.79 81.24 70.06 ERS177334

Pietrain Belgium

1 20.68 97.98 4.95 81.05 64.29 ERS177336

2 20.91 97.93 8.2 81.84 73.33 ERS177337

3 16.45 96.71 6.22 79.83 62.04 ERS177338

4 10.88 96.51 4.28 76.35 49.97 ERS177339

5 21.44 97.78 4.92 80.33 60.87 ERS177340

Xiang Guangxi province, China

1 17.66 98.23 6.41 81.27 70.02 ERS177355

2 17.37 98.04 6.26 81.28 69.64 ERS177356

Wild boar

France France 1 18.54 97.94 7.32 81.28 70.39 ERS177349

Japan Japan 1 21.55 97.91 8.44 81.19 71.03 ERS177344

Meinweg, the Netherlands

Meinweg, the Netherlands

1 10.56 96.90 4.17 76.87 50.82 ERS177347

2 15.70 97.89 6.08 81.28 68.48 ERS177348

North China North China 1 9.31 96.15 3.64 72.50 41.06 ERS177353

2 19.29 97.55 7.55 81.24 70.16 ERS177354

South China South China 1 9.83 97.07 3.91 75.04 46.47 ERS177351

2 19.83 98.13 7.78 81.57 72.04 ERS177352

Sumatran Sumatra, Indonesia 1 21.56 98.02 8.33 80.82 70.55 ERS177308

2 20.98 98.22 8.30 80.70 69.69 ERS177310

Switzerland Switzerland 1 28.39 97.53 6.29 81.73 70.51 ERS177350

Veluwe, the Netherlands

Veluwe, the Netherlands

1 18.18 97.88 7.15 81.59 71.46 ERS177345

2 22.56 97.63 7.33 81.97 72.58 ERS177346

African warthog

Phacochoerus africanus

Tanzania 1 23.13 97.91 8.45 78.09 66.44 ERS177335

Genus Sus

Sus barbatus Sumatra, Indonesia 1 12.73 97.53 4.93 77.56 55.92 ERS177309

Sus cebifrons Philippines 1 19.05 96.67 7.42 80.43 70.52 ERS177341

Sus celebensis Sulawesi, Indonesia 1 46.06 97.88 17.88 82.37 77.39 ERS177342

Sus verrucosus Java, Indonesia 1 24.04 97.74 9.5 80.92 71.84 ERS177343

* The read-filtering criteria differ slightly between the data sequenced in this study (see ‘1.2 Sequence quality checking and filtering’) and the downloaded genome data (Phred quality ≤ 20)7-9.
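Phred quality scores encode the estimated base-calling error probability as Q = -10 log10(P), so a quality of 20 corresponds to a 1% error rate. The sketch below decodes a FASTQ quality string, assuming the Sanger/Illumina 1.8+ ASCII offset of 33, and flags a read by the fraction of bases at or below Q20; that read-level rule and its 50% cut-off are hypothetical illustrations, since the footnote does not spell out exactly how the criterion was applied.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Invert Q = -10 * log10(P) to recover the base-call error probability."""
    return 10 ** (-q / 10)


def is_low_quality(quality_string, q_cutoff=20, max_fraction=0.5):
    """Hypothetical rule: flag a read if more than max_fraction of bases have Q <= q_cutoff."""
    scores = phred_scores(quality_string)
    low = sum(q <= q_cutoff for q in scores)
    return low / len(scores) > max_fraction


print(phred_scores("I5+"))  # [40, 20, 10]
print(error_probability(20))
print(is_low_quality("I5+"))  # True
```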

Supplementary Table 33. Summary of SNP calling on a population-scale.

Category | Tibetan wild boar | Domestic pig | Wild boar, genus Sus and warthog | Total

Sample size n = 30 n = 52 n = 21 n = 103

Number of total SNPs 8,390,501 9,173,377 7,780,578 14,637,670

Number of shared SNPs 3,020,386

Supplementary Table 34. Tracy-Widom (TW) statistics for the first ten eigenvalues from principal component analysis (PCA) of pig breeds.

Eigenvalue rank | Eigenvalue | TW statistic | P value

1 28.318 34.685 4.18 × 10^-61

2 14.368 48.295 3.58 × 10^-99

3 5.626 17.219 1.42 × 10^-22

4 5.514 21.185 3.86 × 10^-30

5 4.239 8.921 1.58 × 10^-9

6 4.076 9.063 1.02 × 10^-9

7 3.992 10.426 1.41 × 10^-11

8 3.858 11.107 1.48 × 10^-12

9 3.475 6.935 1.62 × 10^-7

10 3.182 3.305 9.37 × 10^-4

Supplementary Table 35. Summary of SNPs in Tibetan wild boars and Chinese domestic pigs.

Category | Tibetan wild boar | Chinese domestic pig | Total

Sample size n = 30 n = 15 n = 45

Number of total SNPs 8,390,501 6,011,186 9,492,123

Number of shared SNPs 4,909,564

Upstream 55,163 38,265 62,906

Exonic

Nonsynonymous 18,326 12,515 21,062

Synonymous 27,142 17,223 30,804

Nonsyn/Syn ratio (ω) 0.67 0.73 0.68

Stop gain 332 217 389

Stop loss 91 67 99

Unknown 3,879 2,883 4,584

Intronic 2,232,946 1,577,151 2,519,351

Splicing 160 108 182

Downstream 55,794 39,246 63,798

Upstream/Downstream 607 437 725

Intergenic 5,996,061 4,323,074 6,788,223

The package ANNOVAR119 was used to identify whether SNPs cause protein-coding changes and which amino acids are affected. ‘Upstream’ means that a variant overlaps the 1 kb region upstream of the gene start site. ‘Stop gain’ means that a nonsynonymous SNP creates a stop codon at the variant site. ‘Stop loss’ means that a nonsynonymous SNP eliminates a stop codon at the variant site. ‘Unknown’ means that the function is unknown (owing to various errors in the gene structure definition in the database file). ‘Splicing’ means that a variant is within 2 bp of a splice junction. ‘Downstream’ means that a variant overlaps the 1 kb region downstream of the gene end site. ‘Upstream/Downstream’ means that a variant is located in both a downstream and an upstream region (possibly of two different genes).
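The positional categories defined above can be sketched as a small classifier. This is a simplified illustration, not ANNOVAR's own implementation: coordinates are assumed to be 1-based and inclusive, ‘upstream’ is taken relative to the transcription start and so depends on strand, the exonic, intronic and splicing classes are collapsed into a single ‘genic’ label because they require exon coordinates, and the gene records are hypothetical.

```python
from typing import NamedTuple

FLANK = 1000  # 1 kb flanking window for 'upstream' and 'downstream'


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive (lowest coordinate)
    end: int     # 1-based, inclusive (highest coordinate)
    strand: str  # "+" or "-"


def flank_side(pos, gene):
    """Label a position lying within 1 kb outside a gene body, else return None."""
    if gene.start - FLANK <= pos < gene.start:
        return "upstream" if gene.strand == "+" else "downstream"
    if gene.end < pos <= gene.end + FLANK:
        return "downstream" if gene.strand == "+" else "upstream"
    return None


def classify(pos, genes):
    """Simplified region label for a SNP position on one chromosome."""
    if any(g.start <= pos <= g.end for g in genes):
        return "genic"  # exonic/intronic/splicing would need exon coordinates
    sides = {flank_side(pos, g) for g in genes} - {None}
    if sides == {"upstream", "downstream"}:
        return "upstream/downstream"
    if sides:
        return sides.pop()
    return "intergenic"


genes = [
    Gene("GENE_A", 5000, 8000, "+"),
    Gene("GENE_B", 9500, 12000, "+"),
    Gene("GENE_C", 30000, 35000, "-"),
]
for pos in (4500, 6000, 8600, 13000, 20000, 29500, 35500):
    print(pos, classify(pos, genes))
```

With these hypothetical genes, position 8600 lies downstream of GENE_A and upstream of GENE_B and is therefore labelled ‘upstream/downstream’, while 29500 is labelled ‘downstream’ because GENE_C is on the minus strand.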

Supplementary Table 36. Functional gene categories enriched for genes affected by natural and artificial selection.

Functional category | Term ID | Term description | P value | Involved gene number

Tibetan wild boar

GO-BP GO:0006281 DNA repair 9.11E-03 2

InterProScan IPR007237 CD20-like 1.08E-02 2

InterProScan IPR021072 Melanoma associated antigen, MAGE, N-terminal 1.25E-02 2

GO-MF GO:0015276 Ligand-gated ion channel activity 1.27E-02 4

GO-MF GO:0016779 Nucleotidyltransferase activity 1.39E-02 15

GO-MF GO:0034061 DNA polymerase activity 1.48E-02 14

InterProScan IPR000477 Reverse transcriptase 2.17E-02 13

InterProScan IPR005135 Endonuclease/exonuclease/phosphatase 2.47E-02 7

GO-MF GO:0005230 Extracellular ligand-gated ion channel activity 2.84E-02 3

GO-BP GO:0006278 RNA-dependent DNA replication 2.87E-02 13

GO-MF GO:0003964 RNA-directed DNA polymerase activity 2.87E-02 13

InterProScan IPR000980 SH2 domain 2.90E-02 4

GO-MF GO:0003723 RNA binding 2.98E-02 17

GO-BP GO:0006259 DNA metabolic process 3.94E-02 16

InterProScan IPR003036 Core shell protein Gag P30 4.05E-02 2

GO-MF GO:0003777 Microtubule motor activity 4.09E-02 3

GO-MF GO:0070279 Vitamin B6 binding 4.56E-02 3

GO-BP GO:0007017 Microtubule-based process 4.63E-02 4

GO-MF GO:0003774 Motor activity 4.90E-02 4

InterProScan IPR002190 MAGE protein 4.94E-02 2

Domestic pig

GO-MF GO:0004888 Transmembrane signaling receptor activity 4.21E-04 36

GO-MF GO:0005149 Interleukin-1 receptor binding 5.01E-04 2

InterProScan IPR003502 Interleukin-1 propeptide 5.50E-04 2

InterProScan IPR003294 Interleukin-1, alpha/beta 5.50E-04 2

InterProScan IPR000048 IQ calmodulin-binding region 8.28E-04 7

GO-BP GO:0050671 Positive regulation of lymphocyte proliferation 5.09E-03 5

GO-BP GO:0070665 Positive regulation of leukocyte proliferation 5.43E-03 5

GO-BP GO:0032946 Positive regulation of mononuclear cell proliferation 5.43E-03 5

InterProScan IPR000975 Interleukin-1 7.75E-03 2

GO-BP GO:0050878 Regulation of body fluid levels 9.01E-03 7

GO-BP GO:0009968 Negative regulation of signal transduction 9.70E-03 3

GO-BP GO:0043407 Negative regulation of MAP kinase activity 1.04E-02 4

GO-MF GO:0004984 Olfactory receptor activity 1.08E-02 22

GO-BP GO:0007166 Cell surface receptor signaling pathway 1.09E-02 38


GO-MF GO:0016772 Transferase activity, transferring phosphorus-containing groups 1.22E-02 40

GO-BP GO:0007186 G-protein coupled receptor signaling pathway 1.26E-02 35

GO-MF GO:0016503 Pheromone receptor activity 1.28E-02 2

InterProScan IPR004072 Vomeronasal receptor, type 1 1.40E-02 2

KEGG pathway map04914 Progesterone-mediated oocyte maturation 1.42E-02 4

GO-BP GO:0006720 Isoprenoid metabolic process 1.80E-02 4

GO-BP GO:0046541 Saliva secretion 1.94E-02 2

GO-BP GO:0006662 Glycerol ether metabolic process 2.00E-02 2

GO-BP GO:0006955 Immune response 2.04E-02 6

KEGG pathway hsa04730 Long-term depression 2.09E-02 5

InterProScan IPR000725 Olfactory receptor 2.09E-02 22

GO-BP GO:0050670 Regulation of lymphocyte proliferation 2.09E-02 5

GO-BP GO:0070663 Regulation of leukocyte proliferation 2.18E-02 5

GO-BP GO:0032944 Regulation of mononuclear cell proliferation 2.18E-02 5

GO-BP GO:0008299 Isoprenoid biosynthetic process 2.60E-02 3

GO-BP GO:0000188 Inactivation of MAPK activity 2.60E-02 3

GO-BP GO:0042102 Positive regulation of T cell proliferation 2.69E-02 3

InterProScan IPR017452 GPCR, rhodopsin-like superfamily 3.31E-02 27

GO-BP GO:0006954 Inflammatory response 3.32E-02 2

GO-BP GO:0043405 Regulation of MAP kinase activity 3.33E-02 6

GO-BP GO:0051251 Positive regulation of lymphocyte activation 3.45E-02 5

GO-BP GO:0050777 Negative regulation of immune response 3.94E-02 3

InterProScan IPR006201 Neurotransmitter-gated ion-channel 4.50E-02 3

GO-BP GO:0002696 Positive regulation of leukocyte activation 4.54E-02 5


Supplementary Note

1 De novo sequencing, assembly and annotation of the Tibetan wild boar genome

1.1 Sequencing strategy and data generation

We used a whole-genome shotgun strategy and next-generation sequencing on the Illumina HiSeq 2000 platform to sequence the genome of the Tibetan wild boar. DNA was extracted from a female Tibetan wild boar from Daocheng County (~3,750 m altitude) on the Tibetan Plateau of China. All animals and samples used in this study were collected according to the guidelines for the care and use of experimental animals established by the Ministry of Agriculture of China. Short-insert (180 bp and 500 bp) and long-insert (2 kb, 5 kb and 10 kb) DNA libraries were constructed according to the manufacturer’s specifications (Illumina), with read lengths of 101 bp, 75 bp and 51 bp (Supplementary Table 1). In total, we generated ~319.3 Gb of sequence.

1.2 Sequence quality checking and filtering

To avoid reads with artificial bias (i.e. low-quality paired reads, which mainly result from base-calling duplicates and adapter contamination), we removed the following types of reads:

(a) Reads with ≥ 10% unidentified nucleotides (N);

(b) Reads with > 10 nt aligned to the adapter, allowing ≤ 10% mismatches;

(c) Reads with > 50% bases having phred quality < 5; and

(d) Putative PCR duplicates generated by PCR amplification during library construction (i.e. paired-end reads whose read 1 and read 2 were both completely identical to those of another pair).
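The read-level filters (a), (c) and (d) above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: it assumes Phred+33-encoded quality strings (the encoding is not stated here), discards a pair if either mate fails, and omits the adapter filter (b), which needs an alignment against the adapter sequence. All helper names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# (read 1 sequence, read 1 qualities, read 2 sequence, read 2 qualities)
ReadPair = Tuple[str, str, str, str]


def n_fraction(seq: str) -> float:
    """Fraction of unidentified bases (N) in a read."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def low_quality_fraction(qual: str, cutoff: int = 5, offset: int = 33) -> float:
    """Fraction of bases with Phred quality below `cutoff` (Phred+33 assumed)."""
    if not qual:
        return 1.0
    return sum(1 for c in qual if ord(c) - offset < cutoff) / len(qual)


def read_passes(seq: str, qual: str) -> bool:
    if n_fraction(seq) >= 0.10:            # filter (a): >= 10% N
        return False
    if low_quality_fraction(qual) > 0.50:  # filter (c): > 50% of bases below Q5
        return False
    return True


def filter_pairs(pairs: Iterable[ReadPair]) -> List[ReadPair]:
    kept: List[ReadPair] = []
    seen: Set[Tuple[str, str]] = set()
    for r1, q1, r2, q2 in pairs:
        if not (read_passes(r1, q1) and read_passes(r2, q2)):
            continue
        # filter (d): a pair whose read 1 and read 2 both repeat an earlier
        # pair exactly is treated as a putative PCR duplicate
        if (r1, r2) in seen:
            continue
        seen.add((r1, r2))
        kept.append((r1, q1, r2, q2))
    return kept
```

Keeping the first copy of a duplicated pair mirrors the usual practice of retaining one representative per duplicate set.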

Consequently, 278.2 Gb of data (114.5× coverage) was retained for assembly, of which 95% and 90% of the bases had quality ≥ Q20 and ≥ Q30, respectively (Supplementary Table 1).
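The 114.5× figure is consistent with dividing the retained data by the 2.43 Gb ungapped assembly length reported in section 1.4 (our inference; the denominator is not stated explicitly). A sketch of that arithmetic, together with the per-base Q20/Q30 fractions, assuming Phred+33 quality strings:

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Average fold coverage: total retained bases divided by genome size."""
    return total_bases / genome_size


def fraction_at_least(qualities: str, threshold: int, offset: int = 33) -> float:
    """Fraction of bases with Phred quality >= threshold (Phred+33 assumed)."""
    return sum(1 for c in qualities if ord(c) - offset >= threshold) / len(qualities)


depth = mean_depth(278.2e9, 2.43e9)
print(f"{depth:.1f}x")  # 114.5x
```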

1.3 Estimation of genome size using the k-mer method

To estimate the genome size of the Tibetan wild boar, we selected 130.05 Gb of high-quality reads from the short-insert (180 bp) library and generated 19-mer frequency information using the k-mer analysis implemented in the software Meryl120,121. The estimated size of the Tibetan wild boar genome is 2,379.31 Mb (~2.38 Gb) (Supplementary Fig. 4 and Supplementary Table 2).
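The k-mer approach estimates genome size as the total number of k-mers divided by the depth at the main peak of the k-mer frequency histogram. The toy Python sketch below illustrates that formula only; it is not Meryl, it counts k-mers on the forward strand only (real tools usually count canonical k-mers), and the `min_depth` cutoff used to skip error k-mers when locating the peak is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer without an N in the reads (forward strand only)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def depth_histogram(counts: Counter) -> Dict[int, int]:
    """Map each k-mer depth to the number of distinct k-mers seen at that depth."""
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 2) -> float:
    """Genome size ~= total k-mers / depth of the main histogram peak."""
    total_kmers = sum(depth * n for depth, n in hist.items())
    peak_depth = max((d for d in hist if d >= min_depth), key=lambda d: hist[d])
    return total_kmers / peak_depth
```

For example, the histogram {1: 100, 5: 20, 10: 200, 11: 50} holds 2,750 k-mers with its main peak at depth 10, giving an estimate of 275 bp.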

1.4 De novo assembly

The paired-end reads from the 180 bp, 500 bp and 2 kb libraries were processed with the error-correction module of ALLPATHS-LG122. We then assembled the Tibetan wild boar genome using SOAPdenovo, a de novo genome assembler based on the de Bruijn graph algorithm123.

Firstly, the corrected reads from the 180 bp and 500 bp libraries were used to construct contigs with a k-mer size of 27. This yielded a contig N50 of 1,124 bp and a contig N90 of 252 bp (considering fragments longer than 100 bp).
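To illustrate how contigs arise from a de Bruijn graph, the toy Python sketch below builds a graph whose nodes are (k-1)-mers and whose edges are k-mers, then spells out each maximal non-branching path as a contig (unitig). It is a simplification, not SOAPdenovo’s implementation: it ignores reverse complements, sequencing errors, tips, bubbles and k-mer coverage, and it does not report isolated cycles.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def kmers_from_reads(reads: Iterable[str], k: int) -> Set[str]:
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def unitigs(kmers: Set[str]) -> List[str]:
    """Spell maximal non-branching paths of the de Bruijn graph built from `kmers`."""
    out_edges: Dict[str, List[str]] = defaultdict(list)
    in_degree: Dict[str, int] = defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]  # each k-mer links two (k-1)-mer nodes
        out_edges[prefix].append(suffix)
        in_degree[suffix] += 1
    nodes = set(out_edges) | set(in_degree)

    def one_in_one_out(node: str) -> bool:
        return in_degree.get(node, 0) == 1 and len(out_edges.get(node, [])) == 1

    contigs = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node: reached from a start or branching node
        for nxt in out_edges.get(node, []):
            contig = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = out_edges[nxt][0]
                contig += nxt[-1]
            contigs.append(contig)
    return contigs
```

For instance, the 4-mers of the single read "ACGTTGCA" form one unbranched path and are spelled back as that read, whereas a branch point splits the graph into several contigs.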

Secondly, we realigned all reads, from both the short-insert libraries (180 bp and 500 bp) and the long-insert libraries (2 kb, 5 kb and 10 kb), onto the contigs; 83.60% of the paired-end reads aligned.

Thirdly, we joined adjacent contigs into scaffolds when the link between them was supported by at least four consistent read pairs. Within these scaffolds, the contig N50 and N90 (based on fragments longer than 500 bp) improved to 10,830 bp and 2,411 bp, respectively, and the scaffold N50 and N90 reached 1,068,344 bp and 231,601 bp.
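The scaffolding rule (at least four consistent read pairs per contig adjacency) can be sketched as follows. The `PairLink` record, its distance fields and the gap estimate (insert size minus the portions of the fragment lying inside the two contigs) are illustrative assumptions, and “consistent” is taken here to mean agreeing on both contigs and their relative orientation. SOAPdenovo’s own scaffolder also resolves orientation conflicts and repeats, which this sketch does not.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PairLink(NamedTuple):
    """Hypothetical record for a read pair whose mates map to different contigs."""
    contig_a: str
    contig_b: str
    orientation: str  # relative orientation of the two contigs, e.g. "++" or "+-"
    dist_a: int       # bases from mate 1's start to the end of contig_a
    dist_b: int       # bases from the start of contig_b to the end of mate 2


def scaffold_links(pairs: Iterable[PairLink], insert_size: int,
                   min_pairs: int = 4) -> Dict[Tuple[str, str, str], Tuple[int, float]]:
    """Return {(contig_a, contig_b, orientation): (supporting pairs, median gap)}."""
    gaps: Dict[Tuple[str, str, str], List[int]] = defaultdict(list)
    for p in pairs:
        # implied gap = insert size minus the parts of the fragment inside the contigs
        gaps[(p.contig_a, p.contig_b, p.orientation)].append(insert_size - p.dist_a - p.dist_b)
    return {key: (len(g), median(g)) for key, g in gaps.items() if len(g) >= min_pairs}
```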

Fourthly, to close gaps within the scaffolds (caused mainly by repeats that were masked during scaffold construction), we used paired-end information to retrieve read pairs with one read aligned well to a contig and the other located in a gap region, and then performed local assembly of these reads using GapCloser (version 1.12)123.

After this last step, the contig N50 and N90 were 20,411 bp and 4,605 bp, and the scaffold N50 and N90 were 1,049,950 bp and 227,167 bp, respectively (fragments longer than 100 bp; Supplementary Table 3). In total, the Tibetan wild boar assembly contains 2.43 Gb of ungapped sequence, similar to that of the Duroc pig genome assembly (2.52 Gb) (Table 1 and Supplementary Table 11).
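N50 and N90 are the lengths L such that sequences of length ≥ L contain at least 50% or 90% of the total assembled length. A minimal Python sketch, where `min_length` mirrors the “fragments longer than 100 bp” cutoff above:

```python
from typing import Iterable


def nx(lengths: Iterable[int], x: float = 50.0, min_length: int = 0) -> int:
    """Nx: length L such that sequences >= L cover at least x% of the total length."""
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        return 0
    target = sum(kept) * x / 100.0
    running = 0
    for length in kept:
        running += length
        if running >= target:
            return length
    return kept[-1]
```

With lengths of 100, 200, 300 and 400 bp (1,000 bp in total), the two longest sequences reach 700 bp, so N50 = 300 bp; reaching 90% needs the 200 bp sequence as well, so N90 = 200 bp.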

1.5 Detection of heterozygous SNPs and deletion or insertion polymorphisms (InDels)

To evaluate the heterozygosity of the Tibetan wild boar genome, we realigned the ~216.2 Gb of high-quality reads from the short-insert libraries (180 bp and 500 bp) onto the genome assembly using BWA124 (Supplementary Fig. 7 and Supplementary Table 4). We then performed SNP calling using SOAPsnp125 and obtained ~4.4 M high-confidence heterozygous SNPs (coverage depth ≥ 4 and ≤ 150, genotype quality ≥ 20, copy number ≤ 2 and distance between adjacent SNPs ≥ 5 bp) (Supplementary Fig. 8), corresponding to a heterozygous SNP rate of 1.82 × 10⁻³ in the Tibetan wild boar.
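A minimal sketch of the high-confidence filter and the rate calculation. The `HetCall` record is hypothetical (SOAPsnp’s output has its own column layout), and we read “distance between adjacent SNPs ≥ 5 bp” as discarding both members of any pair of calls closer than 5 bp, which is our assumption. The rate is heterozygous SNPs divided by the number of bases assessed; as a rough check, ~4.4 M divided by the 2.43 Gb assembly gives ~1.8 × 10⁻³, in line with the reported value, although the exact denominator is not stated.

```python
from typing import Iterable, List, NamedTuple


class HetCall(NamedTuple):
    """Hypothetical per-site record for a heterozygous call."""
    scaffold: str
    pos: int
    depth: int
    genotype_quality: int
    copy_number: float


def high_confidence(calls: Iterable[HetCall], min_gap: int = 5) -> List[HetCall]:
    # site-level filters: depth 4-150, genotype quality >= 20, copy number <= 2
    passing = sorted(
        (c for c in calls
         if 4 <= c.depth <= 150 and c.genotype_quality >= 20 and c.copy_number <= 2),
        key=lambda c: (c.scaffold, c.pos))
    kept = []
    for i, c in enumerate(passing):
        prev = passing[i - 1] if i > 0 else None
        nxt = passing[i + 1] if i + 1 < len(passing) else None
        # spacing filter: drop calls closer than min_gap bp to a neighbour on the same scaffold
        close_prev = prev is not None and prev.scaffold == c.scaffold and c.pos - prev.pos < min_gap
        close_next = nxt is not None and nxt.scaffold == c.scaffold and nxt.pos - c.pos < min_gap
        if not (close_prev or close_next):
            kept.append(c)
    return kept


def heterozygosity(n_het_sites: int, assessed_bases: int) -> float:
    """Heterozygous SNPs per assessed base."""
    return n_het_sites / assessed_bases
```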

In addition, we called InDels for the Tibetan wild boar genome using the Bayesian approach implemented in SAMtools. The ‘mpileup’ command was used to identify InDels with the parameters ‘-m 2 -F 0.002 -d 1000’. A total of 984,284 InDels (1 to 30 bp in length) were identified, of which 982 (0.10%) were in coding regions (Supplementary Fig. 11 and Supplementary Table 7).

1.6 Repeat annotation

After the genome assembly, we performed repeat annotation for the Tibetan

wild boar genome.

(a) Identification of known transposable elements (TEs)

We used RepeatMasker version 3.3.0 (Supplementary URLs) against the Repbase TE library (RM database version 20110920)126, and RepeatProteinMask (Supplementary URLs), which performs WU-BLASTX searches against the TE protein database.

(b) De novo repeat prediction


We built a de novo repeat library for the Tibetan wild boar using RepeatModeler version 1.0.5 (Supplementary URLs), which uses two core programs, RECON127 and RepeatScout128, to generate TE families.

(c) Identification of tandem repeats

We identified non-interspersed repeat sequences, including simple repeats, satellites and low-complexity repeats, using RepeatMasker with the “-nolow” option. We also predicted tandem repeats using Tandem Repeats Finder129 with the parameters “Match=2, Mismatch=7, Delta=7, PM=80, PI=10, Minscore=50, and MaxPeriod=12”.

In addition, to compare TE composition among genomes, we annotated repeats in the Duroc pig, human and cattle genomes using the same pipeline as for the Tibetan wild boar (Supplementary Fig. 10 and Supplementary Tables 5, 6).

1.7 Structural annotation of genes

Genes in the Tibetan wild boar genome were predicted using ab initio and homology-based methods, incorporating transcriptional evidence from RNA-seq data.

(a) Ab initio prediction

We used the ab initio prediction packages Augustus130, Geneid131, Genscan132, GlimmerHMM133 and SNAP134, with parameters trained on a set of high-quality proteins from homology-based prediction.

(b) Homology-based prediction

The protein repertoires of human, mouse, cattle, dog and the Duroc pig were downloaded from Ensembl release 67 and mapped onto the repeat-masked Tibetan wild boar genome using TBLASTn135. The matching genomic sequences were then aligned against the corresponding proteins using GeneWise136 to define gene models. We also aligned porcine cDNA and EST sequences onto the Tibetan wild boar genome to provide additional evidence for the homology-based prediction.

(c) RNA-seq data

To improve the genome annotation, RNA libraries were constructed from four tissues (heart, liver, lung and kidney) using the Illumina mRNA-Seq Prep Kit, and about 27.9 Gb of sequence was generated (100 bp paired-end reads). RNA-seq reads were aligned to both the Tibetan wild boar and Duroc pig reference assemblies using TopHat (v2.0.7)137 with default parameters to identify exon regions and splice sites (Supplementary Table 12). The alignments were then used as input for Cufflinks (v2.0.2)138, with default parameters, for genome-based transcript assembly. The final non-redundant reference gene set was generated by merging the gene models from the three methods using EvidenceModeler (EVM)139; genes encoding ≤ 50 amino acids or supported only by de novo prediction were removed (Supplementary Table 13). The final reference gene set of the Tibetan wild boar comprised 21,806 genes, comparable to the gene repertoire of the Duroc pig genome (21,640 genes) (Supplementary Table 15).

1.8 Functional annotation of genes

Gene functions were assigned according to the best BLASTP135 match in the SwissProt and TrEMBL databases140. We annotated motifs and domains with InterProScan141 by searching the publicly available InterPro141 member databases, including Pfam142, PRINTS, PROSITE, ProDom and SMART. Gene Ontology (GO) terms143 for each gene were retrieved from the corresponding InterPro entries (Supplementary Table 16). We also mapped the Tibetan wild boar genes to KEGG pathways144 to identify the best-matching category for each gene.

1.9 Non-coding RNA (ncRNA) annotation

tRNA genes were predicted by tRNAscan-SE145 with eukaryotic parameters. rRNA, microRNA (miRNA) and small nuclear RNA (snRNA) genes were identified using Infernal146 by searching against the Rfam database147 with default parameters (Supplementary Table 10). In addition, we removed miRNAs, snRNAs and tRNAs located in repeat or gap regions, as well as rRNAs of short length (≤ 50 bp) and low identity (≤ 85%).

2 Lineage-specific genes

2.1 Gene family clustering and orthology relationships

All DNA and protein data for the Duroc pig, human, mouse, cattle and dog were downloaded from Ensembl release 67. For genes with alternative splicing variants, the longest transcript (≥ 30 amino acids) was chosen to represent the gene. We used the TreeFam methodology148 to define a gene family as a group of genes descended from a single gene in the last common ancestor of the species considered. An all-against-all BLASTP135 search with an E-value cutoff of 1e-7 was performed to determine the similarities between genes in three (Tibetan wild boar, Duroc pig and human) or six (Tibetan wild boar, Duroc pig, cattle, dog, mouse and human) mammalian genomes, and fragmented alignments for each gene pair were conjoined using Solar (Supplementary Figs. 12, 14 and Supplementary URLs).

We assigned a connection (edge) between two nodes (genes) if more than 1/3 of the region aligned to both genes. Each edge was weighted by a similarity score ranging from 0 to 100. To cluster protein-coding genes into gene families, we used average-linkage hierarchical clustering as implemented in Hcluster_sg, requiring an edge weight ≥ 10 and a minimum edge density (total number of edges/theoretical number of edges) ≥ 0.34.
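The edge and density criteria can be sketched as below. This does not reimplement Hcluster_sg’s average-linkage clustering; it only shows how an edge is admitted (more than 1/3 of both genes aligned, weight ≥ 10) and how the edge density of a candidate cluster is computed for comparison with the 0.34 threshold. The `Hit` tuple layout is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical BLASTP-derived record: (gene_a, gene_b, fraction of gene_a aligned,
# fraction of gene_b aligned, edge weight on a 0-100 scale)
Hit = Tuple[str, str, float, float, float]


def build_edges(hits: Iterable[Hit], min_weight: float = 10.0) -> Set[FrozenSet[str]]:
    """Admit an edge when > 1/3 of both genes align and the weight is >= min_weight."""
    edges: Set[FrozenSet[str]] = set()
    for a, b, frac_a, frac_b, weight in hits:
        if a != b and frac_a > 1 / 3 and frac_b > 1 / 3 and weight >= min_weight:
            edges.add(frozenset((a, b)))
    return edges


def edge_density(cluster: Set[str], edges: Set[FrozenSet[str]]) -> float:
    """Observed edges inside the cluster divided by the n(n-1)/2 possible edges."""
    n = len(cluster)
    if n < 2:
        return 1.0  # convention for singletons in this sketch
    observed = sum(1 for pair in combinations(sorted(cluster), 2) if frozenset(pair) in edges)
    return observed / (n * (n - 1) / 2)
```

A three-gene cluster with two of its three possible edges has density 2/3 and passes; adding a fourth, unconnected gene drops the density to 2/6 ≈ 0.33, below the threshold.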

2.2 Evidence of transcription for the Tibetan wild boar-specific genes

A total of 27.9 Gb of RNA-seq sequence generated from the four libraries was mapped to the Tibetan wild boar genome using TopHat137. Gene expression levels were quantified as normalized RPKM values (reads per kilobase per million mapped reads) (Supplementary Table 17).
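RPKM normalizes a gene’s read count by its length in kilobases and by the number of mapped reads in millions, i.e. RPKM = reads × 10⁹ / (gene length in bp × total mapped reads). A short sketch; which length is used (for example, the total exon length of the gene model) is an assumption here:

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# 500 reads on a 2,000 bp gene in a library with 10 million mapped reads
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```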

3 Functional enrichment analyses for genes

Functional enrichment analysis of Gene Ontology (GO) terms and pathways was performed using the DAVID (Database for Annotation, Visualization and Integrated Discovery) web server149,150. Genes were submitted to DAVID to test for significant overrepresentation of GO biological process (GO-BP) and molecular function (GO-MF) terms, InterPro domains and KEGG pathways. In all tests, the whole set of known genes was used as the background, and P values (i.e. EASE scores), indicating the significance of the overlap between gene sets, were calculated using a Benjamini-corrected modified Fisher’s exact test. Only GO-BP, GO-MF, KEGG pathway and InterPro domain terms with P < 0.05 were considered significant and listed.
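DAVID’s EASE score is a conservative variant of the one-tailed Fisher’s exact test in which one gene is removed from the count of list genes annotated to the term before testing. The Python sketch below computes it from the hypergeometric upper tail and then applies the Benjamini-Hochberg adjustment. It follows our reading of DAVID’s documentation (the other counts are held fixed) and may differ in detail from DAVID’s implementation; the function names are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, pop_total: int, pop_hits: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(pop_total, pop_hits, draws)."""
    k = max(k, 0)
    upper = min(pop_hits, draws)
    tail = sum(comb(pop_hits, i) * comb(pop_total - pop_hits, draws - i)
               for i in range(k, upper + 1))
    return tail / comb(pop_total, draws)


def ease_score(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """One-tailed Fisher's exact test with one gene removed from the list hits."""
    return hypergeom_sf(list_hits - 1, pop_total, pop_hits, list_total)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Removing one gene from the list hits always makes the score larger (more conservative) than the plain Fisher P value, which penalizes terms supported by very few genes.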

4 Identification of pseudogenes

We identified 188 pseudogenes in the Tibetan wild boar genome, containing 137 frameshift and 60 premature termination events, based on in silico filters and subsequent manual examination (Supplementary Table 28). We first aligned all human protein sequences from Ensembl release 67 onto the Tibetan wild boar genome using TBLASTn135. The best-matched regions of each gene were then reduced and re-aligned using GeneWise136 to define the exon-intron structure. To avoid mistaking splicing errors for frameshift or premature termination events, we also aligned the human genes onto the human genome with the same pipeline. Candidates with high mapping quality (read coverage ≥ 10 and supporting transcribed reads) that contained frameshift or premature termination events not attributable to splicing errors, SNPs or InDels were considered pseudogenes. In addition, we aligned the re-sequencing data of 30 Tibetan wild boars to the Tibetan wild boar genome assembly to further evaluate the candidate pseudogenes.
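The two disabling events can be illustrated on a predicted coding sequence. This simplified Python check is a stand-in for the GeneWise-based pipeline, not a reimplementation of it: it flags a frameshift only when the coding length is not a multiple of three, and it reports in-frame stop codons (standard genetic code) before the final codon as premature termination events.

```python
from typing import Dict, List, Union

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def disabling_events(cds: str) -> Dict[str, Union[bool, List[int]]]:
    """Flag a length-based frameshift and list premature in-frame stop codons.

    premature_stops holds 0-based codon indices of stops before the final codon.
    """
    seq = cds.upper()
    full_length = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, full_length, 3)]
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    return {"frameshift": len(seq) % 3 != 0, "premature_stops": premature}
```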

5 Population-based re-sequencing and SNP calling

5.1 Re-sequencing strategy and read mapping

We sampled a total of 48 pigs, including 30 Tibetan wild boars, 15 domestic pigs from China and three wild boars from Southwest China (Fig. 2a and Supplementary Table 31). Sequencing on the Illumina HiSeq 2000 platform generated a total of 659.4 Gb of paired-end DNA sequence. The quality checking and filtering criteria described in ‘1.2 Sequence quality checking and filtering’ were also applied.

The resulting 655.9 Gb (99.5% of 659.4 Gb) of high-quality paired-end reads were mapped to the Tibetan wild boar genome assembly using BWA124. First, the reference was indexed. Second, the command ‘aln -o 1 -e 10 -t 4 -l 32 -i 15 -q 10’ was used to find the suffix array coordinates of good matches for each read. Third, the ‘sampe’ command was used to generate the best paired-end alignments in SAM format.

Next, we refined the alignment results in the following three steps:

(a) Aligned reads were filtered by the number of mismatches (≤ 5) and by mapping quality (reads with mapping quality 0 were removed);

(b) The alignments were corrected using two core commands of the Picard package (Supplementary URLs). The ‘AddOrReplaceReadGroups’ command was used to replace all read groups in the INPUT file with a new read group and assign all reads to this read group in the OUTPUT BAM, and the ‘FixMateInformation’ command was used to ensure that all mate-pair information was in sync between each read and its mate;

(c) Potential PCR duplicates were removed: if multiple read pairs had identical external coordinates, only the pair with the highest mapping quality was retained.
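Step (c) can be sketched as below. The `AlignedPair` record is hypothetical, and dedicated duplicate markers (for example Picard MarkDuplicates or samtools rmdup) also take strand and library into account; here only the chromosome and the outer coordinates of the pair define a duplicate set, and ties in mapping quality keep the first pair seen.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AlignedPair(NamedTuple):
    name: str
    chrom: str
    outer_start: int  # leftmost reference position covered by either mate
    outer_end: int    # rightmost reference position covered by either mate
    mapq: int


def remove_duplicates(pairs: Iterable[AlignedPair]) -> List[AlignedPair]:
    """Keep one pair per set of identical external coordinates (highest MAPQ wins)."""
    best: Dict[Tuple[str, int, int], AlignedPair] = {}
    for pair in pairs:
        key = (pair.chrom, pair.outer_start, pair.outer_end)
        if key not in best or pair.mapq > best[key].mapq:
            best[key] = pair
    return sorted(best.values(), key=lambda p: (p.chrom, p.outer_start, p.outer_end))
```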

Finally, for each individual, ~91.99% of reads mapped to the Tibetan wild boar reference assembly, covering 94.63% (at least 1×) or 64.55% (at least 4×) of it at an average depth of 4.95-fold (Supplementary Table 31).

In addition, we downloaded genome data for 55 individuals from around the world (1,037 Gb in total) from the EMBL-EBI database (accession number ERP001813), including 30 European domestic pigs, 7 domestic pigs from Southeast China, 7 Asian wild boars, 6 European wild boars, 4 other species of the genus Sus and an African warthog. These data mapped to the Tibetan wild boar genome with an average depth of 6.72-fold and a mapping rate of 97.77%, covering ~80.69% (at least 1×) or ~67.16% (at least 4×) of the assembly (Fig. 2a and Supplementary Table 32). The lower mapping rate of the Tibetan wild boar re-sequencing reads to the Tibetan wild boar genome, compared with the other pig data sets, is likely due to the more stringent read filtering used in the other pig genome studies (e.g. removal of reads with phred quality ≤ 20) 7-9, whereas our reads were filtered only as described in ‘1.2 Sequence quality checking and filtering’. When reads with phred quality ≤ 20 were also filtered, the mapping rate of the Tibetan wild boar reads to the Tibetan wild boar genome increased to 98.90%, higher than that of any downloaded pig data set.

5.2 SNP calling

After alignment, we performed population-scale SNP calling for three groups (30 Tibetan wild boars, 52 domestic pigs, and 21 wild boars and individuals of wild Sus species) using the Bayesian approach implemented in SAMtools151. Genotype likelihoods were calculated from the reads of each individual at each genomic position, and allele frequencies were estimated. The ‘mpileup’ command was used to identify SNPs with the parameters ‘-q 1 -C 50 -S -D -m 2 -F 0.002 -u’.

Only high-quality SNPs (coverage depth ≥ 4 and ≤ 1,000, RMS mapping quality ≥ 20, distance between adjacent SNPs ≥ 5 bp and a missing rate < 50% within each group) were kept for subsequent analysis. In total, we identified 14,637,670 (14.64 M) SNPs in 103 individuals (Supplementary Table 33). We then obtained a separate SNP set for each of the three groups: 8,390,501 (8.39 M) in the 30 Tibetan wild boars, 9,173,377 (9.17 M) in the 52 domestic pigs, and 7,780,578 (7.78 M) in the 21 wild boars and wild Sus individuals (Supplementary Tables 33 and 35). Only a small proportion of SNPs (3.02 M of 14.64 M, 20.63%) were shared among all three groups, indicating substantial differences in genomic background among them.
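Assuming the 14.64 M total equals the union of the three group-level SNP sets (our reading of the text), the shared fraction is |T ∩ D ∩ W| / |T ∪ D ∪ W|, e.g. 3.02 M / 14.64 M ≈ 20.63%. A sketch with sites keyed by (scaffold, position):

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (scaffold, position)


def sharing_summary(groups: Dict[str, Set[Site]]) -> Tuple[int, int, float]:
    """Return (sites shared by every group, distinct sites overall, shared percentage)."""
    sets = list(groups.values())
    if not sets:
        return 0, 0, 0.0
    union = set().union(*sets)
    shared = set.intersection(*sets)
    pct = 100.0 * len(shared) / len(union) if union else 0.0
    return len(shared), len(union), pct


print(round(100 * 3.02 / 14.64, 2))  # 20.63
```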

6 Demographic history reconstruction

The demographic histories of seven wild boars (three from Europe and four from Asia) and six Tibetan wild boars from six geographically diverse populations were inferred with the hidden Markov model (HMM) approach implemented in the pairwise sequentially Markovian coalescent (PSMC) model, based on the distribution of heterozygous sites152 (Fig. 2e). To improve the accuracy of the inferred historical recombination events, we used only scaffolds larger than 50 kb (~93.85% of all scaffolds), and ~7.6 M heterozygous SNPs per individual were used to reconstruct the demographic history. The program ‘fq2psmcfa’ was used to transform the consensus sequence into a FASTA-like format in which the i-th character of the output sequence indicates whether there is at least one heterozygote in the bin [100i, 100i+100). Parameters were set as follows: ‘-N30 -t15 -r5 -p "4+25*2+4+6"’. A porcine generation time (g) of 5 years and a neutral mutation rate (μ) of 2.5 × 10⁻⁸ per generation were used, based on previous reports 7,9.
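The binning performed by ‘fq2psmcfa’ and the conversion of PSMC’s scaled output into years and effective population sizes can be sketched as follows. Assumptions: bins are written as ‘K’ (at least one heterozygote), ‘T’ (none) or ‘N’ (too little called sequence, here a hypothetical 50% cutoff); the conversion N0 = θ0 / (4μs), time = 2·N0·t·g and Ne = λ·N0, with bin size s = 100, is the one commonly applied to PSMC output, with μ and g as given above. The pattern helper counts atomic time intervals and free parameters for a ‘-p’ string.

```python
from typing import Sequence, Tuple


def psmcfa_bins(het_positions: Sequence[int], called: Sequence[bool],
                bin_size: int = 100, min_called_fraction: float = 0.5) -> str:
    """One character per bin [bin_size*i, bin_size*(i+1)): 'K', 'T' or 'N'."""
    het = set(het_positions)
    out = []
    for start in range(0, len(called), bin_size):
        end = min(start + bin_size, len(called))
        if sum(called[start:end]) < min_called_fraction * (end - start):
            out.append("N")  # too little called sequence in this bin
        elif any(p in het for p in range(start, end)):
            out.append("K")  # at least one heterozygote
        else:
            out.append("T")  # no heterozygote
    return "".join(out)


def scale_psmc(theta0: float, t_k: float, lambda_k: float, mu: float = 2.5e-8,
               generation_years: float = 5.0, bin_size: int = 100) -> Tuple[float, float]:
    """Convert a scaled PSMC time/size pair to (years ago, effective population size)."""
    n0 = theta0 / (4 * mu * bin_size)
    return 2 * n0 * t_k * generation_years, lambda_k * n0


def parse_pattern(pattern: str) -> Tuple[int, int]:
    """(atomic time intervals, free parameters) for a PSMC -p pattern like '4+25*2+4+6'."""
    atomic = free = 0
    for term in pattern.split("+"):
        if "*" in term:
            repeats, width = (int(x) for x in term.split("*"))
        else:
            repeats, width = 1, int(term)
        atomic += repeats * width
        free += repeats
    return atomic, free
```

Under this reading, ‘4+25*2+4+6’ spans 64 atomic time intervals governed by 28 free population-size parameters.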

In addition, climate change and migration are two important factors influencing population size. We therefore obtained atmospheric surface air temperature (℃) and global relative sea level (10 m) data for the past 1 million years from the National Climatic Data Center (NCDC) (Supplementary URLs) and plotted them together with the demographic data. Note that PSMC cannot detect population changes more recent than 10,000 years ago.

7 Linkage-disequilibrium (LD) analysis

To compare LD patterns between Tibetan wild boars and Chinese domestic pigs, we took 6.01 M SNPs from 15 Chinese domestic pigs and merged them with the SNPs of the Tibetan wild boars, resulting in 9.49 M SNPs in total. To evaluate LD decay, the coefficient of determination (r²) between any two loci was calculated using Haploview153 (Fig. 3a) with the parameters ‘-n -dprime -minGeno 0 -missingCutoff 1 -minMAF 0.01’. Average r² was calculated for pairs of markers within 500 kb windows and averaged across the whole genome.
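Haploview, as we understand it, derives r² from two-marker haplotype frequencies estimated by EM. As a simpler stand-in (not what Haploview does), the sketch below uses the squared Pearson correlation of 0/1/2 genotype dosages, a common approximation, and averages it over distance bins for SNP pairs up to 500 kb apart on one scaffold. The 10 kb bin width is an arbitrary choice for illustration, and the MAF and missingness filters are not applied.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def genotype_r2(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> Optional[float]:
    """Squared Pearson correlation of 0/1/2 genotype dosages (missing = None)."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None  # monomorphic in the available samples
    return cov * cov / (var_a * var_b)


def ld_decay(positions: Sequence[int], genotypes: Sequence[Sequence[Optional[int]]],
             max_distance: int = 500_000, bin_width: int = 10_000) -> Dict[int, float]:
    """Mean r^2 per distance bin for SNP pairs at most max_distance apart (one scaffold)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    for x, i in enumerate(order):
        for j in order[x + 1:]:
            dist = positions[j] - positions[i]
            if dist > max_distance:
                break  # positions are sorted, so later SNPs are even farther away
            r2 = genotype_r2(genotypes[i], genotypes[j])
            if r2 is not None:
                sums[dist // bin_width] += r2
                counts[dist // bin_width] += 1
    return {b * bin_width: sums[b] / counts[b] for b in sorted(counts)}
```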

Supplementary URLs

Breakdancer, http://gmt.genome.wustl.edu/breakdancer/1.2/index.html; Bioinformatics and Systems Biology of Gent, http://bioinformatics.psb.ugent.be/webtools/Venn/; InParanoid, http://inparanoid.sbc.su.se/cgi-bin/index.cgi; MultiParanoid, http://multiparanoid.sbc.su.se/; MEGA 5.15, http://www.megasoftware.net/; LASTZ, http://www.bx.psu.edu/miller_lab/; RepeatMasker, RepeatProteinMask and RepeatModeler, http://www.RepeatMasker.org; Solar, http://treesoft.svn.sourceforge.net/viewrc/treesoft/; Picard, http://sourceforge.net/projects/picard/; National Climatic Data Center (NCDC), http://www.ncdc.noaa.gov/.


Supplementary References

1 Feuk, L. et al. Discovery of human inversion polymorphisms by comparative

analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet. 1,

e56 (2005).

2 Lai, J. et al. Genome-wide patterns of genetic variation among elite maize inbred

lines. Nat. Genet. 42, 1027-1030 (2010).

3 Xu, X. et al. Resequencing 50 accessions of cultivated and wild rice yields markers

for identifying agronomically important genes. Nat. Biotechnol. 30, 105-111 (2012).

4 Nguyen, D. T. et al. The complete swine olfactory subgenome: expansion of the

olfactory gene repertoire in the pig genome. BMC Genomics 13, 584 (2012).

5 Quignon, P. et al. The dog and rat olfactory receptor repertoires. Genome Biol. 6,

R83 (2005).

6 Castillo-Davis, C. I. et al. The functional genomic distribution of protein divergence in

two animal phyla: coevolution, genomic conflict, and constraint. Genome Res. 14,

802-811 (2004). 

7 Groenen, M. A. et al. Analyses of pig genomes provide insight into porcine

demography and evolution. Nature 491, 393-398 (2012).

8 Rubin, C. J. et al. Strong signatures of selection in the domestic pig genome. Proc.

Natl. Acad. Sci. USA 109, 19529-19536 (2012).

9 Bosse, M. et al. Regions of homozygosity in the porcine genome: consequence of

demography and the recombination landscape. PLoS Genet. 8, e1003100 (2012).

10 Romanenko, V., Nakamoto, T., Srivastava, A., Melvin, J. E. & Begenisich, T.

Molecular identification and physiological roles of parotid acinar cell maxi-K

channels. J. Biol. Chem. 281, 27964-27972 (2006).

11 Liu, X. et al. Attenuation of store-operated Ca2+ current impairs salivary gland fluid

secretion in TRPC1(-/-) mice. Proc. Natl. Acad. Sci. USA 104, 17542-17547

(2007).

12 Beall, C. M. et al. Natural selection on EPAS1 (HIF2α) associated with low

hemoglobin concentration in Tibetan highlanders. Proc. Natl. Acad. Sci. USA 107,

11459-11464 (2010).

13 Bigham, A. et al. Identifying signatures of natural selection in Tibetan and Andean

populations using dense genome scan data. PLoS Genet. 6 (2010).

14 Simonson, T. S. et al. Genetic evidence for high-altitude adaptation in Tibet.

Science 329, 72-75 (2010).

15 Yi, X. et al. Sequencing of 50 human exomes reveals adaptation to high altitude.

Science 329, 75-78 (2010).

16 Peng, Y. et al. Genetic variations in Tibetan populations and high-altitude

adaptation at the Himalayas. Mol. Biol. Evol. 28, 1075-1081 (2011).

17 Xu, S. et al. A genome-wide search for signals of high-altitude adaptation in

Tibetans. Mol. Biol. Evol. 28, 1003-1011 (2011).


18 Ji, L. D. et al. Genetic adaptation of the hypoxia-inducible factor pathway to

oxygen pressure among Eurasian human populations. Mol. Biol. Evol. 29,

3359-3370 (2012).

19 Scheinfeldt, L. B. et al. Genetic adaptation to high altitude in the Ethiopian

highlands. Genome Biol. 13, R1 (2012).

20 Rankinen, T. et al. The human obesity gene map: the 2005 update. Obesity 14,

529-644 (2006).

21 MacDougald, O. A. & Burant, C. F. The rapidly expanding family of adipokines.

Cell Metab. 6, 159-161 (2007).

22 Heid, I. M. et al. Meta-analysis identifies 13 new loci associated with waist-hip

ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat.

Genet. 42, 949-960 (2010).

23 Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new

loci associated with body mass index. Nat. Genet. 42, 937-948 (2010).

24 Li, M. et al. An atlas of DNA methylomes in porcine adipose and muscle tissues.

Nat. Commun. 3, 850 (2012).

25 Yang, Z. PAML: a program package for phylogenetic analysis by maximum

likelihood. Comput. Appl. Biosci. 13, 555-556 (1997).

26 Lace, B. et al. BCL3 gene role in facial morphology. Birth Defects Res. A Clin. Mol.

Teratol. 94, 918-924 (2012).

27 Wang, Y. & Lu, L. Activation of oxidative stress-regulated Bcl-3 suppresses CTCF

in corneal epithelial cells. PLoS One 6, e23984 (2011).

28 Yu, H. et al. Association between single nucleotide polymorphisms in ERCC4 and

risk of squamous cell carcinoma of the head and neck. PLoS One 7, e41853

(2012).

29 Krupa, R. et al. Polymorphisms of the DNA repair genes XRCC1 and ERCC4 are

not associated with smoking- and drinking-dependent larynx cancer in a Polish

population. Exp. Oncol. 33, 55-56 (2011).

30 Muftuoglu, M. et al. Cockayne syndrome group B protein stimulates repair of

formamidopyrimidines by NEIL1 DNA glycosylase. J. Biol. Chem. 284, 9270-9279

(2009).

31 Kim, H., Yang, K., Dejsuphong, D. & D'Andrea, A. D. Regulation of Rev1 by the

Fanconi anemia core complex. Nat. Struct. Mol. Biol. 19, 164-170 (2012).

32 Kuang, L. et al. A non-catalytic function of Rev1 in translesion DNA synthesis and

mutagenesis is mediated by its stable interaction with Rad5. DNA Repair 12, 27-37

(2013).

33 Pajukanta, P. et al. Familial combined hyperlipidemia is associated with upstream

transcription factor 1 (USF1). Nat. Genet. 36, 371-376 (2004).

34 Corre, S. et al. In vivo and ex vivo UV-induced analysis of pigmentation gene

expressions. J. Invest. Dermatol. 126, 916-918 (2006).

35 Majerus, M. E. & Mundy, N. I. Mammalian melanism: natural selection in black and


white. Trends Genet. 19, 585-588 (2003).

36 Fang, M., Larson, G., Ribeiro, H. S., Li, N. & Andersson, L. Contrasting mode of

evolution at a coat color locus in wild and domestic pigs. PLoS Genet. 5,

e1000341 (2009).

37 Yuan, J., Ghosal, G. & Chen, J. The HARP-like domain-containing protein

AH2/ZRANB3 binds to PCNA and participates in cellular response to replication

stress. Mol. Cell 47, 410-421 (2012).

38 Ciccia, A. et al. Polyubiquitinated PCNA recruits the ZRANB3 translocase to

maintain genomic integrity after replication stress. Mol. Cell 47, 396-409 (2012).

39 Andersson, O., Korach-Andre, M., Reissmann, E., Ibanez, C. F. & Bertolino, P.

Growth/differentiation factor 3 signals through ALK7 and regulates accumulation

of adipose tissue and diet-induced obesity. Proc. Natl. Acad. Sci. USA 105,

7252-7256 (2008).

40 Malik, S. G. et al. Association of β3-adrenergic receptor (ADRB3) Trp64Arg gene

polymorphism with obesity and metabolic syndrome in the Balinese: a pilot study.

BMC Res. Notes 4, 167 (2011).

41 Zawodniak-Szalapska, M. et al. Association of Trp64Arg polymorphism of

β3-adrenergic receptor with insulin resistance in Polish children with obesity. J.

Pediatr. Endocrinol. Metab. 21, 147-154 (2008).

42 Subauste, A. R. et al. Alterations in lipid signaling underlie lipodystrophy

secondary to AGPAT2 mutations. Diabetes 61, 2922-2931 (2012).

43 Agarwal, A. K. et al. AGPAT2 is mutated in congenital generalized lipodystrophy

linked to chromosome 9q34. Nat. Genet. 31, 21-23 (2002).

44 Shen, J. J. et al. Deficiency of growth differentiation factor 3 protects against

diet-induced obesity by selectively acting on white adipose. Mol. Endocrinol. 23,

113-123 (2009).

45 Laviano, A., Molfino, A., Rianda, S. & Rossi Fanelli, F. The growth hormone

secretagogue receptor (ghs-R). Curr. Pharm. Des. 18, 4749-4754 (2012).

46 Gauna, C. et al. Unacylated ghrelin is not a functional antagonist but a full agonist

of the type 1a growth hormone secretagogue receptor (GHS-R). Mol. Cell

Endocrinol. 274, 30-34 (2007).

47 Gottardo, L. et al. A polymorphism at the IL6ST (gp130) locus is associated with

traits of the metabolic syndrome. Obesity 16, 205-210 (2012).

48 Lin, F. H., Chu, N. F., Lee, C. H., Hung, Y. J. & Wu, D. M. Combined effect of

C-reactive protein gene SNP +2147 A/G and interleukin-6 receptor gene SNP

rs2229238 C/T on anthropometric characteristics among school children in Taiwan.

Int. J. Obes. 35, 587-594 (2011).

49 Camara-Clayette, V. et al. Transcriptional regulation of the KEL gene and Kell

protein expression in erythroid and non-erythroid cells. Biochem. J. 356, 171-180

(2001).

50 Ingallinella, P. et al. PEGylation of neuromedin U yields a promising candidate for


the treatment of obesity and diabetes. Bioorg. Med. Chem. 20, 4751-4759

(2012).

51 Malendowicz, L. K., Ziolkowska, A. & Rucinski, M. Neuromedins U and S

involvement in the regulation of the hypothalamo-pituitary-adrenal axis. Front.

Endocrinol. 3, 156 (2012).

52 Lu, B. et al. Expression of the phospholipid scramblase (PLSCR) gene family

during the acute phase response. Biochim. Biophys. Acta 1771, 1177-1185

(2007).

53 Charos, A. E. et al. A highly integrated and complex PPARGC1A transcription

factor binding network in HepG2 cells. Genome Res. 22, 1668-1679 (2012).

54 Gemma, C. et al. Maternal pregestational BMI is associated with methylation of

the PPARGC1A promoter in newborns. Obesity 17, 1032-1039 (2009).

55 Connelly, M. A. & Williams, D. L. Scavenger receptor BI: a scavenger receptor

with a mission to transport high density lipoprotein lipids. Curr. Opin. Lipidol. 15,

287-295 (2004).

56 Jeyakumar, S. M., Vajreswari, A. & Giridharan, N. V. Impact of vitamin A on

high-density lipoprotein-cholesterol and scavenger receptor class BI in the obese

rat. Obesity 15, 322-329 (2007).

57 Le, M. T. et al. Impact of Genetic Polymorphisms of SLC2A2, SLC2A5, and KHK

on Metabolic Phenotypes in Hypertensive Individuals. PloS One 8, e52062 (2013).

58 Suviolahti, E. et al. The SLC6A14 gene shows evidence of association with

obesity. J. Clin. Invest. 112, 1762 (2003).

59 Walley, A. J., Asher, J. E. & Froguel, P. The genetic contribution to non-syndromic

human obesity. Nat. Rev. Genet. 10, 431-442 (2009).

60 Epstein, L. H. et al. Dopamine transporter genotype as a risk factor for obesity in

African-American smokers. Obesity Res. 10, 1232-1240 (2002).

61 van Dyck, C. H. et al. Increased dopamine transporter availability associated with

the 9-repeat allele of the SLC6A3 gene. J. Nucl. Med. 46, 745-751 (2005).

62 Benjafield, A. V., Glenn, C. L., Wang, X. L., Colagiuri, S. & Morris, B. J.

TNFRSF1B in genetic predisposition to clinical neuropathy and effect on HDL

cholesterol and glycosylated hemoglobin in type 2 diabetes. Diabetes Care 24,

753-757 (2001).

63 Tabassum, R. et al. Association analysis of TNFRSF1B polymorphisms with type

2 diabetes and its related traits in North India. Genomic Medicine 2, 93-100

(2008).

64 Motter, A. L. & Ahern, G. P. TRPV1-null mice are protected from diet-induced

obesity. FEBS Lett. 582, 2257-2262 (2008).

65 Garami, A. et al. Thermoregulatory phenotype of the Trpv1 knockout mouse:

thermoeffector dysbalance with hyperkinesis. J. Neurosci. 31, 1721-1733 (2011).

66 Suri, A. & Szallasi, A. The emerging role of TRPV1 in diabetes and obesity. Trends

Pharmacol. Sci. 29, 29-36 (2008).

67 Qi, L. et al. TRB3 links the E3 ubiquitin ligase COP1 to lipid metabolism. Science 312, 1763-1766 (2006).

68 Sorrentino, V. & Zelcer, N. Post-transcriptional regulation of lipoprotein receptors by the E3-ubiquitin ligase inducible degrader of the low-density lipoprotein receptor. Curr. Opin. Lipidol. 23, 213-219 (2012).

69 Tortorella, M. D., Malfait, F., Barve, R. A., Shieh, H. S. & Malfait, A. M. A review of the ADAMTS family, pharmaceutical targets of the future. Curr. Pharm. Des. 15, 2359-2374 (2009).

70 Wagstaff, L., Kelwick, R., Decock, J. & Edwards, D. R. The roles of ADAMTS metalloproteinases in tumorigenesis and metastasis. Front. Biosci. 16, 1861-1872 (2011).

71 Reder, N. P. et al. Adrenergic α-1 pathway is associated with hypertension among Nigerians in a pathway-focused analysis. PLoS One 7, e37145 (2012).

72 Ro, H. S. et al. Adipocyte enhancer-binding protein 1 modulates adiposity and energy homeostasis. Obesity 15, 288-302 (2007).

73 Elosua, R. et al. Obesity modulates the association among APOE genotype, insulin, and glucose in men. Obes. Res. 11, 1502-1508 (2003).

74 Wang, J. et al. ApoE and the role of very low density lipoproteins in adipose tissue inflammation: ApoE and adipose tissue inflammation. Atherosclerosis (2012).

75 Badano, J. L. et al. Identification of a novel Bardet-Biedl syndrome protein, BBS7, that shares structural features with BBS1 and BBS2. Am. J. Hum. Genet. 72, 650-658 (2003).

76 Nachury, M. V. et al. A core complex of BBS proteins cooperates with the GTPase Rab8 to promote ciliary membrane biogenesis. Cell 129, 1201-1213 (2007).

77 Katsanis, N. et al. Triallelic inheritance in Bardet-Biedl syndrome, a Mendelian recessive disorder. Science 293, 2256-2259 (2001).

78 Thirone, A. C., Carvalheira, J. B., Hirata, A. E., Velloso, L. A. & Saad, M. J. Regulation of Cbl-associated protein/Cbl pathway in muscle and adipose tissues of two animal models of insulin resistance. Endocrinology 145, 281-293 (2004).

79 Taniguchi, C. M., Emanuelli, B. & Kahn, C. R. Critical nodes in signalling pathways: insights into insulin action. Nat. Rev. Mol. Cell Biol. 7, 85-96 (2006).

80 Yu, Y. et al. Neuronal Cbl controls biosynthesis of insulin-like peptides in Drosophila melanogaster. Mol. Cell. Biol. 32, 3610-3623 (2012).

81 Huang, Y. S., Kan, M. C., Lin, C. L. & Richter, J. D. CPEB3 and CPEB4 in neurons: analysis of RNA-binding specificity and translational control of AMPA receptor GluR2 mRNA. EMBO J. 25, 4865-4876 (2006).

82 Harris, C. A. et al. DGAT enzymes are required for triacylglycerol synthesis and lipid droplets in adipocytes. J. Lipid Res. 52, 657-667 (2011).

83 Chen, H. C. Enhancing energy and glucose metabolism by disrupting triglyceride synthesis: lessons from mice lacking DGAT1. Nutr. Metab. 3, 10 (2006).

84 Lee, D. et al. Epiregulin is not essential for development of intestinal tumors but is required for protection from intestinal damage. Mol. Cell. Biol. 24, 8907-8916 (2004).

85 Bohme, M. et al. Association between functional FABP2 promoter haplotypes and body mass index: analyses of 8,072 participants of the KORA cohort study. Mol. Nutr. Food Res. 53, 681-685 (2009).

86 Martinez-Lopez, E. et al. Effect of Ala54Thr polymorphism of FABP2 on anthropometric and biochemical variables in response to a moderate-fat diet. Nutrition 29, 46-51 (2013).

87 Camats, N. et al. Contribution of human growth hormone-releasing hormone receptor (GHRHR) gene sequence variation to isolated severe growth hormone deficiency (ISGHD) and normal adult height. Clin. Endocrinol. 77, 564-574 (2012).

88 Lee, L. T. et al. Discovery of growth hormone-releasing hormones and receptors in nonmammalian vertebrates. Proc. Natl. Acad. Sci. USA 104, 2133-2138 (2007).

89 Mracek, T., Drahota, Z. & Houstek, J. The function and the role of the mitochondrial glycerol-3-phosphate dehydrogenase in mammalian tissues. Biochim. Biophys. Acta 1827, 401-410 (2012).

90 Muoio, D. M. & Newgard, C. B. Obesity-related derangements in metabolic regulation. Annu. Rev. Biochem. 75, 367-401 (2006).

91 Koh, H. J. et al. Cytosolic NADP+-dependent isocitrate dehydrogenase plays a key role in lipid metabolism. J. Biol. Chem. 279, 39968-39974 (2004).

92 Sutter, N. B. et al. A single IGF1 allele is a major determinant of small size in dogs. Science 316, 112-115 (2007).

93 Boucher, J. et al. Impaired thermogenesis and adipose tissue development in mice with fat-specific disruption of insulin and IGF-1 signalling. Nat. Commun. 3, 902 (2012).

94 Xu, J. et al. The voltage-gated potassium channel Kv1.3 regulates peripheral insulin sensitivity. Proc. Natl. Acad. Sci. USA 101, 3112-3117 (2004).

95 Tucker, K., Overton, J. M. & Fadool, D. A. Kv1.3 gene-targeted deletion alters longevity and reduces adiposity by increasing locomotion and metabolism in melanocortin 4 receptor-null mice. Int. J. Obes. 32, 1222-1232 (2008).

96 Sadagurski, M. et al. IRS2 signaling in LepR-b neurons suppresses FoxO1 to control energy balance independently of leptin action. Cell Metab. 15 (2012).

97 Myers, M. G., Jr. & Olson, D. P. Central nervous system control of metabolism. Nature 491, 357-363 (2012).

98 Macia, L. et al. Neuropeptide Y1 receptor in immune cells regulates inflammation and insulin resistance associated with diet-induced obesity. Diabetes 61, 3228-3238 (2012).

99 Rojas, J. M. et al. Central nervous system neuropeptide Y signaling via the Y1 receptor partially dissociates feeding behavior from lipoprotein metabolism in lean rats. Am. J. Physiol. Endocrinol. Metab. 303, E1479-1488 (2012).

100 Mul, J. D. et al. Pmch expression during early development is critical for normal energy homeostasis. Am. J. Physiol. Endocrinol. Metab. 298, E477-488 (2010).

101 Kokkotou, E. et al. Melanin-concentrating hormone as a mediator of intestinal inflammation. Proc. Natl. Acad. Sci. USA 105, 10613-10618 (2008).

102 Wang, S. et al. Activation of AMP-activated protein kinase α2 by nicotine instigates formation of abdominal aortic aneurysms in mice in vivo. Nat. Med. 18, 902-910 (2012).

103 Lee-Young, R. S. et al. Obesity impairs skeletal muscle AMPK signaling during exercise: role of AMPK α2 in the regulation of exercise capacity in vivo. Int. J. Obes. 35, 982-989 (2011).

104 Tiganis, T. PTP1B and TCPTP - nonredundant phosphatases in insulin signaling and glucose homeostasis. FEBS J. (2012).

105 Tonks, N. K. Protein tyrosine phosphatases: from genes, to function, to disease. Nat. Rev. Mol. Cell Biol. 7, 833-846 (2006).

106 Huang, Y. J. et al. Recombinant human butyrylcholinesterase from milk of transgenic animals to protect against organophosphate poisoning. Proc. Natl. Acad. Sci. USA 104, 13603-13608 (2007).

107 Ilyushin, D. G. et al. Chemical polysialylation of human recombinant butyrylcholinesterase delivers a long-acting bioscavenger for nerve agents in vivo. Proc. Natl. Acad. Sci. USA 110, 1243-1248 (2013).

108 Geyer, B. C. et al. Plant-derived human butyrylcholinesterase, but not an organophosphorous-compound hydrolyzing variant thereof, protects rodents against nerve agents. Proc. Natl. Acad. Sci. USA 107, 20251-20256 (2010).

109 De Boer, A., Van der Sandt, I. & Gaillard, P. The role of drug transporters at the blood-brain barrier. Annu. Rev. Pharmacol. Toxicol. 43, 629-656 (2003).

110 Das, M. & Das, D. K. Caveolae, caveolin, and cavins: potential targets for the treatment of cardiac disease. Ann. Med. 44, 530-541 (2012).

111 Narendra, S., Valente, A., Tull, J. & Zhang, S. DDIT3 gene break-apart as a molecular marker for diagnosis of myxoid liposarcoma: assay validation and clinical experience. Diagn. Mol. Pathol. 20, 218-224 (2011).

112 Nemoto, K. et al. Characteristics of nobiletin-mediated alteration of gene expression in cultured cell lines. Biochem. Biophys. Res. Commun., doi:10.1016/j.bbrc.2013.01.024 (2013).

113 Wilkie, M. J. et al. Polymorphisms in the SLC6A4 and HTR2A genes influence treatment outcome following antidepressant therapy. Pharmacogenomics J. 9, 61-70 (2009).

114 Wrzosek, M. et al. Serotonin 2A receptor gene (HTR2A) polymorphism in alcohol-dependent patients. Pharmacol. Rep. 64, 449-453 (2012).

115 Kim, E. J. et al. Alzheimer's disease risk factor lymphocyte-specific protein tyrosine kinase regulates long-term synaptic strengthening, spatial learning and memory. Cell. Mol. Life Sci., doi:10.1007/s00018-012-1168-1 (2013).

116 Venkitachalam, S., Chueh, F. Y., Leong, K. F., Pabich, S. & Yu, C. L. Suppressor of cytokine signaling 1 interacts with oncogenic lymphocyte-specific protein tyrosine kinase. Oncol. Rep. 25, 677-683 (2011).

117 Simonaro, C. M. et al. Imprinting at the SMPD1 locus: implications for acid sphingomyelinase-deficient Niemann-Pick disease. Am. J. Hum. Genet. 78, 865-870 (2006).

118 Kirkegaard, T. et al. Hsp70 stabilizes lysosomes and reverts Niemann-Pick disease-associated lysosomal pathology. Nature 463, 549-553 (2010).

119 Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

120 Myers, E. W. et al. A whole-genome assembly of Drosophila. Science 287, 2196-2204 (2000).

121 Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311-317 (2010).

122 Butler, J. et al. ALLPATHS: De novo assembly of whole-genome shotgun microreads. Genome Res. 18, 810-820 (2008).

123 Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265-272 (2010).

124 Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009).

125 Li, R. et al. SNP detection for massively parallel whole-genome resequencing. Genome Res. 19, 1124-1132 (2009).

126 Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462-467 (2005).

127 Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269-1276 (2002).

128 Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21 Suppl 1, i351-358 (2005).

129 Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573-580 (1999).

130 Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 Suppl 2, ii215-225 (2003).

131 Parra, G., Blanco, E. & Guigo, R. GeneID in Drosophila. Genome Res. 10, 511-515 (2000).

132 Salamov, A. A. & Solovyev, V. V. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10, 516-522 (2000).

133 Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878-2879 (2004).

134 Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).

135 Kent, W. J. BLAT--the BLAST-like alignment tool. Genome Res. 12, 656-664 (2002).

136 Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988-995 (2004).

137 Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-1111 (2009).

138 Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511-515 (2010).

139 Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).

140 Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45-48 (2000).

141 Mulder, N. & Apweiler, R. InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol. Biol. 396, 59-70 (2007).

142 Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290-301 (2012).

143 Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29 (2000).

144 Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27-30 (2000).

145 Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955-964 (1997).

146 Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335-1337 (2009).

147 Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121-124 (2005).

148 Li, H. et al. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 34, D572-580 (2006).

149 Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44-57 (2009).

150 Huang da, W. et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35, W169-175 (2007).

151 Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).

152 Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493-496 (2011).

153 Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263-265 (2005).
