The Genome Assemblies of Tasmanian Devil, Zemin Ning, The Wellcome Trust Sanger Institute

Upload: lilian-carroll

Post on 29-Jan-2016

Page 1:

The Genome Assemblies of Tasmanian Devil

Zemin Ning

The Wellcome Trust Sanger Institute

Page 2:

Outline of the Talk:

The Phusion2 pipeline
Flow-sorting sequencing
Assigning contigs to individual chromosomes
The Devil genome assembly
Future work

Page 3:

Phusion2 Assembly Pipeline

[Flow diagram; labels recovered from the slide: Solexa Reads, 2x75 or 2x100, Data Process, Base Correction, Reads Group, Assembly, Fuzzypath, Velvet, Phrap, Contigs, Long Insert Reads, PRono, GraphRP, Supercontig]

Page 4:

Mis-assembly Errors: Contig Break Based on Inconsistent Pairs

Page 5:

Mis-assembly Errors: Contig Break Based on Pair Coverage
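
The slide gives only the title of this step, so the following is a generic sketch of the idea and not the Phusion2 implementation. It assumes each properly paired read pair is reduced to the interval its insert spans on the contig, and that a contig is a candidate for breaking wherever fewer than `min_cov` pairs span a position. The function names, `min_cov`, `margin` and the toy intervals are all hypothetical.

```python
from typing import List, Tuple


def pair_coverage(contig_len: int, pairs: List[Tuple[int, int]]) -> List[int]:
    """Count, for each base, the read pairs whose insert spans it.

    Each pair is a (start, end) interval, 0-based with end exclusive.
    """
    diff = [0] * (contig_len + 1)
    for start, end in pairs:
        diff[start] += 1
        diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:contig_len]:
        running += delta
        coverage.append(running)
    return coverage


def low_coverage_regions(cov: List[int], min_cov: int, margin: int) -> List[Tuple[int, int]]:
    """Return [start, end) intervals where pair coverage falls below min_cov.

    Positions within `margin` bases of either contig end are ignored,
    because pair coverage always tails off there.
    """
    regions, start = [], None
    for pos in range(margin, len(cov) - margin):
        if cov[pos] < min_cov:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(cov) - margin))
    return regions


# Toy contig of 1000 bases: no pair spans positions 480..519.
pairs = [(0, 400), (50, 450), (100, 480), (520, 900), (550, 950), (600, 1000)]
cov = pair_coverage(1000, pairs)
breaks = low_coverage_regions(cov, min_cov=1, margin=50)
print(breaks)  # [(480, 520)]
```

In practice the threshold would be chosen relative to the expected pair coverage for the library, which the slide does not specify.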

Page 6:

Pipeline of Contig Gap Closure

Page 7:

Sequencing T. Devil on Illumina: Strategy

Tumour or normal genomic DNA

Fragments of defined size: 0.5, 2, 5, 7, 8, 10 kb

Sequencing

2x100bp reads short insert

2x50bp mate pairs

Alignment using bwa, smalt

Somatic mutations

Germline variants

[Figure: fragment size distribution. x-axis: size, 0 to 6000; y-axis: frequency, 0 to 180000. Series: tumour 2kb, tumour 3kb, tumour 4kb, normal 2kb, normal 3kb, normal 4kb]

Sequencing performed at Illumina

De novo Assembly

Page 8:

Table 1 Run ID, Template names, Number of reads and Chromosome size

4972_1   chr1   IL20_4972:1   19.8   571
4967_1   chr2   IL21_4967:1   20.0   610
4971_1   chr3   IL30_4971:1   21.7   556
4964_1   chr4   IL14_4964:1   7.26   450
4969_1   chr5   IL17_4969:1   7.06   341
4969_2   chr6   IL17_4969:2   8.59   277
4969_3   chrx   IL17_4969:3   9.43   122

Read mapping coefficient:

e = Size_of_Chr/Num_reads_in_lane
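
As a worked example, the coefficient can be evaluated for each lane in Table 1. This sketch takes the last two columns of the table as Num_reads_in_lane and Size_of_Chr, in the order given by the caption; the slide states no units, so the values are used exactly as printed and the results are only meaningful relative to each other. The variable names are illustrative.

```python
# (chromosome, number of reads in lane, chromosome size), copied from Table 1.
# Units are not stated on the slide; values are used as printed.
table1 = [
    ("chr1", 19.8, 571),
    ("chr2", 20.0, 610),
    ("chr3", 21.7, 556),
    ("chr4", 7.26, 450),
    ("chr5", 7.06, 341),
    ("chr6", 8.59, 277),
    ("chrx", 9.43, 122),
]

# e = Size_of_Chr / Num_reads_in_lane
coeff = {chrom: size / reads for chrom, reads, size in table1}

for chrom, e in coeff.items():
    print(f"{chrom}: e = {e:.2f}")
```

Lanes with few reads relative to chromosome size (chr4 and chr5 here) get the largest coefficients, so the coefficient can act as a per-lane weight when read counts from different lanes are compared.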

Page 9:

New Data Sequenced by Illumina

Chr1   EAS25_101_1    70   549
Chr1   EAS25_101_2    70   549
Chr2   EAS25_101_3    70   580
Chr2   EAS25_101_4    70   580
Chr3   EAS25_101_5    70   547
Chr3   EAS25_101_6    70   547
Chr4   EAS188_173_3   70   448
Chr4   EAS188_173_4   70   448
Chr5   EAS188_173_5   70   346
Chr5   EAS188_173_6   70   346
Chr6   EAS188_173_7   70   285
Chr6   EAS188_173_8   70   285
Chrx   EAS188_173_2   70   122

Page 10:

Table 2 Chr_ID, Chr_size, Contigs_assigned, Bases_assigned, N_reads

Chr_ID       Chr_size   Contigs_assigned   Bases_assigned   N_reads
Chr1         571        65262              604              19.8
Chr2         610        76959              673              20.0
Chr3         556        68842              585              21.7
Chr4         450        48352              446              7.26
Chr5         341        30451              279              7.06
Chr6         277        25726              250              8.59
Chrx         122        14189              83.6             9.53
Unassigned              60841              54.7 (1.8%)
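
The 1.8% quoted for the unassigned contigs can be checked against the Bases_assigned column. This is a small arithmetic sketch; it assumes the percentage is the unassigned bases as a share of all bases in the table, and that the column is in Mb as on the following slide.

```python
# Bases_assigned per chromosome, copied from Table 2 (assumed to be Mb).
bases_assigned = {
    "Chr1": 604, "Chr2": 673, "Chr3": 585, "Chr4": 446,
    "Chr5": 279, "Chr6": 250, "Chrx": 83.6,
}
unassigned = 54.7

total = sum(bases_assigned.values()) + unassigned
unassigned_pct = 100 * unassigned / total

print(f"Total bases: {total:.1f}")
print(f"Unassigned: {unassigned_pct:.1f}%")  # Unassigned: 1.8%
```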

Page 11:

New Data Sequenced by Illumina

Table 2 Chr_ID, Chr_size, Contigs_assigned, Bases_assigned (Mb)

Chr_ID       Chr_size   Contigs_assigned   Bases_assigned (Mb)
Chr1         571        6729               684
Chr2         610        8381               740
Chr3         556        7197               641
Chr4         450        4817               487
Chr5         341        3188               300
Chr6         277        2844               263
Chrx         122        2378               86.6
Unassigned              440                1.23

Page 12:

Unassigned contigs were placed by supercontigs using mate pairs
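
One way to read this step is sketched below. The slide does not give the rule, so the majority vote is an assumption: an unassigned contig inherits the chromosome that most of the already assigned contigs in its supercontig belong to. The function name, the input layout and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def place_unassigned(
    supercontigs: Dict[str, List[str]],
    contig_chr: Dict[str, str],
) -> Dict[str, str]:
    """Give each unassigned contig the majority chromosome of its supercontig.

    supercontigs maps a supercontig ID to its member contig IDs (linked by
    mate pairs); contig_chr holds the contigs that already have a chromosome.
    Supercontigs with no assigned member are left alone.
    """
    placed = {}
    for members in supercontigs.values():
        votes = Counter(contig_chr[c] for c in members if c in contig_chr)
        if not votes:
            continue
        chrom, _ = votes.most_common(1)[0]
        for contig in members:
            if contig not in contig_chr:
                placed[contig] = chrom
    return placed


# Toy example: c2 and c5 have no chromosome, c6 and c7 cannot be placed.
supercontigs = {"sc1": ["c1", "c2", "c3"], "sc2": ["c4", "c5"], "sc3": ["c6", "c7"]}
contig_chr = {"c1": "Chr1", "c3": "Chr1", "c4": "Chrx"}
placed = place_unassigned(supercontigs, contig_chr)
print(placed)  # {'c2': 'Chr1', 'c5': 'Chrx'}
```

A real pipeline would probably weight the vote by contig length or read support, which this sketch leaves out.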

Page 13:
Page 14:
Page 15:

Solexa reads:
Number of read pairs: 650 Million
Finished genome size: 3.3 Gb
Read length: 2x100bp
Estimated read coverage: ~40X
Insert size: 410/50-600 bp
Mate pair data: 2k, 4k, 5k, 6k, 8k, 10k
Number of reads clustered: 591 Million
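
The ~40X coverage estimate follows from the other figures on this slide. A quick arithmetic check, assuming coverage is total sequenced bases divided by genome size and that each of the 650 million pairs contributes two 100 bp reads:

```python
read_pairs = 650_000_000        # number of read pairs
read_length_bp = 100            # 2x100bp, so two reads of this length per pair
genome_size_bp = 3_300_000_000  # finished genome size, 3.3 Gb

sequenced_bases = read_pairs * 2 * read_length_bp
coverage = sequenced_bases / genome_size_bp

print(f"Estimated read coverage: {coverage:.1f}X")  # Estimated read coverage: 39.4X
```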

Assembly features (stats):

                              Contigs      Supercontigs
Total number of contigs:      237,291      35,974
Total bases of contigs:       2.93 Gb      3.17 Gb
N50 contig size:              20,139       1,847,186
Largest contig:               189,866      5,315,556
Average contig size:          12,354       88,254
Contig coverage on genome:    ~94%         >99%
Ratio of placed PE reads:     ~92%         ?
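
For readers unfamiliar with the N50 statistic reported above, a minimal sketch of the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of all bases. The lengths in the example are toy values, not data from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or supercontig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # First length at which the cumulative sum covers half of all bases.
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one length")


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # 54 bases in total
print(n50(toy_lengths))  # 8
```

Here 10 + 9 + 8 = 27 is the first running total to reach half of 54, so the N50 is 8.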

Genome Assembly Normal – T. Devil

Page 16:

Acknowledgements:

Elizabeth Murchison, David McBride, Fengtang Yang, Mike Stratton

Ole Schulz-Trieglaff, Dirk Evers, David Bentley