Analysis of the bread wheat genome using whole-genome shotgun sequencing
Manuel Spannagl


Posted on 31-Dec-2015





Manuel Spannagl
MIPS, Helmholtz Center Munich

Thanks for inviting
Challenge: the sheer size and complexity
Use comparative genomics and integrative genomics
Reduction of complexity

Wheat - why bother?

Many varieties incl. bread wheat, durum (pasta) wheat

Third most-produced cereal, with 651 million tons (2010), cultivated worldwide in different climates

Leading source of vegetable protein in human food

The Challenge

Large (but average-sized) genomes have become economically accessible; however, there are analytical barriers.

Wheat: a WGS approach
Aims and Goals
5x 454 WGS sequencing => 85 Gb of sequence, 220 million reads
~79% of reads are repeat-related

Direct low-copy-number genome assembly (LCG, Newbler) => collapses many homologous gene sequences
To prevent collapsing of homologous gene sequences and to reduce complexity => orthologous group assembly at high stringency

Wheat: a WGS approach
Use fully sequenced and analysed reference genomes (rice, Brachypodium, sorghum)

Group genes into families (Orthologous Groups)

Use the orthologous group representatives as sequence baits to capture corresponding sequence reads.

Perform a separate sub-assembly for each orthologous bin

WGS assembly using in silico exon capture
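The capture steps above (group genes into orthologous groups, use the OG representatives as baits, sub-assemble each bin separately) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used here: the k-mer index, the `K` and `min_shared` parameters and the toy sequences are hypothetical stand-ins for the similarity search that actually captured the reads. Reads tied between several OGs are placed in every tied bin, loosely mirroring the unique vs. multiple mapped reads in the statistics.

```python
from collections import defaultdict

# Hypothetical k-mer size; a real baiting step would use an aligner or longer seeds.
K = 5


def kmers(seq, k=K):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bait_index(og_reps):
    """Map each k-mer to the OG ids whose representative sequence contains it."""
    index = defaultdict(set)
    for og_id, rep in og_reps.items():
        for km in kmers(rep):
            index[km].add(og_id)
    return index


def bait_reads(reads, og_reps, min_shared=2):
    """Bin reads by the OG representative(s) sharing the most k-mers.

    Reads sharing fewer than min_shared k-mers with every OG are not captured;
    reads tied between several OGs are placed in each tied bin.
    """
    index = build_bait_index(og_reps)
    bins = defaultdict(list)
    for read_id, seq in reads.items():
        shared = defaultdict(int)
        for km in kmers(seq):
            for og_id in index.get(km, ()):
                shared[og_id] += 1
        if not shared:
            continue
        best = max(shared.values())
        if best < min_shared:
            continue
        for og_id, count in shared.items():
            if count == best:
                bins[og_id].append(read_id)
    return dict(bins)


# Toy data (hypothetical OG representatives and reads).
og_reps = {"OG1": "ATGGCGTACGTTAGC", "OG2": "TTGACCGGTAACTGA"}
reads = {"r1": "GCGTACGTTA", "r2": "ACCGGTAACT", "r3": "CCCCCCCCCC"}
print(bait_reads(reads, og_reps))  # {'OG1': ['r1'], 'OG2': ['r2']}
```

Each bin would then be handed to the assembler (Newbler on the slides) as an independent, much smaller sub-assembly; the read `r3` matches no bait and is never assembled.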

Bread Wheat Genealogy

Ortholome directed assembly circumvents limitations faced by WGS assembly
Profit from related genomes, their genome resources and flcDNA resources
Generate ortholome
In silico exome capture
High-stringency assembly of each orthologous group

The ortholome directed assembly delivers ordered segments

The ortholome directed assembly delivers ordered segments II

Coverage of Orthologous Groups

Gene Copy Retention after Polyploidization: Calibration of the Method

Figure: results at 97%, 99% and 100% stringency (Maize, Hexaploid Rice, TRice)

Gene Copy Retention after Polyploidization

Calibration involved extensive testing and evaluation of the assembly parameters: 97%, 99% and 100% overlap accuracy. Depending on the criterion, the detected gene copy numbers vary as a consequence of the similarity among the genomes. In the end, 2.1 copies per gene were retained.
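Why the overlap-accuracy threshold changes the detected copy number can be shown with a toy model: copies whose pairwise identity reaches the threshold are merged (single-linkage via union-find), roughly as an assembler would collapse near-identical homeologs. The sequences, the `mutate` helper and the equal-length identity measure are hypothetical simplifications; Newbler's actual overlap criteria are more involved.

```python
def mutate(seq, positions):
    """Hypothetical helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)


def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy model assumes equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def count_copies(seqs, min_identity):
    """Number of copies left after merging every pair with identity >= min_identity."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= min_identity:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


base = "ACGT" * 25  # 100 bp toy gene
homeologs = [
    base,
    mutate(base, [10, 50]),          # 98% identical to base
    mutate(base, [20, 30, 60, 70]),  # 96% identical to base, 94% to the second copy
]
for threshold in (0.95, 0.97, 0.99, 1.00):
    print(f"min identity {threshold:.0%}: {count_copies(homeologs, threshold)} copies")
```

With these toy copies, a 97% threshold merges the two copies that are 98% identical (2 copies), 99% and 100% keep all three apart, and 95% collapses everything into one.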


Gene Copy Retention after Polyploidization

Gene fragments are abundant in wheat

Gene fragments are abundant in the wheat genome

Expanded Wheat Gene Families

Shotgun data: Illumina 80x (T. monococcum) and 454 3x (Ae. tauschii)

cDNA sequences from the Ae. speltoides group (B)

Can A and D genome shotgun data be used to dissect the ABD of wheat?

The Three Nephews: the A, B and Ds of wheat

The next question, of course, is whether we can dissect the gene complement into A, B and D: put a T-shirt on the genes that tells us where they belong.

The Three Nephews: Similarity on a Sequence Basis

Fairly complicated due to the close similarity among the subgenomes
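A purely similarity-based assignment, scoring each gene copy against the A (T. monococcum) and D (Ae. tauschii) shotgun data and the B (Ae. speltoides) cDNAs, might look like the sketch below. The k-mer containment score, `k=8`, the `min_margin` rule and all example sequences are assumptions for illustration; genes whose two best scores are too close are left unassigned, which is the difficulty described above.

```python
def kmer_set(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, reference_kmers, k):
    """Fraction of the query's k-mers that occur in the reference k-mer set."""
    q = kmer_set(query, k)
    return len(q & reference_kmers) / len(q) if q else 0.0


def assign_subgenome(gene, progenitors, k=8, min_margin=0.1):
    """Score a gene against each progenitor data set and pick the best label.

    Returns (label, scores); label is None when the two best scores differ by
    less than min_margin (too similar to call). Needs at least two data sets.
    """
    scores = {}
    for label, seqs in progenitors.items():
        reference = set().union(*(kmer_set(s, k) for s in seqs))
        scores[label] = containment(gene, reference, k)
    ranked = sorted(scores, key=scores.get, reverse=True)
    if scores[ranked[0]] - scores[ranked[1]] < min_margin:
        return None, scores
    return ranked[0], scores


# Hypothetical data: one wheat gene copy and one sequence per progenitor.
gene = "ATGGCTAGCTTACGGATCCAGTTGCAAGCT"
progenitors = {
    "A": ["ATGGCTAGCTAACGGATCCAGTAGCAAGCT"],      # hypothetical T. monococcum read
    "B": ["GGCATTACGATCGATTGCAACGTAGCATGC"],      # hypothetical Ae. speltoides cDNA
    "D": ["CCATGGCTAGCTTACGGATCCAGTTGCAAGCTAA"],  # hypothetical Ae. tauschii read
}
label, scores = assign_subgenome(gene, progenitors)
print(label, scores)
```

Here the D read contains the whole gene, so D receives a containment of 1.0, while the A read, differing at two positions, shares only part of the gene's k-mers.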


Wheat A, B and D Assignment using Machine Learning (SVM)

Particular Gene Categories are preferentially retained

Franz Marc, Hocken im Schnee (Crouching in the Snow)

Almost full gene complement detected and structured

Tens of thousands of pseudogenes detected

Separation of A, B and D using machine learning with > 75% accuracy

Complementary to chromosome sorting approaches

Applicable to polyploids in general to get genome overview

Rapid and economical approach to pragmatically cope with limitations in sequencing technology
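The slides name an SVM for separating A, B and D but not its features or implementation. As a hedged sketch, the following trains a linear SVM with the Pegasos stochastic sub-gradient method and combines three one-vs-rest models; the feature vectors (hypothetical similarity scores to A, B and D progenitor data) and the hyperparameters `lam` and `epochs` are assumptions.

```python
import random


def train_pegasos(X, y, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM (labels +1/-1) trained with Pegasos.

    Minimises lam/2 * ||w||^2 + mean hinge loss. For brevity the bias is a
    constant 1.0 feature, so it is regularised too (a common simplification).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 penalty step)
            if margin < 1.0:  # hinge loss active: step towards the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def decision(w, x):
    """Signed decision value of a trained model for features x."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class (+1) against all others (-1)."""
    return {
        c: train_pegasos(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Class whose one-vs-rest model gives the largest decision value."""
    return max(models, key=lambda c: decision(models[c], x))


# Hypothetical features per gene copy: similarity to A, B and D progenitor data.
X = [
    [0.90, 0.20, 0.30], [0.80, 0.30, 0.20], [0.85, 0.25, 0.30],
    [0.20, 0.90, 0.30], [0.30, 0.80, 0.20], [0.25, 0.85, 0.30],
    [0.20, 0.30, 0.90], [0.30, 0.20, 0.80], [0.25, 0.30, 0.85],
]
labels = ["A"] * 3 + ["B"] * 3 + ["D"] * 3
models = train_one_vs_rest(X, labels)
print(predict(models, [0.95, 0.10, 0.20]))
```

In practice a library implementation (for example scikit-learn's `LinearSVC`) would replace the hand-written solver; it is spelled out here only to keep the example dependency-free.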


In Silico Exon Capture Statistics

The composition of A, B and D is similar

Acknowledgements

MIPS: Matthias Pfeifer, Klaus Mayer, all other group members

The UK Wheat Consortium: Mike Bevan, Neil Hall, Anthony Hall, Keith Edwards, Rachel Brenchley

CSHL: Dick McCombie

UC Davis & USDA Albany: Jan Dvorak, Mincheng Luo, Olin Anderson

Kansas State University: Bikram Gill, Sunish Segal

EBI: Paul Kersey, Dan Bolser

Raw whole-genome sequencing data (454 reads) (source:,098,052

Total sequence [bp]: 82,801,349,875
Repeat content [bp]: 62,290,705,717 (75%)
# of 454 reads remaining after repeat masking and filtering: 65,851,441
Total sequence [bp]: 23,500,630,080 (28%)
# of mapped reads: 4,058,985 (6%)
# of unique mapped reads: 2,740,044 (68%)
# of multiple mapped reads: 1,318,941 (32%)
# of OG representatives matched by mapped reads with quality information (incl. TE-related OG representatives): 19,482 (97%)
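The percentages in this summary can be recomputed from the absolute counts, which also makes each denominator explicit: repeat content and remaining sequence are relative to the total raw sequence, mapped reads to the reads remaining after masking and filtering, and unique/multiple mapped reads to all mapped reads. The counts are copied from the table; rounding to whole percent is an assumption about how the slide's values were produced.

```python
# Absolute values copied from the raw-data summary above.
TOTAL_BP = 82_801_349_875
REPEAT_BP = 62_290_705_717
REMAINING_READS = 65_851_441
REMAINING_BP = 23_500_630_080
MAPPED_READS = 4_058_985
UNIQUE_MAPPED = 2_740_044
MULTIPLE_MAPPED = 1_318_941


def pct(part, whole):
    """Percentage rounded to a whole number, as shown on the slide."""
    return round(100 * part / whole)


def summary_percentages():
    # Unique and multiple mapped reads partition the mapped reads exactly.
    assert UNIQUE_MAPPED + MULTIPLE_MAPPED == MAPPED_READS
    return {
        "repeat_content": pct(REPEAT_BP, TOTAL_BP),                # of total bp
        "remaining_sequence": pct(REMAINING_BP, TOTAL_BP),         # of total bp
        "mapped_reads": pct(MAPPED_READS, REMAINING_READS),        # of remaining reads
        "unique_mapped": pct(UNIQUE_MAPPED, MAPPED_READS),         # of mapped reads
        "multiple_mapped": pct(MULTIPLE_MAPPED, MAPPED_READS),     # of mapped reads
    }


for name, value in summary_percentages().items():
    print(f"{name}: {value}%")
```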

Columns: mi 97% | mi 99% | mi 100%

Newbler sub-assembly statistics
# of singletons: 887,615 (22%) | 1,222,242 (31%) | 1,696,740 (42%)
# of assembled reads: 3,038,943 (76%) | 2,689,502 (67%) | 2,057,928 (51%)
# of reads excluded from the assembly: 75,440 (2%) | 90,254 (2%) | 247,330 (6%)
# of repeat reads (1): 899 | 1,025 | 480
# of outlier reads (2): 74,502 | 89,190 | 246,811
# of too-short reads (3): 39 | 39 | 39
# of assembled contigs: 205,817 | 172,039 | 120,501
# of sub-assemblies used for copy number analysis (contigs + singletons): 1,093,432 | 1,394,281 | 1,817,241
Total sequence [bp]: 497,965,174 | 630,756,335 | 793,978,129
Minimum / maximum length [bp]: 52 / 7,415 | 52 / 7,312 | 52 / 4,386
Mean length [bp]: 455.41 | 452.39 | 436.91
N50 / N90 [bp]: 482 / 323 | 479 / 326 | 471 / 322

Re-alignment of sub-assemblies to OG representatives
# of re-aligned sub-assemblies: 1,019,315 (93%) | 1,338,548 (96%) | 1,775,454 (98%)
# of OG representatives with accepted, re-aligned sub-assemblies (incl. TE-related OG representatives): 19,429 | 19,467 | 19,475
# of OG representatives associated with TEs and removed manually: 149 | 149 | 150

(1) The read was inferred to be repetitive early in the assembly process (>70% of the read's seeds hit at least 70 other reads).
(2) The read was identified as problematic (e.g. chimeric sequences or assembler artifacts).
(3) The read was too short to be used (
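The Newbler columns are internally consistent: in every column, singletons, assembled reads and excluded reads sum to the same 4,001,998 input reads, the three exclusion reasons sum to the excluded count, and contigs plus singletons give the number of sub-assemblies. The sketch below checks this and adds an `nx` helper implementing the usual Nx definition (N50, N90). Per-contig lengths are not given, so the table's N50/N90 values cannot be recomputed here and `nx` is shown only on hypothetical lengths.

```python
# Per-column counts copied from the Newbler sub-assembly table above.
COLUMNS = {
    "mi 97%": {"singletons": 887_615, "assembled": 3_038_943, "excluded": 75_440,
               "repeat": 899, "outlier": 74_502, "too_short": 39,
               "contigs": 205_817, "sub_assemblies": 1_093_432},
    "mi 99%": {"singletons": 1_222_242, "assembled": 2_689_502, "excluded": 90_254,
               "repeat": 1_025, "outlier": 89_190, "too_short": 39,
               "contigs": 172_039, "sub_assemblies": 1_394_281},
    "mi 100%": {"singletons": 1_696_740, "assembled": 2_057_928, "excluded": 247_330,
                "repeat": 480, "outlier": 246_811, "too_short": 39,
                "contigs": 120_501, "sub_assemblies": 1_817_241},
}


def input_reads(col):
    """Reads given to the sub-assemblies: singletons + assembled + excluded."""
    return col["singletons"] + col["assembled"] + col["excluded"]


def check_column(col):
    """True if exclusion reasons and sub-assembly counts add up as in the table."""
    return (
        col["repeat"] + col["outlier"] + col["too_short"] == col["excluded"]
        and col["contigs"] + col["singletons"] == col["sub_assemblies"]
    )


def nx(lengths, x):
    """Nx (e.g. N50, N90): the largest length L such that sequences of
    length >= L together cover at least x% of the total sequence."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered * 100 >= x * total:
            return length
    raise AssertionError("unreachable for 0 < x <= 100")


for name, col in COLUMNS.items():
    print(name, input_reads(col), check_column(col))

toy_lengths = [100, 80, 60, 40, 20]  # hypothetical contig lengths
print(nx(toy_lengths, 50), nx(toy_lengths, 90))  # 80 40
```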


View more >