Sequencing of complex plant genomes: big data … big deal?
Raymond Hulzink, Ph.D.
Applications and Challenges of Oxford Nanopore Sequencing in the Life Science Industry
Wageningen, April 14, 2016
The crop innovation company 2
Genome assembly: The challenge
Long-read sequencing technologies have accelerated whole-genome (re-)sequencing approaches and reduced costs dramatically,
but
de novo construction of highly accurate draft genome sequences in complex organisms is still challenging and costly;
therefore,
high-quality ultra-long reads are needed.
‘Repeats longer than read length cannot be resolved!’
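The rule quoted above can be made concrete with a small sketch. It assumes that a read resolves a repeat only if it covers the whole repeat plus some unique flanking sequence on each side. The `anchor` length of 500 bases and the example read and repeat lengths are illustrative assumptions, not values from the talk.

```python
def read_spans_repeat(read_length, repeat_length, anchor=500):
    """True if a read could cover the repeat plus `anchor` unique bases on both sides."""
    return read_length >= repeat_length + 2 * anchor


def fraction_spanning(read_lengths, repeat_length, anchor=500):
    """Fraction of reads long enough to span a repeat of the given length."""
    if not read_lengths:
        return 0.0
    spanning = sum(read_spans_repeat(n, repeat_length, anchor) for n in read_lengths)
    return spanning / len(read_lengths)


# Hypothetical read lengths (bases) and a hypothetical 10 kb repeat.
reads = [2_000, 8_000, 12_000, 25_000]
print(fraction_spanning(reads, repeat_length=10_000))
```

Reads shorter than the repeat plus both anchors contribute nothing to resolving it, however many of them there are, which is why read length rather than read count is the limiting factor here.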
Plant genomes: Size
Large bitter cress: 54 Mb
Source: Walter Siegmund - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Cardamine_amara_eF.jpg
Snake's head: 124,852 Mb
Source: Michael Apel - Own work. Licensed under CC BY 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Fritillaria_meleagris_MichaD.jpg#/media/File:Fritillaria_meleagris_MichaD.jpg
Japanese canopy plant: 149,000 Mb
Source: Alpsdake, via Wikimedia Commons - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Paris_japonica_Kinugasasou_in_Hakusan_2010_7_18.jpg
Data source: http://data.kew.org/cvalues/CvalServlet?querytype=1
Crops, e.g.
Tomato: 800 Mb
Pepper: 3,200 Mb
Barley: 5,000 Mb
Lettuce: 1,200 Mb
Melon: 400 Mb
Plant genomes: Complexity
• Repetitive DNA
  o Medium
    - Tandem repeats (rRNA, tRNA)
    - Gene families (paralogs)
    - Transposable elements (e.g. retrotransposons)
  o High
    - Tandemly arranged SSRs
    - Centromeric tandem repeats
• Heterozygosity, polyploidy
e.g. pepper genome: ~81% repetitive sequences
Qin et al. (2014) Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. PNAS 111: 5135-5140
MAP @ KeyGene
• Phase 1 (2014): system set-up, testing software, and sequencing ONT reference DNA (λ genome)
• Phase 2 (2015): sequencing experimental DNA (plant BAC clones)
“I have just been looking at some QC metrics that the software has sent back to us and see that your flow cell is running hotter than I would have expected ….” Oxford Nanopore
BAC sequencing: Read alignment against reference
• Despite a low number of 2D pass reads (<10%), BAC references were completely covered (8-20x depth)
[Figure: depth of coverage (# of reads) versus map position on reference; MinION / FLO-MAP003]
• Alignment with MarginAlign against PacBio (PB) references
• Read accuracy was ~83% (an error rate of ~17%)
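As a rough illustration of how a read accuracy figure such as ~83% can be derived from an alignment, the sketch below divides the matching bases by all aligned columns. This is one common definition, assumed here for illustration; it is not necessarily the exact metric reported by MarginAlign, and the counts are hypothetical.

```python
def read_accuracy(matches, mismatches, insertions, deletions):
    """Per-read accuracy: matching bases divided by all alignment columns."""
    aligned_columns = matches + mismatches + insertions + deletions
    if aligned_columns == 0:
        raise ValueError("alignment has no columns")
    return matches / aligned_columns


# Hypothetical alignment counts chosen to give 83% accuracy.
acc = read_accuracy(matches=830, mismatches=60, insertions=50, deletions=60)
print(f"accuracy = {acc:.2%}, error rate = {1 - acc:.2%}")
```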
BAC sequencing: De novo assembly
• High-quality assemblies with a small number of substitution errors and a moderate number of insertion/deletion errors
• Successful de novo assembly for two BAC clones with 10-15-fold read depth
[Figure: alignment plots for BAC H049 (Assemb 2) and BAC H032 (Assemb 2); x-axis: nanopore assembly (bases); y-axis: PacBio reference (bases)]
• De novo assembly with Celera Assembler after one or two rounds of error correction (NanoCorrect)
• Alignment against the PB reference using MUMmer, with the dnadiff tool for estimating per-base accuracy
Genome sequencing: Plant pathogen Rhizoctonia solani
• Soil-borne plant pathogenic fungus
• Causes a wide range of commercially significant plant diseases
• Estimated genome size ~50-55 Mb
  o heterokaryotic (≥ 2 distinct nuclear genomes)
  o 10% repetitive sequences
  o duplicated genomic regions
• Draft genome sequences available from different subgroups
• High level of sequence differences between different subgroups (~21% shared core genes)
• Generate draft genome assembly of Rhizoctonia solani
  o MinION MK1 sequencer with MAP006 chemistry and FLO-MAP103 flow cells
Genome sequencing: Extraction of ultra-pure (u)HMW DNA
• DNA quality and integrity are essential for obtaining high-quality long reads
• Extraction of ultra-pure HMW and uHMW DNA from plants poses unique challenges: it requires specific expertise to deal with carbohydrates, phenolics, and other compounds abundant in plant tissues
• KeyGene has developed protocols for extraction, purification, analysis, and quantitation of DNA from a variety of (difficult) plant and pathogen species
R. solani sequencing: Extraction and sizing of fungal HMW DNA
Samples: Crude, Purified, Sized
Nanodrop [ng/uL]  Nanodrop 260/280  Nanodrop 260/230  Qubit-BR [ng/uL]  TapeStation [ng/uL]
210               2.22              2.18              190               162
-                 -                 -                 265               257
1,372             1.87              0.92              53                -
~45%
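The absorbance ratios above are usually read against rule-of-thumb cut-offs. The sketch below flags ratios that fall under such cut-offs. The thresholds (260/280 of at least 1.8 and 260/230 of at least 2.0) are widely used conventions assumed here for illustration, not values stated in the talk, and `purity_flags` is a hypothetical helper name.

```python
def purity_flags(a260_280, a260_230, min_280=1.8, min_230=2.0):
    """Return warnings for absorbance ratios below the assumed cut-offs."""
    flags = []
    if a260_280 < min_280:
        flags.append("low 260/280")
    if a260_230 < min_230:
        flags.append("low 260/230")
    return flags


# Ratios taken from the two measurement rows that report Nanodrop values.
print(purity_flags(2.22, 2.18))
print(purity_flags(1.87, 0.92))
```

A low 260/230 ratio is commonly associated with carry-over of compounds such as carbohydrates and phenolics, which would also be consistent with the large gap between the Nanodrop (1,372 ng/uL) and Qubit (53 ng/uL) readings for that sample.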
R. solani sequencing: Library preparation
Fragment sizes: ~12.5 K (9K hydropore S), ~18.8 K (10K hydropore L), >60 K
[Figure: DNA recovery (%) across the MAP006 workflow for Lib002, Lib003 and Lib004]
Recovery (%) per data series: 100, 80, 62, 23; 100, 66, 64, 17; 100, 80, 46, 20
R. solani sequencing: Library and read size distribution
[Figure: size distributions per library; panels: library size, read length (MinKNOW), 2D read length (Metrichor), 2D pass read length; x-axis: sequence length 2D]
Library size: Lib002 17.9 K; Lib003 21.3 K; Lib004 56.6 K
Other size labels in the figure: 19 K, 23 K, 34 K; 8.5 K, 11.3 K, 15.3 K
R. solani sequencing: 2D pass read summary
Run remarks                       # 2D pass reads  Total length (Mb)  Max read length (Kb)  Median 2D quality score
53.5 ng (6 uL), air bubble        2,900            26                 15.8                  9.4
53.5 ng (6 uL), heat sink ~40°C   4,204            36                 29.0                  8.8
89.2 ng (10 uL)                   25,346           223                25.9                  10.0
37.8 ng (6 uL)                    13,068           152                34.2                  8.9
125.2 ng (20 uL)                  23,806           269                43.7                  8.6
7.8 ng (6 uL)                     3,414            53                 61.4                  9.5
28.6 ng (22 uL)                   5,931            89                 80.4                  9.4
Library sizes: 17.9 K, 21.3 K, 56.6 K
[Figure: cumulative length distribution; x-axis: read length (bases); y-axis: % reads]
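The per-run figures in the summary can be combined into an overall yield and an approximate mean 2D pass read length per run. The sketch below uses only the read counts and total lengths listed for the seven runs. Because the totals are rounded to whole megabases, the derived means are approximate.

```python
# (number of 2D pass reads, total length in Mb) for the seven runs in the summary.
runs = [
    (2_900, 26),
    (4_204, 36),
    (25_346, 223),
    (13_068, 152),
    (23_806, 269),
    (3_414, 53),
    (5_931, 89),
]

total_reads = sum(reads for reads, _ in runs)
total_mb = sum(mb for _, mb in runs)

# Mean read length per run in kb: total bases divided by read count.
mean_kb = [round(mb * 1000 / reads, 1) for reads, mb in runs]

print(total_reads, total_mb)
print(mean_kb)
```

The runs add up to 848 Mb from 78,669 2D pass reads, which agrees with the sequence yield reported for the KeyGene assembly.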
Genome assembly: Miniasm and Canu assembly summary
• ~54 Mb draft genome sequence with Canu, consisting of 679 contigs with an N50 of ~170 K and a maximum contig length of more than 2 Mb
• Longer reads produce more contiguous assemblies
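For readers less familiar with the N50 statistic used above, here is a minimal sketch of how it is computed from a list of contig lengths. The contig lengths in the example are hypothetical and not taken from the Canu or Miniasm assemblies.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths (any unit, e.g. kb).
contigs = [100, 80, 60, 40, 20]
print(n50(contigs))
```

With these five contigs the total is 300, so the running sum passes the halfway mark of 150 at the second-longest contig and the N50 is 80.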
Genome assembly: Comparison between genome assemblies
Reference                       Platform     Sequence yield (Mb)  Sum contigs (Mb)  # scaffolds  Scaffold N50 (Kb)  # contigs  Contig N50 (Kb)
Zheng et al. Nat Commun 2013    GAII         5,604                36.9              2,648        ~475               6,452      ~20.3
Cubeta et al. Genome Ann 2014   Sanger/FLX   -                    51.7              326          ~7,444             6,040      ~25.9
Hane et al. PLOS Gen 2014       HiSeq        -                    39.8              857          ~161               7,606      ~7.2
Wibberg et al. J Biotech 2015   FLX/MiSeq    2,200/2,000          42.8              879          -                  3,793      ~35.1
Wibberg et al. BMC Gen 2016     MiSeq        2,800                52                2,065        ~81.2              5,826      ~15.2
KeyGene - Canu                  MK1          848.6                54.1              -            -                  679        ~170
• With only 5 flow cells, about 15x coverage
• To be determined: detailed read coverage analysis to establish the level of genome duplication and the estimated heterokaryotic genome size
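The coverage figure can be checked with simple arithmetic: mean depth is the sequence yield divided by the genome size. The sketch below uses the KeyGene yield (848.6 Mb) and the summed contig length (54.1 Mb) as a stand-in for genome size, which is an assumption, since the heterokaryotic genome size is still to be determined. The inverse calculation for melon uses the 400 Mb size given earlier and an assumed 15x target.

```python
def mean_depth(yield_mb, genome_mb):
    """Mean read depth: total sequenced bases divided by genome size."""
    return yield_mb / genome_mb


def required_yield_mb(genome_mb, target_depth):
    """Sequence yield (Mb) needed to reach a target mean depth."""
    return genome_mb * target_depth


# KeyGene R. solani data: yield and summed contig length from the comparison.
depth = mean_depth(848.6, 54.1)
print(f"R. solani: {depth:.1f}x")

# Assumed 15x target for a 400 Mb melon genome.
print(f"Melon at 15x: {required_yield_mb(400, 15):,.0f} Mb")
```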
Genome alignment: Comparison between two public assemblies
• Alignment of public assemblies (MUMmer)
• Comparative genome analysis reveals considerable genetic differences between different isolates (i.e. genome size, gene number and composition)
• Some level of similarity between R. solani draft genomes, but with an overall low level of co-linearity
[Figure: alignment plot; x-axis: Cubeta et al. 2014 assembly (bases); y-axis: Zheng et al. 2013 assembly (bases)]
Genome alignment: KeyGene assembly vs. Cubeta et al. 2014
• Considerable sequence diversity exists between the KeyGene strain and public Rhizoctonia strains
[Figure: alignment plot; x-axis: KeyGene Canu assembly (bases); y-axis: Cubeta et al. 2014 assembly (bases)]
Conclusions
• Plant BAC DNA sequencing
  o De novo assembly for two BAC clones with 10-15-fold read depth
  o High-quality assemblies using a low number of 2D pass reads
• Rhizoctonia solani genome sequencing
  o Large number of high-quality 2D pass reads in 24-hour runs
  o Direct sequencing of HMW DNA positively affects read length
  o Generated a ~54 Mb draft genome sequence with an estimated read depth of 10x
  o Low level of co-linearity between nanopore assembly and published draft genomes
• Sequencing of complex plant genomes: big data … big deal!
What’s next ….?
• Rhizoctonia solani genome sequencing
  o Improving the synthesis of long-fragment libraries (yield, size)
  o Sequencing additional flow cells
  o Testing more tools and parameters
• Plant genome sequencing
  o KeyGene joined PromethION Early Access Programme (PEAP)
  o Draft genome sequence of a melon variety using the PromethION
• Meet us at …
Acknowledgements
Erwin Datema
Alex Boshoven
Koen Cuelenaere
Lisanne Blommers
Alexander Wittenberg
Nathalie van Orsouw
Michiel van Eijk