Sequencing of complex plant genomes: big data … big deal?

Raymond Hulzink, Ph.D.

Applications and Challenges of Oxford Nanopore Sequencing in the Life Science Industry

Wageningen, April 14, 2016



The crop innovation company 2

Genome assembly: The challenge

Long-read sequencing technologies have accelerated whole-genome (re-)sequencing approaches and reduced costs dramatically,

but de novo construction of highly accurate draft genome sequences in complex organisms is still challenging and costly;

therefore, high-quality ultra-long reads are needed.

‘Repeats longer than read length cannot be resolved!’
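The rule quoted on this slide, that repeats longer than the read length cannot be resolved, can be made concrete with a small check: a read only helps if it covers the whole repeat plus some unique sequence on both sides. The sketch below is purely illustrative; the function name `read_spans_repeat`, the coordinates, and the 100-base anchor are assumptions and do not come from the presentation.

```python
def read_spans_repeat(read_start, read_end, repeat_start, repeat_end, anchor=100):
    """Return True if the read covers the repeat plus `anchor` unique bases on each side.

    Coordinates are 0-based, half-open positions on the genome.
    The anchor length is an illustrative assumption.
    """
    return read_start <= repeat_start - anchor and read_end >= repeat_end + anchor


# A hypothetical 12 kb repeat located at positions 50,000-62,000.
repeat = (50_000, 62_000)

short_read = (55_000, 63_000)  # 8 kb read that starts inside the repeat
long_read = (45_000, 70_000)   # 25 kb read covering the repeat and both flanks

print("short read spans repeat:", read_spans_repeat(*short_read, *repeat))
print("long read spans repeat:", read_spans_repeat(*long_read, *repeat))
```

Only the long read anchors in unique sequence on both sides, so only it can place the repeat unambiguously in an assembly.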


Plant genomes: Size

• Large bitter cress: 54 Mb
Source: Walter Siegmund - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Cardamine_amara_eF.jpg

• Snake's head: 124,852 Mb
Source: Michael Apel - Own work. Licensed under CC BY 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Fritillaria_meleagris_MichaD.jpg#/media/File:Fritillaria_meleagris_MichaD.jpg

• Japanese canopy plant: 149,000 Mb
Source: Alpsdake - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Paris_japonica_Kinugasasou_in_Hakusan_2010_7_18.jpg

Data source: http://data.kew.org/cvalues/CvalServlet?querytype=1

Crops, e.g.:

• Tomato: 800 Mb

• Pepper: 3,200 Mb

• Barley: 5,000 Mb

• Lettuce: 1,200 Mb

• Melon: 400 Mb
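To give a feel for what these genome sizes mean in sequencing terms, the following sketch estimates the raw yield needed to reach a given depth of coverage (yield = genome size × coverage). The crop genome sizes are taken from the slide; the 15x target and the function name `required_yield_gb` are assumptions chosen for illustration.

```python
# Approximate crop genome sizes from the slide, in megabases (Mb).
CROP_GENOME_SIZES_MB = {
    "Melon": 400,
    "Tomato": 800,
    "Lettuce": 1_200,
    "Pepper": 3_200,
    "Barley": 5_000,
}


def required_yield_gb(genome_size_mb, coverage):
    """Raw sequence yield in gigabases needed for `coverage`-fold depth."""
    return genome_size_mb * coverage / 1000


TARGET_COVERAGE = 15  # assumed target depth, for illustration only

for crop, size_mb in CROP_GENOME_SIZES_MB.items():
    needed = required_yield_gb(size_mb, TARGET_COVERAGE)
    print(f"{crop}: {size_mb} Mb genome, {needed:.1f} Gb for {TARGET_COVERAGE}x")
```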


Plant genomes: Complexity

• Repetitive DNA
o Medium
- Tandem repeats (rRNA, tRNA)
- Gene families (paralogs)
- Transposable elements (e.g. retrotransposons)
o High
- Tandemly arranged SSRs
- Centromeric tandem repeats

• Heterozygosity, polyploidy

e.g. pepper genome: ~81% repetitive sequences

Qin et al. (2014) Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. PNAS 111: 5135-5140


MAP @ KeyGene

• Phase 1 (2014): setting up the system, testing software, and sequencing ONT reference DNA (λ genome)

• Phase 2 (2015): sequencing experimental DNA (plant BAC clones)

“I have just been looking at some QC metrics that the software has sent back to us and see that your flow cell is running hotter than I would have expected ….” Oxford Nanopore


BAC sequencing: Read alignment against reference

• Despite a low number of 2D pass reads (<10%), BAC references were completely covered (8-20x depth)

[Figure: depth of coverage (# of reads) versus map position on reference; MinION / FLO-MAP003]

• Alignment with MarginAlign against PacBio (PB) references

• Read accuracy was ~83% (i.e. a sequencing error rate of ~17%)
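For readers who think in quality scores rather than percentages, the ~83% read accuracy can be converted to the Phred scale with the standard definition Q = -10 · log10(error probability). The sketch below is not part of the presentation; the function names are illustrative.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy (0-1) to a Phred-scaled quality score."""
    error = 1.0 - accuracy
    if error <= 0:
        raise ValueError("accuracy must be below 1.0 for a finite Phred score")
    return -10.0 * math.log10(error)


def phred_to_accuracy(q):
    """Convert a Phred-scaled quality score back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


print(f"83% accuracy is about Q{accuracy_to_phred(0.83):.1f}")
print(f"Q10 corresponds to {phred_to_accuracy(10):.0%} accuracy")
```

On this scale an accuracy of ~83% sits a little below Q8, while Q10 corresponds to 90% accuracy.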


BAC sequencing: De novo assembly

• High-quality assemblies with a small number of substitution errors and a moderate number of insertion/deletion errors

• Successful de novo assembly for two BAC clones with 10-15 fold read depth

[Figure: alignment plots for BAC H049 – Assemb 2 and BAC H032 – Assemb 2; axes: PacBio reference (bases) and nanopore assembly (bases)]

• De novo assembly with Celera Assembler after one or two rounds of error correction (NanoCorrect)

• Alignment against PB reference using MUMmer with the dnadiff tool for estimation of per-base accuracy


Genome sequencing: Plant pathogen Rhizoctonia solani

• Soil-borne plant pathogenic fungus

• Causes a wide range of commercially significant plant diseases

• Estimated genome size ~50-55 Mb
o heterokaryotic (≥ 2 distinct nuclear genomes)
o 10% repetitive sequences
o duplicated genomic regions

• Draft genome sequences available from different subgroups

• High level of sequence differences between different subgroups (~21% shared core genes)

• Generate draft genome assembly of Rhizoctonia solani
o MinION MK1 sequencer with MAP006 chemistry and FLO-MAP103 flow cells


Genome sequencing: Extraction of ultra-pure (u)HMW DNA

• DNA quality and integrity are essential for obtaining high-quality long reads

• Extraction of ultra-pure HMW and uHMW DNA from plants poses unique challenges that require specific expertise to deal with carbohydrates, phenolics, and other compounds abundant in plant tissues

• KeyGene has developed protocols for extraction, purification, analysis, and quantitation of DNA from a variety of (difficult) plant and pathogen species.


R. solani sequencing: Extraction and sizing of fungal HMW DNA

Nanodrop [ng/uL] | Nanodrop 260/280 | Nanodrop 260/230 | Qubit-BR [ng/uL] | Tape Station [ng/uL]
210 | 2.22 | 2.18 | 190 | 162
- | - | - | 265 | 257
1,372 | 1.87 | 0.92 | 53 | -

Sample labels: Crude, Purified, Sized

~45%


R. solani sequencing: Library preparation

Fragment sizes: ~12.5 K (9K hydropore S); ~18.8 K (10K hydropore L); >60 K

Libraries: Lib002, Lib003, Lib004

[Figure: recovery (%) along the MAP006 workflow for Lib002, Lib003, and Lib004; bar values as extracted: 100, 80, 62, 23 / 100, 66, 64, 17 / 100, 80, 46, 20]


R. solani sequencing: Library and read size distribution

[Figure: library and read size distributions per library. Library size: Lib002 17.9 K, Lib003 21.3 K, Lib004 56.6 K. Series: library size, read length (MinKNOW), 2D read length (Metrichor), 2D pass read length. Other labelled values: 19 K, 23 K, 34 K and 8.5 K, 11.3 K, 15.3 K. Panel axis: sequence length 2D]


R. solani sequencing: 2D pass read summary

Remarks | # 2D pass reads | Total length (Mb) | Max read length (Kb) | Median 2D quality score
53.5 ng (6 uL), air bubble | 2,900 | 26 | 15.8 | 9.4
53.5 ng (6 uL), heat sink ~40°C | 4,204 | 36 | 29.0 | 8.8
89.2 ng (10 uL) | 25,346 | 223 | 25.9 | 10.0
37.8 ng (6 uL) | 13,068 | 152 | 34.2 | 8.9
125.2 ng (20 uL) | 23,806 | 269 | 43.7 | 8.6
7.8 ng (6 uL) | 3,414 | 53 | 61.4 | 9.5
28.6 ng (22 uL) | 5,931 | 89 | 80.4 | 9.4

Library sizes: 17.9 K, 21.3 K, 56.6 K

[Figure: cumulative length distribution; axes: read length (bases) and % reads]
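The per-run figures in the 2D pass read summary can be totalled with a few lines of Python. This is only a sketch using the read counts and total lengths transcribed from the table; the variable names are illustrative, and the mean read length is derived here rather than reported on the slide.

```python
# (remarks, number of 2D pass reads, total length in Mb), transcribed from the table.
runs = [
    ("53.5 ng (6 uL), air bubble", 2_900, 26),
    ("53.5 ng (6 uL), heat sink ~40 C", 4_204, 36),
    ("89.2 ng (10 uL)", 25_346, 223),
    ("37.8 ng (6 uL)", 13_068, 152),
    ("125.2 ng (20 uL)", 23_806, 269),
    ("7.8 ng (6 uL)", 3_414, 53),
    ("28.6 ng (22 uL)", 5_931, 89),
]

total_reads = sum(reads for _, reads, _ in runs)
total_mb = sum(mb for _, _, mb in runs)

# Mean 2D pass read length in bases, derived from the two totals.
mean_read_length_bases = total_mb * 1_000_000 / total_reads

print(f"2D pass reads: {total_reads:,}")
print(f"Total length:  {total_mb} Mb")
print(f"Mean length:   {mean_read_length_bases:,.0f} bases")
```

The summed total length agrees, up to rounding, with the sequence yield listed for the KeyGene assembly in the comparison of genome assemblies.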


Genome assembly: Miniasm and Canu assembly summary

• ~54 Mb draft genome sequence with Canu, consisting of 679 contigs with an N50 value of ~170 K and a maximum contig length of more than 2 Mb

• Longer reads produce more contiguous assemblies
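The N50 value quoted for the assembly is the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of the calculation is shown below; the function name `n50` and the toy contig lengths are illustrative and are not the KeyGene data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the contig at which the running
    sum first reaches half of the total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy example: nine contigs with a total length of 54 units.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50 of toy assembly:", n50(toy_contigs))
```

In the toy example the three longest contigs (10, 9, and 8) together reach half of the total, so the N50 is 8.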


Genome assembly: Comparison between genome assemblies

Reference | Platform | Sequence yield (Mb) | Sum contigs (Mb) | # scaffolds | Scaffold N50 length (Kb) | # contigs | Contig N50 length (Kb)
Zheng et al. Nat Commun 2013 | GAII | 5,604 | 36.9 | 2,648 | ~475 | 6,452 | ~20.3
Cubeta et al. Genome Ann 2014 | Sanger/FLX | - | 51.7 | 326 | ~7,444 | 6,040 | ~25.9
Hane et al. PLOS Gen 2014 | HiSeq | - | 39.8 | 857 | ~161 | 7,606 | ~7.2
Wibberg et al. J Biotech 2015 | FLX/MiSeq | 2,200/2,000 | 42.8 | 879 | - | 3,793 | ~35.1
Wibberg et al. BMC Gen 2016 | MiSeq | 2,800 | 52 | 2,065 | ~81.2 | 5,826 | ~15.2
KeyGene - Canu | MK1 | 848.6 | 54.1 | - | - | 679 | ~170

• With only 5 flow cells, about 15x coverage

• To be determined: detailed read coverage analysis to establish the level of genome duplication and the estimated heterokaryotic genome size
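As a rough cross-check of the coverage figure, the sketch below divides the sequence yield by the summed contig length for the KeyGene Canu row (848.6 Mb and 54.1 Mb, reading the slide's "848,6" as a decimal comma). Using the assembly size as a stand-in for the genome size is an assumption, as are the function names; the true heterokaryotic genome size is still listed as to be determined.

```python
def coverage_depth(yield_mb, genome_size_mb):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return yield_mb / genome_size_mb


def yield_per_flow_cell(yield_mb, flow_cells):
    """Average sequence yield per flow cell, in Mb."""
    return yield_mb / flow_cells


SEQUENCE_YIELD_MB = 848.6  # KeyGene row, sequence yield
ASSEMBLY_SIZE_MB = 54.1    # KeyGene row, sum of contigs (assumed genome-size proxy)
FLOW_CELLS = 5             # number of flow cells stated on the slide

depth = coverage_depth(SEQUENCE_YIELD_MB, ASSEMBLY_SIZE_MB)
per_cell = yield_per_flow_cell(SEQUENCE_YIELD_MB, FLOW_CELLS)

print(f"Estimated coverage: {depth:.1f}x")
print(f"Average yield per flow cell: {per_cell:.1f} Mb")
```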


Genome alignment: Comparison between two public assemblies

• Alignment of public assemblies (MUMmer)

• Comparative genome analysis reveals considerable genetic differences between different isolates (i.e. genome size, gene number and composition)

• Sequence similarity exists between R. solani draft genomes, but with an overall low level of co-linearity

[Figure: alignment plot; axes: Cubeta et al. 2014 assembly (bases) and Zheng et al. 2013 assembly (bases)]


Genome alignment: KeyGene assembly vs. Cubeta et al. 2014

• Considerable sequence diversity exists between the KeyGene strain and public Rhizoctonia strains

[Figure: alignment plot; axes: Cubeta et al. 2014 assembly (bases) and KeyGene Canu assembly (bases)]

Conclusions

• Plant BAC DNA sequencing
o De novo assembly for two BAC clones with 10-15 fold read depth
o High-quality assemblies using a low number of 2D pass reads

• Rhizoctonia solani genome sequencing
o Large number of high-quality 2D pass reads in 24-hour runs
o Direct sequencing of HMW DNA positively affects read length
o Generated a ~54 Mb draft genome sequence with an estimated read depth of 10x
o Low level of co-linearity between nanopore assembly and published draft genomes

• Sequencing of complex plant genomes: big data … big deal!

What’s next ….?

• Rhizoctonia solani genome sequencing
o Improving the synthesis of long fragment libraries (yield, size)
o Sequencing additional flow cells
o Testing more tools and parameters

• Plant genome sequencing
o KeyGene joined PromethION Early Access Programme (PEAP)
o Draft genome sequence of a melon variety using the PromethION

• Meet us at …


Acknowledgements

Erwin Datema, Alex Boshoven, Koen Cuelenaere, Lisanne Blommers, Alexander Wittenberg, Nathalie van Orsouw, Michiel van Eijk

The KeyBase®, KeyPoint® Mutation Breeding, WGP™, Sequenced Based Genotyping and KeyGene® SNPSelect technologies are protected by patents and/or patent applications owned by Keygene N.V. KeyGene, KeyBase, KeyPoint and KeySeeQ are registered trademarks of Keygene N.V. in one or more territories in the world. All other products names, brand names or company names are used for identification purposes only, and may be (registered) trademarks of their respective owners.