the tomato genome re-seq project - university of florida - flinkers.pdf · ignores differences...

The tomato genome re-seq project

http://www.tomatogenome.net

5 February 2013, Richard Finkers & Sjaak van Heusden

Rationale

Genetic diversity in commercial tomato germplasm relatively narrow

Unexploited genetic diversity available in land races and old varieties?

Cultivated tomato has lost valuable traits during domestication

Wild species - source of genetic diversity

● Diverse habitat ● Variation in flowers and fruits ● Variation in mating systems

Most wild species can be crossed with cultivated tomato (introgression breeding)

Rationale

Tomato Genome (Re-)Sequencing Project

● Identify alleles underpinning phenotypic diversity across the entire genome and the entire tomato clade

Acknowledgement: Sjaak van Heusden, Paris market

Tomato fruit shape variation

Rodríguez et al. (2011) Plant Physiology 156: 275-85

EU-SOL core collection

https://www.eu-sol.wur.nl

Information: ● Marker data ● Phenotype data ● Passport data

Markers: 20 (7000 -> 1000), 384 (1000 -> 200), 7500 (200 -> 34)

Selected landraces for (re-)sequencing

200 landraces

1000 landraces

> 7000 landraces

Acknowledgement: Dani Zamir et al. & Keygene N.V.

https://www.eu-sol.wur.nl/

Landraces & old cultivar collection

Fruit phenotypes EU-SOL collection

Improving with exotic genetic libraries

Wild tomato species are valuable candidates for novel alleles

Dani Zamir, Nature Reviews Genetics 2, 983-989 (December 2001)

Improving with exotic genetic libraries

Moyle 2008

Phylogenetic relationships in the Solanum clade

(re-)sequencing collection

[Figure: phylogenetic tree of the collection, divided into the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups]

Tree according to Anderson et al. (2010), redrawn from Moyle 2008

Genome Alignment

Read mapping to cv. Heinz

Genome structure of wild tomato relatives?

Lycopersicon group

Arcanum group

Eriopersicon group

Neolycopersicon group

Reference genomes: De novo assembly selection

Heinz1706

LA 2157

LYC 4

LA 716


Rationale:

● (Nearly) homozygous accessions

● Inbred over a few generations

● Representative for re-seq read mapping

Data production

84 resequenced genomes

● 500 bp, 2 x 100 bp paired-end Illumina

● Average coverage 41x

3 de novo genomes (S. arcanum, S. habrochaites, S. pennellii)

● 170 bp, 2 x 100 bp paired-end Illumina

● 2 kb, 2 x 100 bp mate-pair Illumina

● 8 kb mate-pair (454)

● 20 kb mate-pair (454)

● Average coverage 205x
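
The average coverage figures relate read count, read length and genome size. A minimal sketch of that arithmetic, assuming a genome of roughly 760 Mb (the Heinz draft size quoted elsewhere in this deck); the function names and the constant are illustrative and not part of the project's pipeline:

```python
import math


def mean_coverage(n_read_pairs, read_length, genome_size):
    """Mean depth = sequenced bases / genome size (two reads per pair)."""
    return 2 * n_read_pairs * read_length / genome_size


def pairs_needed(target_coverage, read_length, genome_size):
    """Read pairs required to reach a target mean depth, rounded up."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


GENOME_SIZE = 760_000_000  # assumption: ~760 Mb, not a project parameter

pairs_41x = pairs_needed(41, 100, GENOME_SIZE)
print(pairs_41x)                                   # read pairs for 41x at 2 x 100 bp
print(mean_coverage(pairs_41x, 100, GENOME_SIZE))  # depth reached with that many pairs
```

This ignores read trimming, duplicates and unmapped reads, all of which lower the effective depth.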

Genomic sequencing libraries

K-mer graph

[Figure: 31-mer histogram. x-axis: 31-mer frequency (0 to 100); y-axis: 31-mer volume in millions (0 to 1000). Series '001', '045', '046', '053', '054', '058', '072', '074', each with a fitted curve (FIT).]

Data: 500 bp, 2x100 bp Paired-end Illumina

Acknowledgement: Theo Borm

K-mer exploration

Fitted modes ● Homozygous ● Heterozygous ● Duplicated (2x)

Conclusions

● % heterozygosity is negligible

● Duplicated portion is not negligible
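
A k-mer histogram of this kind counts how often each 31-mer occurs in the reads and then tallies how many distinct k-mers share each frequency. A toy sketch, assuming plain in-memory counting (real datasets need a dedicated k-mer counter; the slides do not name the tool used), ignoring reverse complements, and taking "volume" to mean frequency times the number of distinct k-mers:

```python
from collections import Counter


def kmer_histogram(reads, k=31):
    """Return {frequency: number of distinct k-mers seen that many times}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return dict(Counter(counts.values()))


def kmer_volume(histogram):
    """Total k-mer occurrences per frequency bin (frequency x distinct k-mers)."""
    return {freq: freq * n for freq, n in histogram.items()}


# Toy input with k=3 so the result can be checked by hand.
toy_reads = ["ACGTAC", "ACGTTT"]
toy_hist = kmer_histogram(toy_reads, k=3)
print(toy_hist)
print(kmer_volume(toy_hist))
```

In such a histogram a homozygous genome gives one main peak, heterozygous sites add a peak at half that depth, and duplicated sequence adds one at twice that depth, which is what the fitted modes distinguish.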

[Figure: 31-mer histogram, detail. x-axis: 31-mer frequency (30 to 90); y-axis: 31-mer volume in millions (0 to 300). Series '001', '045', '046', '053', '054', '058', '072', '074', each with a fitted curve (FIT).]

Genome size estimates

Genomic K-mer based estimate

● Ignores differences in GC-AT ratio

● Underestimation

Nr  Species  Est. size (Mb)  Draft size (Mb)  %CP
01  SL       723             760 (Heinz)      1.9
45  SP       749                              1.9
46  SP       775                              6.3
53  SG       728                              4.4
54  SC       760                              6.2
58  SA       830                              3.0
72  SH       779                              7.1
74  SP       962                              8.6

LA1589 draft size: 739 Mb
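
A common way to obtain a k-mer based size estimate is to divide the total k-mer volume by the depth of the main (homozygous) peak. The sketch below assumes that simple formula and a hand-picked error cut-off; the slides do not state whether the project used it or derived sizes from the fitted model, and the histogram values are synthetic:

```python
def estimate_genome_size(histogram, min_freq=5):
    """Estimate genome size as total k-mer volume / main peak frequency.

    histogram maps k-mer frequency -> number of distinct k-mers.
    Bins below min_freq are treated as sequencing errors and skipped
    (the cut-off is an assumption and needs tuning per dataset).
    """
    usable = {f: n for f, n in histogram.items() if f >= min_freq}
    if not usable:
        raise ValueError("no bins at or above min_freq")
    peak = max(usable, key=usable.get)  # frequency with the most distinct k-mers
    volume = sum(f * n for f, n in usable.items())
    return volume / peak


# Synthetic histogram: errors at 1-2x, single-copy peak at 20x, duplicates at 40x.
toy = {1: 5000, 2: 800, 19: 100, 20: 300, 21: 100, 40: 10}
print(estimate_genome_size(toy))
```

Because k-mers at twice the peak depth contribute twice to the volume, duplicated sequence is counted in the estimate; highly repetitive or poorly sequenced regions are still missed, which fits the underestimation noted on the slide.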

Acknowledgement: Theo Borm

The Tomato Genome Consortium Nature 485, 635–641 (2012)

Optimizing assembly strategy

Checking assembly integrity

Average completeness per 10 contigs: ALL-PATHS (96.62%) CLC-BIO (74.62%)

Heinz dot plot

SL2.40 ch11 – region (1 Mbp)

Status de novo assembly genomes


Assembly                         N50        N90     Longest  Shortest     Mean  Median  N contigs  Total length
Heinz 1706 reference      16,467,796  3,041,128  42,121,211     2,000  242,428   2,847      3,223   781,345,411
S. habrochaites_allpaths      90,424     12,290     990,035       902   43,409  20,461     16,935   735,128,396
S. habrochaites_scaf         515,730    104,925   3,252,897       902  130,475   9,758      5,873   766,277,628
S. pennellii_allpaths         64,671      7,460     627,722       887   27,680  11,008     26,589   735,990,792
S. pennellii_scaf            206,135     38,969   1,269,801       887   49,209   5,932     15,886   781,730,072
S. arcanum_clc                18,651      2,524     241,690       200    2,869     428    290,145   832,461,203
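
For readers unfamiliar with the column names: N50 is the length of the contig at which the longest contigs together reach half of the total assembly length, and N90 is the same for 90%. A minimal sketch of how such statistics can be computed from a list of contig lengths; the function names and toy lengths are illustrative and not taken from the project's tooling:

```python
import statistics


def nx(lengths, percent):
    """Length L such that contigs of length >= L cover `percent` % of the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:  # integer arithmetic, no rounding
            return length


def assembly_stats(lengths):
    """Summary statistics matching the columns of the status table."""
    return {
        "N50": nx(lengths, 50),
        "N90": nx(lengths, 90),
        "Longest": max(lengths),
        "Shortest": min(lengths),
        "Mean": sum(lengths) / len(lengths),
        "Median": statistics.median(lengths),
        "N contigs": len(lengths),
        "Total length": sum(lengths),
    }


print(assembly_stats([2, 3, 4, 5, 6]))  # toy contig lengths
```

The Mean column in the table equals Total length divided by N contigs, which is a quick consistency check on each row.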

Conclusions

● Sequencing completed

● Quality and coverage threshold satisfied

● Cleaning resequencing data completed

● De novo assembly of S. habrochaites and S. pennellii comparable with tomato reference

● De novo assembly of S. arcanum in progress

● Read mapping and SNP analysis finished

And now the fun begins...

Average SNP rate per kb (vs. SL2.40)

Homozygous vs Heterozygous feature rate
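
The slides do not say how these rates were computed. One plausible reading is the number of variant calls divided by the number of callable reference bases in kb, split by genotype. A hedged sketch under that assumption, using VCF-style GT strings as input; the function name and example calls are hypothetical:

```python
def variant_rates_per_kb(genotypes, callable_bases):
    """Homozygous and heterozygous variant rates per kb.

    genotypes: iterable of VCF-style GT strings such as '0/1' or '1/1'.
    callable_bases: number of reference bases with usable read coverage.
    """
    hom = het = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or all(a == "0" for a in alleles):
            continue  # missing or reference call: not a variant
        if len(set(alleles)) == 1:
            hom += 1  # both alleles are the same non-reference allele
        else:
            het += 1
    kb = callable_bases / 1000
    return {
        "homozygous": hom / kb,
        "heterozygous": het / kb,
        "total": (hom + het) / kb,
    }


# Made-up calls over a 2 kb window, for illustration only.
print(variant_rates_per_kb(["1/1", "0/1", "1/1", "0/0", "./.", "1|2"], 2000))
```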

Exploring the FW9-2-5 locus (Lin5)

Invertase gene (Lin5) cloned from S. pennellii; amino acid substitutions:

● 2878 (Asp in LP to Glu in LE)

● 2932 (Asp to Asn)

● 2953 (Val to Leu)

Fridman et al. Proc Natl Acad Sci U S A. 2000 Apr 25;97(9):4718-23.

FW9-2-5 variation (Lin5)

S. galapagense

Needs

● Whole genome variant catalogue

● Annotation for the three wild species genomes

● Pan genome reconstruction

● How good is our sampling?

Perspectives

Direct application for reverse genetics studies ● Use identified allelic variation ● Calculate distance based on all genes?

Better understanding of genome organization ● Improve introgression breeding ● Homozygous vs. heterozygous features ● Scan for inversions

Diamond jewelry?

150 tomato genome consortium

Questions

Project site:

● http://www.tomatogenome.net

Phenotype data & Images:

● https://www.eu-sol.wur.nl

SOL100:

● http://solgenomics.net or http://solgenomics.wur.nl


Acknowledgments

Data production ● Elio Schijlen ● Bas te Lintel Hekkert

Quality control

● Saulo Aflitos

Data management and assembly ● Sandra Smit ● Jan van Haarst ● Henri van de Geest ● Lars Smits

Project management

● Sander Peters ● Richard Finkers ● Andries Koops

● Huanwen Zhu ● Minling Xiao ● Tao Ma ● Xiaoli Wang

● Jiumeng Min ● Jie Chen ● Xiaoli Wang

● Jianbo Jian ● Yadan Luo ● Li Liao ● Tina(Na) Xu
