Post on 14-Dec-2015


Frédéric CHOULET

A pseudomolecule of 774 Mb: the 3B experience

INRA GDEC – Clermont-Ferrand, France

3B physical map

Physical map
#BACs: 132,000 (19x)
#BAC-contigs: 1282
#MTP-BACs: 8452

3B: 900 Mb

Sequenced physical map: 3B MTP-BAC sequencing
#MTP BACs: 8452
#BAC pools: 922
#Roche 8 kb MP lib.: 922
bp coverage (Roche/454): 36x
BAC-ends (Sanger): 42,551
Whole Genome Prof. tags: 327,282
Whole 3B shotgun (Illumina): 82x

Assembly and scaffolding

3B-v1: 16,136 scaff, 1,040 Mb

o Curation of the scaffolding
V. Barbe, S. Mangenot (Genoscope)

Integration of BAC-end match positions

Parsing of MP read positions

[Figure: scaffold layout, labels scaff00001, scaff00013, scaff00024, scaff00008, scaff00011, scaff00007, scaff00005]

3B-v1: 16,136 scaff, 1,040 Mb, 18% Ns

3B-v3: 4,999 scaff, 992 Mb, 13% Ns
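The three figures reported for each assembly version (number of scaffolds, total size, % Ns) can be recomputed from a FASTA file. The following is only a minimal sketch under that assumption, not the pipeline used for 3B; the function names `parse_fasta` and `assembly_stats` and the toy sequences are hypothetical.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a dict of name -> sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the scaffold name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def assembly_stats(scaffolds):
    """Return scaffold count, total length (bp) and percentage of N bases."""
    total_bp = sum(len(seq) for seq in scaffolds.values())
    n_bp = sum(seq.upper().count("N") for seq in scaffolds.values())
    pct_n = 100.0 * n_bp / total_bp if total_bp else 0.0
    return {"scaffolds": len(scaffolds), "total_bp": total_bp, "pct_n": pct_n}


# Toy input (made up for illustration): 2 scaffolds, 20 bp, 4 of them Ns.
toy = """>scaff00001
ACGTNNNNAC
>scaff00002
GGGGGGGGGG
""".splitlines()

stats = assembly_stats(parse_fasta(toy))
print(stats)
```

Running the same function on each release would give the kind of before/after comparison shown for 3B-v1 and 3B-v3.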

o Gap filling
o Seq. error corrections
JM. Aury, A. Couloux (Genoscope)

Illumina reads (whole 3B shotgun): 109,914 gaps filled, 126,290 bases corrected (error rate: 0.1%)

3B-v4: 8% Ns (scaffold count and size unchanged from 3B-v3)

o Redundancy removal and scaffold merging
S. Theil (INRA GDEC)

[Figure: contigs ctg1 and ctg2 from Pool_A and Pool_B]

scaffAssembler.pl: search for shared TE-junctions

3B-v4: 4,999 scaff, 992 Mb (redundancy: 160 Mb)

3B-v443: 2,808 scaff, 833 Mb
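The slides name the idea (scaffolds from different BAC pools that share a TE junction are redundant and can be merged) but do not describe how scaffAssembler.pl implements it. As a stand-in, the sketch below uses a plain exact suffix-prefix overlap: if the end of one scaffold is identical to the start of another, the two are merged and the shared stretch is kept once. The function names, the `min_overlap` threshold and the toy sequences are all assumptions made for illustration.

```python
def suffix_prefix_overlap(a, b, min_overlap=8):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge_scaffolds(a, b, min_overlap=8):
    """Merge b onto the end of a if they overlap; return None otherwise."""
    k = suffix_prefix_overlap(a, b, min_overlap)
    if k == 0:
        return None
    return a + b[k:]


# Toy example: both scaffolds contain the same 13-bp "junction".
junction = "GGATCCGATTACA"
scaff_pool_a = "ACGTACGTTTGACCA" + junction
scaff_pool_b = junction + "TTTTCCCC"

merged = merge_scaffolds(scaff_pool_a, scaff_pool_b)
redundant_bp = len(scaff_pool_a) + len(scaff_pool_b) - len(merged)
print(merged, redundant_bp)
```

A real implementation would have to tolerate sequencing errors and both strand orientations; exact matching is used here only to keep the example short.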

Ordering scaffolds

3B-v443: 2,808 scaff, 833 Mb

o SNP discovery
SureSelect® seq. capture (E. Paux, N. Cubizolles, E. Rey)
52,265 baits (isbpProbeDesign.pl)
DNA captured from 10 genotypes
39,077 SNPs

[Figure: diagram labelled Bait, TE, gene]

o Genotyping mapping pop: 3,075 SNPs

Genetic mapping (P. Sourdille)
• Anchor map: 384 indiv Cs x Renan
+ Neighbor map: 3865 markers

LD mapping (F. Balfourier)
• 367 lines from a core-collection

genetic map: 366 bins, 0 cM to 133 cM

44.8 cM: 152 scaffolds, 64 markers at the same genetic position

LD map (Linkage Disequilibrium): 19 LD blocks

554 bins
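A genetic bin, as used here, groups everything that maps to the same genetic position, which is why many scaffolds can end up at a single cM value. A minimal sketch of that grouping, assuming marker data are available as (scaffold, cM) pairs; the function name `genetic_bins`, the rounding precision and the toy markers are hypothetical.

```python
from collections import defaultdict


def genetic_bins(markers, ndigits=1):
    """Group scaffolds by the genetic position (cM) of their markers.

    markers: iterable of (scaffold_name, position_cM) pairs.
    Returns a list of (position_cM, [scaffold names]) sorted by position.
    """
    bins = defaultdict(set)
    for scaffold, cm in markers:
        # Round so that tiny floating-point differences do not split a bin.
        bins[round(cm, ndigits)].add(scaffold)
    return [(cm, sorted(names)) for cm, names in sorted(bins.items())]


# Toy markers (made up): two scaffolds share position 44.8 cM.
toy_markers = [
    ("scaffA", 44.8),
    ("scaffB", 44.8),
    ("scaffC", 0.0),
    ("scaffA", 44.8),
]
bins = genetic_bins(toy_markers)
print(bins)
```

Scaffolds inside one bin cannot be ordered relative to each other from the genetic map alone, which is where the LD map adds resolution.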

pseudomolBuilder.pl

pseudomolecule: 1358 scaff, 774 Mb (93%)

unlocalized: 1450 scaff, 59 Mb (7%)

[Figure: ordered scaffolds separated by runs of Ns]
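The construction step can be pictured as sorting the anchored scaffolds by map position and concatenating them with a run of Ns between neighbours, while scaffolds without a position stay unlocalized. The sketch below illustrates that idea only; it is not pseudomolBuilder.pl, and the function names, the spacer length and the toy data are assumptions. The last function reproduces the 93% / 7% split from the reported sizes.

```python
def build_pseudomolecule(scaffolds, positions, spacer_len=100):
    """Concatenate anchored scaffolds in map order, separated by N spacers.

    scaffolds: dict name -> sequence
    positions: dict name -> map position (scaffolds absent here are unlocalized)
    """
    anchored = sorted(
        (name for name in scaffolds if name in positions),
        key=lambda name: (positions[name], name),
    )
    unlocalized = sorted(name for name in scaffolds if name not in positions)
    spacer = "N" * spacer_len
    pseudomolecule = spacer.join(scaffolds[name] for name in anchored)
    return pseudomolecule, anchored, unlocalized


def anchored_percent(anchored_mb, unlocalized_mb):
    """Share of the assembly placed in the pseudomolecule, in percent."""
    return 100.0 * anchored_mb / (anchored_mb + unlocalized_mb)


# Toy data (made up): s3 has no map position and stays unlocalized.
toy_scaffolds = {"s1": "AAAA", "s2": "CCCC", "s3": "GGGG"}
toy_positions = {"s2": 0.0, "s1": 44.8}

pseudo, anchored, unlocalized = build_pseudomolecule(
    toy_scaffolds, toy_positions, spacer_len=3
)
print(pseudo, anchored, unlocalized)

# Sizes from the slide: 774 Mb anchored, 59 Mb unlocalized.
print(round(anchored_percent(774, 59)))
```

The sort only fixes the order between positions; scaffolds sharing a position are ordered by name here, which is arbitrary.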

o SNP discovery
o Genotyping mapping pop
o Integration of phys. map info

[Figure: scaffolds A to E placed along a genetic axis in cM]

o orientation unknown: 48% of the seq.
o micro-order unknown: 554 bins / 1358 scaff

Future Improvements

o RH map
o Optical map
o Long reads

Annotation (774 Mb)

• 7264 protein coding genes (TRIANNOT)
• 234,606 TEs (CLARI-TE)

Bioinformatics

Assembly: Newbler

Scaffolding/pseudomolecule construction: scaffAssembler.pl, gapCloser, ssrFinishing, isbpProbeDesign.pl, pseudomolBuilder.pl

Annotation: triAnnot (new modules: filtering, pseudogenes, transfer annotation), clari-TE & clari-TE-lib

Data management: gowDB (Bio::DB::SeqFeature::Store), Gbrowse @ URGI

Acknowledgments

Sébastien Theil, Natasha Glover, Josquin Daron, Lise Pingault, Pierre Sourdille, Etienne Paux, Philippe Leroy, Jacques Le Gouis, Nicolas Guilhot, Aurélien Bernard, Nelly Cubizolles, Catherine Feuillet, François Balfourier, Hélène Rimbert

URGI: M. Alaux, L. Couderc, V. Jamilloux, H. Quesneville
CNRGV: H. Berges, A. Bellec
BIA: C. Gaspin
Genoscope: A. Alberti, V. Barbe, J. Poulain, C. Durand, S. Mangenot, JM. Aury, A. Couloux, P. Wincker
IEB: J. Dolezel, J. Safar
VIB: K. Vandepoele
MIPS: K. Mayer et al.
SAB: P. Schnable, S. Rounsley, D. Ware
TGAC: J. Rogers, M. Caccamo et al.
J. Rogers, K. Eversole