INDIAN INITIATIVE FOR TOMATO GENOME SEQUENCING Tomato Finishing Workshop T. R. Sharma National Research Centre on Plant Biotechnology Indian Agricultural Research Institute New Delhi -110012 [email protected]

Post on 22-Dec-2015


Page 1

INDIAN INITIATIVE FOR TOMATO GENOME SEQUENCING

Tomato Finishing Workshop

T. R. Sharma
National Research Centre on Plant Biotechnology
Indian Agricultural Research Institute
New Delhi - 110012
[email protected]

Page 2

Tomato Genome Sequencing Project

[Figure: participating countries. Labels in slide order: Italy, USA, USA, USA, Spain, Japan, France, Netherlands, India, China, UK, Korea]

Page 3

Capillary Sequencers

ABI-3700, MegaBACE-1000/4000

Sequence Type

Collection of DNA Seq. data

Page 4

Data Flow at NRCPB

Page 5

Software Developed for Performing HTGS Analysis

(i) rename - renames any number of files from the ABI or MegaBACE generated format to the St. Louis naming convention

(ii) fsplit - splits a file containing multiple sequences in fasta format

(iii) fmerge - converts multiple fasta files into a single fasta file

(iv) coverage - calculates the depth of coverage of an assembly by the most stringent method

(v) extract_reads - extracts all the reads from a particular contig or contigs in an assembly

(vi) comhits - compares two BLAST outputs stored as text for common hits

(vii) confasta - converts a file of nucleotide sequences containing numbers and/or blank spaces into a sequence fasta file for BLAST searches

(viii) format2xls - converts sequence fasta files to a tab-delimited format

(ix) format2fasta - converts a database-stored file into fasta format for further analysis

(x) prefinish96 - an Excel macro program which arranges templates in alphabetical order along with their custom primers in a 96-well format

(xi) prefinish384 - a similar Excel macro program for template arrangement in a 384-well format
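The slides do not show the source of these tools, so the following is only a minimal Python sketch of what fsplit, fmerge, confasta and format2xls are described as doing. The function names reuse the tool names for readability; the signatures, the in-memory string handling and the helper parse_fasta are assumptions made for illustration, not the NRCPB implementation.

```python
import re
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse multi-FASTA text into (header, sequence) pairs (hypothetical helper)."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fsplit(text: str) -> Dict[str, str]:
    """Sketch of fsplit: one single-record FASTA string per sequence, keyed by ID."""
    out: Dict[str, str] = {}
    for header, seq in parse_fasta(text):
        seq_id = (header.split() or ["unnamed"])[0]
        out[seq_id] = f">{header}\n{seq}\n"
    return out


def fmerge(files: List[str]) -> str:
    """Sketch of fmerge: combine several FASTA strings into one."""
    return "".join(f">{h}\n{s}\n" for t in files for h, s in parse_fasta(t))


def confasta(name: str, raw: str) -> str:
    """Sketch of confasta: strip digits and whitespace from a numbered listing."""
    seq = re.sub(r"[\d\s]+", "", raw).upper()
    return f">{name}\n{seq}\n"


def format2xls(text: str) -> str:
    """Sketch of format2xls: one 'header<TAB>sequence' row per record."""
    return "".join(f"{h}\t{s}\n" for h, s in parse_fasta(text))
```

For example, `confasta("read1", "  1 acgt acgt\n 11 gg")` yields a record whose sequence is `ACGTACGTGG`, which can then be passed to a BLAST search.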

Page 6

Sequence gap closure strategies in use

Page 7

Genome Sequence Types Submitted to GenBank

[Figure: schematic of the three sequence phases. Phase I: pieces shown out of order (1, 3, 4, 2 and A, B, E, C, D, H, F, G) with gaps. Phase II: pieces shown in order (A to H). Phase III: a single continuous piece. Annotations on the figure: gap, single clone area, single strand area, multiple clone coverage on both strands, custom primers.]
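The phase labels on this slide match the GenBank HTG convention, in which Phase 1 records hold unordered contigs, Phase 2 records hold ordered and oriented contigs, and Phase 3 records are finished sequence without gaps. A minimal sketch of that classification follows; the function name and its three inputs are hypothetical and simply encode the convention as stated here.

```python
def htgs_phase(n_contigs: int, ordered_and_oriented: bool, finished: bool) -> int:
    """Return the HTG phase (1, 2 or 3) for a BAC assembly.

    Assumed inputs (not from the slides):
      n_contigs            - number of contigs in the current assembly
      ordered_and_oriented - True if contig order and orientation are known
      finished             - True if the sequence has passed finishing checks
    """
    if n_contigs < 1:
        raise ValueError("an assembly needs at least one contig")
    if finished and n_contigs == 1:
        return 3  # finished: one contig, no gaps
    if ordered_and_oriented:
        return 2  # unfinished, but contigs are ordered and oriented
    return 1      # unfinished, contigs may be unordered


# Example: eight unordered pieces, as in the Phase I row of the figure.
print(htgs_phase(8, ordered_and_oriented=False, finished=False))
```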

Page 8

DNA Sequence Finishing

Page 9

Finishing DNA Sequences

Finishing is the process of polishing raw sequences, transforming the fragmented rough draft into a long, continuous final product without breaks or errors.

Goals:

• Resolve sequence ambiguities and discrepancies, such that the error rate is less than one in 10,000 bases.

• Provide “double-stranded” coverage for every base:

– minimum of two different clones

– two different directions

– two different chemistries

• Achieve contiguity.

• Delineate vector/insert junctions.
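The error-rate goal of less than one error in 10,000 bases can be related to base quality scores. The sketch below assumes qualities are on the Phred scale, Q = -10 log10(P), which the slide does not state explicitly; under that assumption the goal corresponds to an average quality of about Q40. All function names are illustrative.

```python
import math
from typing import Sequence


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality value."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality value for an error probability."""
    return -10 * math.log10(p)


def expected_errors(quals: Sequence[float]) -> float:
    """Expected number of wrong bases in a consensus with these qualities."""
    return sum(phred_to_error(q) for q in quals)


def meets_error_goal(quals: Sequence[float], max_rate: float = 1e-4) -> bool:
    """True if the expected error rate is at most max_rate (1 in 10,000 by default)."""
    return expected_errors(quals) / len(quals) <= max_rate


print(round(error_to_phred(1e-4)))      # quality equivalent of 1 error in 10,000
print(meets_error_goal([50] * 1000))    # high-quality consensus
print(meets_error_goal([30] * 1000))    # Q30 is about 1 error in 1,000
```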

Page 10

Finishing DNA Sequences: How

• Scan the assembly to pick linker clones for transposon (Tn) sequencing

• Custom oligo dye terminator reactions

• Reverse dye terminator reactions

• Special chemistry (dGTP) reactions

• Custom oligos for BAC DNA sequencing

• PCR amplification of problem areas

Software used: Consed, a graphical tool for viewing and editing sequence assembly data. Project directories: chromat_dir, phd_dir, edit_dir.
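The three directory names listed for Consed can be checked before an assembly is opened. This is a small sketch using only the names given on the slide; the helper names and the idea of one project folder per BAC are assumptions.

```python
from pathlib import Path
from typing import List

# Directory names as listed on the slide.
CONSED_DIRS = ("chromat_dir", "phd_dir", "edit_dir")


def missing_consed_dirs(project: Path) -> List[str]:
    """Return the expected sub-directories that are absent from a project folder."""
    return [d for d in CONSED_DIRS if not (project / d).is_dir()]


def init_project(project: Path) -> None:
    """Create the three sub-directories (hypothetical helper for a new BAC project)."""
    for d in CONSED_DIRS:
        (project / d).mkdir(parents=True, exist_ok=True)
```

Calling `missing_consed_dirs(Path("some_bac_project"))` on a complete project returns an empty list.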

Page 11

Methods to resolve Seq. gaps

1. Transposon method

• Identify linker clones

• Perform transposon insertions (New England BioLabs)

• Transform DH10B cells

• Pick at least 24 white colonies

• Prepare templates

• Sequence all the templates

• Add new sequence data

[Figure: linker clones]

Page 12

Methods to resolve Seq. problems

2. Custom primer method

• Identify problem areas (for example, a poor quality region)

• Design custom primers

• Sequence at least 3 shotgun clones spanning the region, with the same or a different chemistry

• Add new sequence data and edit the assembly
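The first two steps of the custom primer method (finding a poor quality region and choosing a primer next to it) can be sketched as follows. The quality threshold, run length, primer length and offset are illustrative assumptions, and the melting temperature uses the simple 2(A+T) + 4(G+C) rule of thumb for short oligos, not a method named in the slides.

```python
from typing import List, Tuple


def low_quality_regions(quals: List[int], min_q: int = 20,
                        min_len: int = 5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs where every base is below min_q."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, q in enumerate(quals):
        if q < min_q:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(quals) - start >= min_len:
        regions.append((start, len(quals)))
    return regions


def primer_upstream(consensus: str, region_start: int,
                    offset: int = 100, length: int = 20) -> str:
    """Take a forward primer that ends `offset` bases before the problem region."""
    end = max(region_start - offset, 0)
    return consensus[max(end - length, 0):end]


def rule_of_thumb_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))
```

A real design step would also check primer uniqueness within the BAC and avoid repeats; that is left out of this sketch.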

Page 13

Methods to resolve Seq. problems

3. PCR method

Joining 2 contigs by PCR

• Primers on Contig 1 and Contig 2

• PCR amplification

• Cleaning of PCR products

• Sequencing of PCR products

• New reads

[Figure: gel of PCR products, lanes M and 1 to 8, with the 1 kb marker position indicated]
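Joining two contigs by PCR needs one primer reading out of the end of Contig 1 and one reading back from the start of Contig 2. The sketch below picks such a pair, assuming Contig 2 lies downstream of Contig 1 in the same orientation; the primer length and the distance from the contig ends are illustrative values, and the function names are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gap_spanning_primers(contig1: str, contig2: str,
                         length: int = 20, inset: int = 50) -> Tuple[str, str]:
    """Pick a primer pair that amplifies across the gap between two contigs.

    forward: taken from contig1, ending `inset` bases before its 3' end
    reverse: reverse complement of contig2, starting `inset` bases from its 5' end
    Staying `inset` bases inside each contig avoids the low-quality contig ends.
    """
    if min(len(contig1), len(contig2)) < length + inset:
        raise ValueError("contigs are too short for the requested primers")
    fwd_end = len(contig1) - inset
    forward = contig1[fwd_end - length:fwd_end]
    reverse = revcomp(contig2[inset:inset + length])
    return forward, reverse
```

The sequenced PCR product then gives new reads that overlap both contig ends and can be added to the assembly.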

Page 14

Sequencing Status, IITGS

Phase III = 24
Phase II = 25
Phase I = 10
Library = 9
Total BACs sequenced = 68

Page 15

BAC clones in Phase III (IITGS)

S. No.  Map position (cM)  Acc. #    BAC            Marker        Size (kb)
1       0                  AC187148  C05HBa0191B01  CT101         76
2       -                  AC188781  C05SLm0005B15  -             96
3       -                  AC204082  C05SLe0086I08  -             130
4       7                  AC187538  C05HBa0261K11  C2-At1g60200  155
5       10                 AC188778  C05HBa0042B19  cLET-8-B23    117
6       -                  AC188782  C05SLm0037H06  -             108
7       -                  AC212301  C05HBa0060G21  CT242         131
8       16                 AC194694  C05HBa0058L13  T1592         105
9       -                  AC212306  C05SLe0066O01  TG441         100
10      -                  AC209589  C05HBa0145P19  TG432         150

Total sequence = 1.168 Mb

Page 16

BAC clones in Phase III (IITGS), continued

S. No.  Map position (cM)  Acc. #    BAC            Marker        Size (kb)
11      -                  AC212299  C05HBa0003C20  BS4           19
12      -                  AC209178  C05HBa0168M18  CT167         31
13      -                  AC225119  C05HBa0207N03  -             50
14      -                  AC225041  C05SLm0118J18  -             106
15      -                  AC212305  C05SLe0028N03  -             135
16      -                  AC212309  C05SLe0122H05  -             90
17      37                 AC212304  C05HBa0309L13  C2_At2g01110  130
18      -                  AC225118  C05HBa0161A14  -             96
19      -                  AC212312  C05SLm0115G01  C2_At1g24830  142
20      -                  AC225040  C05SLm0079C22  T1640         125
21      -                  AC225117  C05HBa0042L17  -             72
22      119                AC186292  C05HBa0251J13  TG185         98
23      115                AC196190  C05HBa0141A12  -             89
24      76                 AC212274  C05HBa0135A02  C2_At1g10500  100

Total sequence = 1.283 Mb

Page 17

BAC clones on other Chromosomes / Redundant BAC Clones

S. No.  Map position (cM)  Acc. #    BAC            Marker  Size (kb)
1       -                  AC187540  C07SLm0077G20  -       92
2       -                  AC187539  C07HBa0179K09  T0876   108
3       22                 AC212314  C11HBa0027B05  BS4     168
4       -                  AC212315  C11SLe0053P22  -       140
5       111                AC182647  C05HBa0006N20  TG69    123

Total sequence (this table) = 631 kb

Total sequence (all BACs listed on pages 15 to 17) = 3.082 Mb
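The totals on these three slides can be re-derived from the size column, and the BAC names can be split into fields. In the sketch below the sizes are copied from the tables; the reading of a name such as C05HBa0191B01 as chromosome, library code, plate number and well is an assumption about the naming pattern, not something the slides state.

```python
import re
from typing import NamedTuple


class BacName(NamedTuple):
    chromosome: int  # assumed: two digits after the leading "C"
    library: str     # assumed: three-letter library code, e.g. HBa, SLm, SLe
    plate: int       # assumed: four-digit plate number
    well: str        # assumed: well position, row letter plus column


_BAC_RE = re.compile(r"^C(\d{2})([A-Za-z]{3})(\d{4})([A-P]\d{2})$")


def parse_bac_name(name: str) -> BacName:
    """Split a BAC name into its assumed fields."""
    m = _BAC_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised BAC name: {name!r}")
    chrom, lib, plate, well = m.groups()
    return BacName(int(chrom), lib, int(plate), well)


# BAC sizes in kb, copied from the tables on pages 15 to 17.
SIZES_KB = {
    "page 15": [76, 96, 130, 155, 117, 108, 131, 105, 100, 150],
    "page 16": [19, 31, 50, 106, 135, 90, 130, 96, 142, 125, 72, 98, 89, 100],
    "page 17": [92, 108, 168, 140, 123],
}

subtotals_kb = {page: sum(sizes) for page, sizes in SIZES_KB.items()}
grand_total_mb = sum(subtotals_kb.values()) / 1000

print(subtotals_kb)
print(grand_total_mb)
print(parse_bac_name("C05HBa0191B01"))
```

The subtotals agree with the figures on the slides (1.168 Mb, 1.283 Mb and 631 kb), and their sum is the 3.082 Mb reported as the overall total.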

Page 18

Examples of Problematic Regions

Page 19

Highly misassembled clone C05SLm0050C14

Page 20

Aligned region showing single base mismatch in C05SLm0050C14

[Figure: aligned reads and consensus]

Page 21

Approach to solve the misassembly in C05SLm0050C14

Manually re-arranging reads on the basis of:

• Read-pair information of sub-clones.

• PCR of different regions within the BAC to reconfirm the assembly.

• Digestion pattern of the BAC obtained from six different restriction enzymes.

• Sequence obtained after assembling individual sub-clones following transposition.

[Figure: current status of C05SLm0050C14, with one region yet to be resolved]
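The read-pair criterion can be sketched as a consistency check: the two end reads of one sub-clone should sit in the same contig, point towards each other, and lie about one insert length apart. The data layout, the names and the insert-size limits below are assumptions for illustration; the slides do not describe how the check was actually carried out.

```python
from typing import Dict, List, NamedTuple, Tuple


class Placement(NamedTuple):
    contig: str
    start: int   # leftmost consensus position of the read
    end: int     # rightmost consensus position of the read
    strand: str  # "+" or "-"


def inconsistent_pairs(pairs: Dict[str, Tuple[Placement, Placement]],
                       min_insert: int, max_insert: int) -> List[str]:
    """Return sub-clone names whose two end reads disagree with the assembly."""
    bad: List[str] = []
    for subclone, (a, b) in pairs.items():
        left, right = sorted((a, b), key=lambda p: p.start)
        span = right.end - left.start
        ok = (
            left.contig == right.contig        # both reads in the same contig
            and left.strand == "+"             # left read points rightwards
            and right.strand == "-"            # right read points leftwards
            and min_insert <= span <= max_insert
        )
        if not ok:
            bad.append(subclone)
    return bad
```

Sub-clones flagged this way mark places where reads may have been joined wrongly; pairs that fall in different contigs are flagged too, although they can also be used to order those contigs.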

Page 22

Misassembly in C05HBa0089M06

Page 23

A typical GC rich region
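GC-rich stretches like the one pictured can be located in a consensus with a sliding window. This is a minimal sketch; the window size, step and 65% threshold are illustrative assumptions, not values from the slides.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc_rich_windows(seq: str, window: int = 50, step: int = 10,
                    threshold: float = 0.65) -> List[Tuple[int, float]]:
    """Return (start, gc_fraction) for each window at or above the threshold."""
    hits: List[Tuple[int, float]] = []
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        frac = gc_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits
```

Windows reported by `gc_rich_windows` are candidates for the special chemistry or PCR approaches listed earlier in the deck.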

Page 24

ACKNOWLEDGEMENTS

All members of the Indian Tomato Genome Sequencing Group, and DBT for financial assistance