Post on 22-Dec-2015
INDIAN INITIATIVE FOR TOMATO GENOME SEQUENCING
Tomato Finishing Workshop
T. R. Sharma
National Research Centre on Plant Biotechnology
Indian Agricultural Research Institute
New Delhi
[email protected]
Tomato Genome Sequencing Project
Participating countries (figure labels): Italy, USA, USA, USA, Spain, Japan, France, Netherlands, India, China, UK, Korea
Capillary Sequencers
ABI-3700
MegaBACE-1000/4000
Sequence Type
Collection of DNA Seq. data
Data Flow at NRCPB
Software Developed for Performing HTGS Analysis
(i) rename - renames any number of files from ABI or MegaBACE generated format to the St. Louis naming convention
(ii) fsplit - splits a file containing multiple sequences in fasta format
(iii) fmerge - converts multiple fasta files into a single fasta file
(iv) coverage - calculates the depth of coverage of an assembly by the most stringent method
(v) extract_reads - extracts all the reads from a particular contig or contigs in an assembly
(vi) comhits - compares two BLAST outputs stored as text for common hits
(vii) confasta - converts a file of nucleotide sequences containing numbers and/or blank spaces into a fasta file suitable for a BLAST search
(viii) format2xls - converts sequence fasta files to a tab-delimited format
(ix) format2fasta - converts a file stored in database format into fasta format for further analysis
(x) prefinish96 - an Excel macro program which arranges templates in alphabetical order along with their custom primers in a 96-well format
(xi) prefinish384 - a similar Excel macro program for template arrangement in a 384-well format
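To illustrate the kind of conversion a tool such as confasta performs, here is a minimal Python sketch. It is not the NRCPB program: the function name `to_fasta`, the record name and the 60-column line width are assumptions made for this example only.

```python
import re

def to_fasta(raw_text, name, width=60):
    """Strip digits and blanks from raw sequence text and return one FASTA record.

    `name` and `width` are illustrative parameters, not options of the original tool.
    """
    # Drop everything that is not a letter (position numbers, spaces, newlines).
    seq = re.sub(r"[^A-Za-z]", "", raw_text).upper()
    # Wrap the cleaned sequence at a fixed line width.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

# Made-up input in a numbered, space-separated layout.
raw = """
        1 gaattcatgc ttagctagct
       21 aggctta
"""
record = to_fasta(raw, "example_read")
print(record)
```

The cleaned record can then be passed to a BLAST search like any other fasta file.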
Sequence gap closure strategies for use
Gap
Single clone area
Single strand area
Multiple clone coverage on both strands
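The region types named on this slide can be derived from read placements. The sketch below is one possible way to do it in Python; the tuple layout `(clone, strand, start, end)` with 0-based, end-exclusive coordinates and the function name `classify_positions` are assumptions for illustration, not the format of any assembly program.

```python
def classify_positions(reads, length):
    """Label each consensus position by the kind of read coverage it has.

    reads: list of (clone, strand, start, end); strand is '+' or '-'.
    Coordinates are 0-based and end-exclusive (an assumed layout).
    """
    clones = [set() for _ in range(length)]
    strands = [set() for _ in range(length)]
    for clone, strand, start, end in reads:
        for pos in range(max(start, 0), min(end, length)):
            clones[pos].add(clone)
            strands[pos].add(strand)

    labels = []
    for pos in range(length):
        if not clones[pos]:
            labels.append("gap")
        elif len(clones[pos]) == 1:
            labels.append("single clone")
        elif len(strands[pos]) == 1:
            labels.append("single strand")
        else:
            labels.append("multiple clones, both strands")
    return labels

# Made-up read placements on a 10-base consensus.
reads = [
    ("cloneA", "+", 0, 4),
    ("cloneB", "+", 2, 6),
    ("cloneC", "-", 4, 8),
]
labels = classify_positions(reads, 10)
for pos, label in enumerate(labels):
    print(pos, label)
```

Positions labeled "gap", "single clone" or "single strand" are the ones that would need additional reads.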
Genome Sequence Types Submitted to GenBank
[Figure: diagrams of Phase I, Phase II and Phase III sequence records, with segments labeled 1 to 4 and A to H and gaps marked]
Custom primers
DNA Sequence Finishing
Finishing DNA Sequences
Finishing is the process of polishing raw sequences, transforming the fragmented rough draft into a long, continuous final product without breaks or errors.
Goals:
• Resolve sequence ambiguities and discrepancies, such that the error rate is less than one in 10,000 bases.
• Provide “double-stranded” coverage for every base:
– minimum of two different clones
– two different directions
– two different chemistries
• Achieve contiguity.
• Delineate vector/insert junctions.
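The error-rate goal can be related to Phred-style quality values through the standard relation Q = -10 log10(P), under which one error in 10,000 bases corresponds to Q = 40. The Python sketch below shows the conversion and flags consensus positions below that threshold. The quality values are made up, and applying the threshold per base is only one possible reading of the goal.

```python
import math

def phred_to_error(q):
    """Error probability for a Phred quality value: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality value for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

# Quality value matching the finishing goal of one error in 10,000 bases.
target_q = error_to_phred(1 / 10000)

# Hypothetical per-base consensus qualities (not real project data).
consensus_quals = [50, 45, 40, 30, 20]

# Positions whose quality falls below the target and may need more data.
low_positions = [i for i, q in enumerate(consensus_quals) if q < target_q - 1e-9]

# Expected number of errors over the stretch, summed from per-base probabilities.
expected_errors = sum(phred_to_error(q) for q in consensus_quals)

print(round(target_q), low_positions, expected_errors)
```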
Finishing DNA Sequences - How
Scan assembly to pick linker clones for Tn Seq
• custom oligo dye terminator
• reverse dye terminator
• special chem (dGTP) reactions
• custom oligo for BAC DNA sequencing
• PCR amplification of problem areas
Software used: Consed, a graphical tool for viewing and editing sequence assembly data (chromat_dir, phd_dir, edit_dir)
Methods to resolve Seq. gaps
1. Transposon method
• Identify linker clones
• Perform transposon insertions
• Transform DH10B cells
• Pick at least 24 white colonies
• Prepare template
• Seq. all the templates
• Add new Seq. data
Linker clones
(New England BioLabs)
Methods to resolve Seq. problems
2. Custom primer method
• Identify problem areas
• Design primers
• Seq. at least 3 shotgun clones spanning the region, with the same or a different chemistry
• Add new Seq. data and edit
Figure labels: Poor quality region; Custom primer
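As a rough illustration of the primer design step, the Python sketch below takes a candidate primer from the consensus upstream of a problem region and estimates its melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a coarse approximation for short oligos. The function names, the 20-base primer length and the 50-base offset are assumptions for the example, not values stated in the slides.

```python
def wallace_tm(primer):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def pick_primer(consensus, problem_start, length=20, offset=50):
    """Return the `length` bases ending `offset` bases upstream of the problem region.

    Returns None when there is not enough upstream sequence.
    """
    end = problem_start - offset
    start = end - length
    if start < 0:
        return None
    return consensus[start:end]

# Made-up 120-base consensus with a problem region starting at position 100.
consensus = "ACGT" * 30
primer = pick_primer(consensus, problem_start=100)
print(primer, wallace_tm(primer), gc_fraction(primer))
```

In practice a candidate would also be screened for repeats and secondary structure, which this sketch does not attempt.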
Methods to resolve Seq. problems
3. PCR method
Joining 2 contigs by PCR
• Contig 1, Contig 2
• Primers
• PCR amplification
• Cleaning of PCR products
• Seq. of PCR products
• New reads
[Gel image: lanes M and 1 to 8; 1 kb marker position indicated]
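For joining two contigs by PCR, the primers have to point into the gap from both sides. A minimal Python sketch follows, assuming both contigs are written on the same strand with Contig 1 to the left of Contig 2. The function names and the 10-base primer length are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def gap_spanning_primers(contig1, contig2, length=20):
    """Pick one primer on each side of the gap between two oriented contigs.

    Forward primer: the last `length` bases of contig1, reading into the gap.
    Reverse primer: reverse complement of the first `length` bases of contig2,
    so that it also reads back into the gap.
    """
    if len(contig1) < length or len(contig2) < length:
        raise ValueError("contigs are shorter than the requested primer length")
    forward = contig1[-length:]
    reverse = reverse_complement(contig2[:length])
    return forward, reverse

# Made-up contig ends, not project sequence.
contig1 = "TTGACCATGGCTAGCTAGGATCCAAGT"
contig2 = "GGATTCCGATCGATTAGCCATAGGCTA"
fwd, rev = gap_spanning_primers(contig1, contig2, length=10)
print(fwd, rev)
```

Real primers are usually set back from the very end of a contig, where read quality tends to be lower; that refinement is left out here.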
Sequencing Status, IITGS
Phase III = 24
Phase II = 25
Phase I = 10
Library = 9
Total BACs Seq. = 68
BAC clones in Phase III (IITGS)
S. No.  Map Position (cM)  Acc.#     BAC            Marker        Size (kb)
1       0                  AC187148  C05HBa0191B01  CT101         76
2       -                  AC188781  C05SLm0005B15  -             96
3       -                  AC204082  C05SLe0086I08  -             130
4       7                  AC187538  C05HBa0261K11  C2-At1g60200  155
5       10                 AC188778  C05HBa0042B19  cLET-8-B23    117
6       -                  AC188782  C05SLm0037H06  -             108
7       -                  AC212301  C05HBa0060G21  CT242         131
8       16                 AC194694  C05HBa0058L13  T1592         105
9       -                  AC212306  C05SLe0066O01  TG441         100
10      -                  AC209589  C05HBa0145P19  TG432         150
Total Seq. = 1.168 Mb
BAC clones in Phase III (IITGS)
S. No.  Map Position (cM)  Acc.#     BAC            Marker        Size (kb)
11      -                  AC212299  C05HBa0003C20  BS4           19
12      -                  AC209178  C05HBa0168M18  CT167         31
13      -                  AC225119  C05HBa0207N03  -             50
14      -                  AC225041  C05SLm0118J18  -             106
15      -                  AC212305  C05SLe0028N03  -             135
16      -                  AC212309  C05SLe0122H05  -             90
17      37                 AC212304  C05HBa0309L13  C2_At2g01110  130
18      -                  AC225118  C05HBa0161A14  -             96
19      -                  AC212312  C05SLm0115G01  C2_At1g24830  142
20      -                  AC225040  C05SLm0079C22  T1640         125
21      -                  AC225117  C05HBa0042L17  -             72
22      119                AC186292  C05HBa0251J13  TG185         98
23      115                AC196190  C05HBa0141A12  -             89
24      76                 AC212274  C05HBa0135A02  C2_At1g10500  100
Total Seq. = 1.283 Mb
BAC clones on other Chromosomes / Redundant BAC Clones
S. No.  Map Position (cM)  Acc.#     BAC            Marker        Size (kb)
1       -                  AC187540  C07SLm0077G20  -             92
2       -                  AC187539  C07HBa0179K09  T0876         108
3       22                 AC212314  C11HBa0027B05  BS4           168
4       -                  AC212315  C11SLe0053P22  -             140
5       111                AC182647  C05HBa0006N20  TG69          123
Total Seq. = 631 kb
Total Seq. (all BACs listed) = 3.082 Mb
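The totals quoted on these slides can be checked by summing the Size (kb) column of each table. The short Python sketch below does this with the values copied from the three tables; the variable names are illustrative.

```python
# Sizes in kb, copied from the Size (kb) column of each table.
phase3_part1 = [76, 96, 130, 155, 117, 108, 131, 105, 100, 150]
phase3_part2 = [19, 31, 50, 106, 135, 90, 130, 96, 142, 125, 72, 98, 89, 100]
other_or_redundant = [92, 108, 168, 140, 123]

def total_mb(sizes_kb):
    """Sum a list of sizes in kb and express the result in Mb."""
    return sum(sizes_kb) / 1000

tables = {
    "Phase III, BACs 1-10": phase3_part1,
    "Phase III, BACs 11-24": phase3_part2,
    "Other chromosomes / redundant": other_or_redundant,
}
for label, sizes in tables.items():
    print(label, len(sizes), "BACs,", total_mb(sizes), "Mb")

grand_total_kb = sum(sum(sizes) for sizes in tables.values())
print("All BACs listed:", grand_total_kb / 1000, "Mb")
```

The sums agree with the slide totals: 1.168 Mb, 1.283 Mb, 631 kb and 3.082 Mb overall, with 24 BACs across the two Phase III tables.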
Examples of Problematic Regions
Highly misassembled clone C05SLm0050C14
consensus
Aligned region showing single base mismatch in C05SLm0050C14
Approach to solve the misassembly in C05SLm0050C14
Region yet to be resolved
Manually re-arranging reads on the basis of:
• Read-pair information of sub-clones.
• PCR of different regions within the BAC to reconfirm assembly.
• Digestion pattern of BAC obtained from six different restriction enzymes.
• Sequence obtained after assembling individual sub-clones following transposition.
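Read-pair information can be screened automatically before manual re-arrangement. The Python sketch below flags sub-clones whose forward and reverse reads are placed inconsistently in the assembly. It is an illustration only: the read layout `(start, end, strand)`, the function names and the 1500 to 4000 base insert-size window are assumptions, since the slides do not give the sub-clone insert size.

```python
def check_pair(read1, read2, min_insert, max_insert):
    """Return a reason string if a read pair is inconsistent, else None.

    Each read is (start, end, strand) in consensus coordinates,
    with strand '+' or '-'.
    """
    if read1[2] == read2[2]:
        return "same strand"
    plus, minus = (read1, read2) if read1[2] == "+" else (read2, read1)
    if plus[0] > minus[0]:
        return "reads point away from each other"
    span = minus[1] - plus[0]
    if not (min_insert <= span <= max_insert):
        return "span %d outside expected insert size" % span
    return None

def flag_pairs(pairs, min_insert=1500, max_insert=4000):
    """Map sub-clone name to the reason its read pair looks inconsistent."""
    flagged = {}
    for name, (r1, r2) in pairs.items():
        reason = check_pair(r1, r2, min_insert, max_insert)
        if reason:
            flagged[name] = reason
    return flagged

# Made-up placements for four hypothetical sub-clones.
pairs = {
    "sub01": ((100, 800, "+"), (2200, 2900, "-")),    # consistent pair
    "sub02": ((500, 1200, "+"), (9000, 9700, "-")),   # reads too far apart
    "sub03": ((300, 1000, "+"), (1500, 2200, "+")),   # both on the same strand
    "sub04": ((5000, 5700, "+"), (3000, 3700, "-")),  # reads face outward
}
flagged = flag_pairs(pairs)
for name, reason in sorted(flagged.items()):
    print(name, reason)
```

Clusters of flagged pairs around one region are a hint that reads there have been joined incorrectly.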
Current status of C05SLm0050C14
Misassembly
C05HBa0089M06
A typical GC rich region
ACKNOWLEDGEMENTS
All Members of Indian Tomato Genome Sequencing Group
and DBT for Financial Assistance