A Fast Hybrid Short Read Fragment Assembly Algorithm
Introduction
• Second-generation DNA sequencing technologies
• Traditional: Sanger shotgun techniques
• New techniques (2007 & 2008):
• SSAKE, VCAKE and SHARCGS: based on greedy extension
• Edena, Velvet and EULER-SR: based on graphs
Taipan Method: Two steps
• 1. Greedy Extension
• contigs are iteratively extended one base at a time in both the 3’ and 5’ directions
• 2. Graph-based Method
• used to assemble the contigs constructed in the previous step
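The greedy extension step can be illustrated with a minimal sketch. It is not Taipan's implementation. It assumes error-free reads that all come from the same strand, and the names extend_right, extend_both and max_length are hypothetical. A contig grows by one base whenever every read overlapping its last min_overlap bases agrees on the next base, and it stops when no read supports an extension or the reads disagree.

```python
from collections import Counter


def extend_right(contig, reads, min_overlap, max_length=10_000):
    """Greedily extend `contig` one base at a time in the 3' direction."""
    while len(contig) < max_length:  # guard against endless loops in repeats
        suffix = contig[-min_overlap:]
        votes = Counter()
        for read in reads:
            # Collect the base that follows each occurrence of the suffix.
            pos = read.find(suffix)
            while pos != -1:
                nxt = pos + len(suffix)
                if nxt < len(read):
                    votes[read[nxt]] += 1
                pos = read.find(suffix, pos + 1)
        # Stop if no read extends the contig or the extension is ambiguous.
        if len(votes) != 1:
            break
        contig += next(iter(votes))
    return contig


def extend_both(seed, reads, min_overlap):
    """Extend a seed in the 3' direction, then in the 5' direction."""
    right = extend_right(seed, reads, min_overlap)
    # Extending to the left equals extending to the right on reversed strings.
    reversed_reads = [r[::-1] for r in reads]
    return extend_right(right[::-1], reversed_reads, min_overlap)[::-1]


# Toy data (made up for illustration): 7-base reads sampled every 2 bases.
genome = "ATGGCGTACCTGAAT"
reads = [genome[i:i + 7] for i in range(0, 9, 2)]
contig = extend_both(reads[2], reads, min_overlap=3)
print(contig)
```

In this toy case the seed read grows back into the full sequence. On real data, repeats and sequencing errors make the next base ambiguous, which is where the graph-based second step comes in.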
Example
• Usage: taipan -f {inputfilename} -k {minimal_overlap} [-t {threshold}] [-o {seed_occ}] [-v {verbose}] [-c {min_contig_length}]
• Result:
Optimal spliced alignments of short sequence reads
Fabio De Bona
Bioinformatics, 2008
Genome VS Transcriptome
• Analysis of sequence reads from genomic DNA
• Assemble the sequences
• Align them to the genome
• Transcriptome analysis
• First align the single reads to the genome
• Then merge the alignments to infer gene structures
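The "align, then merge" idea for transcriptome reads can be sketched as follows. This is a simplified illustration, not the procedure from the paper. It assumes each read has already been aligned to a single half-open genomic interval (start, end), ignores reads that span introns, and uses a hypothetical function name merge_alignments. Overlapping or adjacent intervals are merged into covered blocks, which can be read as putative exons.

```python
def merge_alignments(intervals):
    """Merge overlapping or adjacent half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # The read overlaps or touches the current block: extend the block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # The read starts a new block (a candidate exon).
            merged.append([start, end])
    return [tuple(block) for block in merged]


# Hypothetical alignment coordinates of four 36-base reads.
blocks = merge_alignments([(100, 136), (120, 156), (400, 436), (150, 186)])
print(blocks)
```

The gap between two consecutive blocks is then a candidate intron, which a spliced alignment of individual reads can confirm or reject.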
Genome VS Transcriptome
• Reconstruct the whole genome from genomic DNA data
• Reconstruct the transcriptome from EST data (cDNA reverse-transcribed from mRNA)
Problem Formulation
Limitations:
1. The read length of next-generation (NG) sequencing is relatively short.
2. Reads contain sequencing errors (a read error rate of 5% is assumed).
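To see why a 5% error rate matters even for short reads, a small calculation helps. The sketch below assumes errors occur independently at each base. The read length of 36 is an illustrative assumption and is not taken from the slides.

```python
def expected_errors(read_length, error_rate):
    """Expected number of erroneous bases in one read."""
    return read_length * error_rate


def prob_error_free(read_length, error_rate):
    """Probability that a read contains no error at all."""
    return (1.0 - error_rate) ** read_length


# Assumed read length of 36 bases, with the 5% error rate from the slide.
print(expected_errors(36, 0.05))
print(round(prob_error_free(36, 0.05), 3))
```

Under these assumptions a read carries almost two errors on average, and only about one read in six is error-free, so exact matching alone is not enough for alignment.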
General Description
Smith-Waterman
– Quality Score
– Splice Site Info
– Intron Length
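As a rough illustration of how quality scores can enter a Smith-Waterman alignment, the sketch below computes the best local alignment score with a linear gap penalty. It is not the scoring scheme from the paper. It assumes Phred-scaled qualities, and the weighting (scaling each substitution score by the probability that the base call is correct) and the values of match, mismatch and gap are illustrative choices. Splice sites and intron lengths are not modelled here.

```python
def smith_waterman(read, ref, quals=None, match=2.0, mismatch=-3.0, gap=-4.0):
    """Best local alignment score of `read` against `ref`."""
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        # Weight in [0, 1): confidence that read base i-1 was called correctly,
        # derived from a Phred quality Q as 1 - 10^(-Q/10). No qualities: 1.
        w = 1.0 if quals is None else 1.0 - 10 ** (-quals[i - 1] / 10.0)
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(
                0.0,                      # start a new local alignment
                H[i - 1][j - 1] + w * s,  # (mis)match, weighted by quality
                H[i - 1][j] + gap,        # gap in the reference
                H[i][j - 1] + gap,        # gap in the read
            )
            best = max(best, H[i][j])
    return best


ref = "TTACGTTT"
print(smith_waterman("ACGT", ref))
# The third base of the read mismatches the reference; compare a confident
# base call (Q=40) with an unreliable one (Q=2) at that position.
high = smith_waterman("ACCT", ref, quals=[40, 40, 40, 40])
low = smith_waterman("ACCT", ref, quals=[40, 40, 2, 40])
print(low > high)
```

With this weighting, a mismatch at a low-quality base costs less than one at a high-quality base, so a likely sequencing error does not cut the alignment short.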
Method
1. Original
2. With Quality Score
3. With Splice Site Info
4. With Intron Length
Test Data
• 10,000 sequences with known alignments
• three different scorings:
1. quality information
2. splice site predictions
3. intron length