Splicing Exons: A Eukaryotic Challenge to Gene Prediction
Ian McCoy
Gene Prediction
Genes must be identified to make the genome useful
Computational Problem: Take a seemingly random sequence of characters, millions or billions of bases long, and find the genes.
A Serious Complication
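Only about 3% of the human genome codes for proteins, and in eukaryotes the coding sequence of a gene is split into exons separated by noncoding introns.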
Similarity-Based Approach
Instead of searching for a gene from scratch, use a known protein from a related organism as the target (template).
Find all local similarities between a genomic sequence and the target protein sequence.
Every genomic substring whose similarity to the target exceeds a chosen threshold is called a putative exon.
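A minimal sketch of how putative exons might be enumerated by brute force. It is a simplification, not the method from the talk: it assumes the genomic sequence and the target are strings over the same alphabet (for example, a target already converted to nucleotides), scores only ungapped matches, and uses made-up scoring values. The name `putative_exons` and its parameters are hypothetical.

```python
def putative_exons(genome, target, min_score, match=1, mismatch=-1):
    """Return sorted (l, r, w) triples for genomic intervals whose ungapped
    similarity to some segment of the target reaches min_score.

    Assumption: both strings use the same alphabet. Positions are 0-based
    and inclusive. Running time is cubic in the worst case, so this is
    only meant for toy inputs.
    """
    best = {}  # (l, r) -> best score seen for that genomic interval
    n, m = len(genome), len(target)
    for i in range(n):          # start position in the genome
        for j in range(m):      # start position in the target
            score = 0
            k = 0
            # Extend the ungapped segment pair one character at a time.
            while i + k < n and j + k < m:
                score += match if genome[i + k] == target[j + k] else mismatch
                if score >= min_score:
                    key = (i, i + k)
                    best[key] = max(best.get(key, score), score)
                k += 1
    return sorted((l, r, w) for (l, r), w in best.items())


if __name__ == "__main__":
    print(putative_exons("TTACGTTT", "ACG", min_score=3))
```

Each returned triple has the (l, r, w) shape used in the exon-chaining formulation; a real pipeline would use a proper local alignment tool rather than this exhaustive scan.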
Exon-Chaining Problem
1. Use brute force to generate a set of putative exons.
2. Represent each putative exon with three parameters (l, r, w): left position, right position, and weight (its similarity score).
3. Find a maximum-weight chain of nonoverlapping putative exons.
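The three steps above can be sketched with a small reference implementation that simply tries every subset of exons. It is exponential and only useful for checking faster methods on toy inputs. The `Exon` type, the function names, and the example intervals are illustrative assumptions, not taken from the talk; intervals are assumed to satisfy l <= r.

```python
from itertools import combinations
from typing import NamedTuple


class Exon(NamedTuple):
    l: int    # left (start) position
    r: int    # right (end) position
    w: float  # weight, e.g. a similarity score


def is_chain(exons):
    """True if the exons are pairwise nonoverlapping (each ends before the next starts)."""
    ordered = sorted(exons)
    return all(a.r < b.l for a, b in zip(ordered, ordered[1:]))


def best_chain_bruteforce(exons):
    """Return (total weight, chain) for a maximum-weight chain, by exhaustive search."""
    best_weight, best_chain = 0, ()
    for k in range(1, len(exons) + 1):
        for subset in combinations(exons, k):
            if is_chain(subset):
                total = sum(e.w for e in subset)
                if total > best_weight:
                    best_weight, best_chain = total, tuple(sorted(subset))
    return best_weight, best_chain


if __name__ == "__main__":
    toy = [Exon(1, 5, 5), Exon(2, 3, 3), Exon(4, 8, 6), Exon(9, 10, 1), Exon(6, 12, 10)]
    print(best_chain_bruteforce(toy))
```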
Formulate as Graph Problem
Create a graph G with 2n vertices: n vertices are the starting (left) positions of exons and n vertices are the ending (right) positions of exons.
The set of left and right interval ends is sorted into increasing order.
There is an edge from l_i to r_i of weight w_i for each i from 1 to n, plus 2n-1 additional edges of weight 0 connecting adjacent vertices.
Input: A set of weighted intervals (putative exons)
Output: The total weight of a maximum-weight chain of nonoverlapping intervals from this set (the length of the longest path in G)
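The graph construction on this slide can be sketched as follows. The sketch assumes that all 2n interval endpoints are distinct, so that each vertex is exactly one left or right end; `build_graph` and the edge-tuple layout are hypothetical names chosen for illustration.

```python
def build_graph(exons):
    """Build the exon-chaining graph from a list of (l, r, w) intervals.

    Returns (vertices, edges): vertices is the sorted list of the 2n endpoint
    positions, and edges is a list of (from_index, to_index, weight) tuples
    over 0-based vertex indices.
    Assumption: all 2n endpoints are distinct.
    """
    vertices = sorted(p for l, r, _ in exons for p in (l, r))
    if len(set(vertices)) != len(vertices):
        raise ValueError("endpoints must be distinct in this sketch")
    index = {pos: i for i, pos in enumerate(vertices)}

    # One weighted edge per interval, from its left end to its right end.
    edges = [(index[l], index[r], w) for l, r, w in exons]
    # 2n-1 zero-weight edges linking adjacent vertices in sorted order.
    edges += [(i, i + 1, 0) for i in range(len(vertices) - 1)]
    return vertices, edges


if __name__ == "__main__":
    print(build_graph([(1, 5, 5), (2, 3, 3)]))
```

With this layout, a maximum-weight chain corresponds to a longest path from the first vertex to the last.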
Dynamic Programming Algorithm
ExonChaining(G, n)    // graph, number of intervals
    for i ← 1 to 2n
        s_i ← 0
    for i ← 2 to 2n
        if vertex v_i in G corresponds to the right end of an interval I
            j ← index of the vertex for the left end of I
            w ← weight of I
            s_i ← max {s_j + w, s_{i-1}}
        else
            s_i ← s_{i-1}
    return s_{2n}
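A runnable sketch of the dynamic program above, written against a list of (l, r, w) intervals rather than an explicit graph. It assumes all 2n endpoints are distinct; the function name `exon_chaining` and the toy intervals are illustrative, not from the talk.

```python
def exon_chaining(exons):
    """Return the total weight of a maximum-weight chain of nonoverlapping intervals.

    exons: list of (l, r, w) tuples.
    Assumption: all 2n endpoints are distinct.
    """
    # One entry per endpoint: (position, is_right_end, interval index).
    ends = []
    for k, (l, r, _) in enumerate(exons):
        ends.append((l, 0, k))
        ends.append((r, 1, k))
    ends.sort()

    # s[i] is the best chain weight using only the first i endpoints; s[0] = 0.
    s = [0] * (len(ends) + 1)
    left_index = {}  # interval index -> position of its left end in the sorted order
    for i, (_, is_right, k) in enumerate(ends, start=1):
        if is_right:
            j = left_index[k]
            # Either take interval k on top of the best chain ending before
            # its left end, or skip it and keep the previous best.
            s[i] = max(s[j] + exons[k][2], s[i - 1])
        else:
            left_index[k] = i
            s[i] = s[i - 1]
    return s[-1]


if __name__ == "__main__":
    toy = [(1, 5, 5), (2, 3, 3), (4, 8, 6), (9, 10, 1), (6, 12, 10)]
    print(exon_chaining(toy))  # 15: intervals (1, 5) and (6, 12)
```

After sorting the endpoints, the loop does constant work per vertex, so the running time is dominated by the O(n log n) sort.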
Shortcomings
A large number of short exons will decrease the efficacy of our method for finding putative exons.
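Exons in the chain may be out of order relative to the target: an earlier interval can match the end of the protein while a later one matches its beginning, so the best chain need not correspond to a valid alignment.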
Any Questions?
Jones, Neil C., and Pavel A. Pevzner. An Introduction to Bioinformatics Algorithms. Cambridge, MA: MIT Press, 2004, pp. 200-203.