Splicing Exons: A Eukaryotic Challenge to Gene Prediction Ian McCoy

Post on 18-Jan-2018

Splicing Exons: A Eukaryotic Challenge to Gene Prediction

Ian McCoy

Gene Prediction

Genes must be identified to make the genome useful

Computational Problem: Take a seemingly random sequence of characters, millions or billions of bases long, and find the genes.

A Serious Complication

Only about 3% of the human genome codes for proteins, so the genes are sparse signals within mostly noncoding sequence.

Similarity-Based Approach

Instead of searching for a gene directly, use a known protein from a related organism as the target.

Find all local similarities between a genomic sequence and the target protein sequence.

Every genomic substring whose similarity to the target reaches a chosen threshold is called a putative exon.
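One way to score a local similarity is the Smith-Waterman recurrence. The sketch below is only an illustration: the function name, the scoring values (match 1, mismatch -1, gap -1), and the assumption that both strings use the same alphabet (for example, the genomic sequence has already been translated) are not taken from the slides.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b (Smith-Waterman).

    The scoring parameters are illustrative assumptions, not values
    from the slides.
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the 0.
            cur[j] = max(0, prev[j - 1] + step, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical threshold: a substring counts as a putative exon
# when its score against the target is at least this value.
THRESHOLD = 3
print(local_alignment_score("TTACGTT", "GGACGGG") >= THRESHOLD)
```

A substring of the genomic sequence would be kept as a putative exon when its score against the target reaches the chosen threshold; that score can then serve as the exon's weight.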

Exon-Chaining Problem

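1. Use brute force to generate a set of putative exons.

2. Represent each exon with three parameters (l, r, w): its left position l, its right position r, and a weight w reflecting its similarity to the target.

3. Find a maximum-weight chain of nonoverlapping putative exons.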
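The (l, r, w) representation and the notion of a chain can be sketched as follows. The exhaustive search here is not the method of the slides; it is a reference implementation that is only practical for a handful of exons. The names Exon, is_chain, and brute_force_best_chain, and the example intervals, are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Exon(NamedTuple):
    l: int      # left (start) position in the genomic sequence
    r: int      # right (end) position
    w: float    # weight, e.g. a similarity score against the target


def is_chain(exons):
    """True if the exons, ordered by left end, never overlap."""
    ordered = sorted(exons)
    return all(a.r < b.l for a, b in zip(ordered, ordered[1:]))


def brute_force_best_chain(exons):
    """Try every subset; exponential time, so only for tiny inputs."""
    best_weight, best_chain = 0, ()
    for k in range(1, len(exons) + 1):
        for subset in combinations(exons, k):
            weight = sum(e.w for e in subset)
            if is_chain(subset) and weight > best_weight:
                best_weight, best_chain = weight, tuple(sorted(subset))
    return best_weight, best_chain


# Hypothetical putative exons, made up for illustration.
exons = [Exon(1, 5, 5), Exon(2, 3, 3), Exon(4, 8, 6), Exon(6, 12, 10)]
print(brute_force_best_chain(exons))
```

With n putative exons this examines 2^n subsets, which is why a polynomial-time formulation is needed for realistic inputs.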

Formulate as Graph Problem

Create a graph G with 2n vertices: n vertices are the starting (left) positions of the exons and n vertices are the ending (right) positions.

The set of left and right interval ends is sorted into increasing order.

There is an edge from l_i to r_i of weight w_i for each i from 1 to n, plus 2n-1 additional edges of weight 0 connecting adjacent vertices. A maximum-weight chain then corresponds to a longest path from the first vertex to the last.
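The construction can be sketched as below, assuming (as the 2n-vertex count implies) that all interval endpoints are distinct. The function name build_exon_graph, the 0-based vertex indices, and the edge-list representation are choices made for this illustration.

```python
def build_exon_graph(exons):
    """Build the chaining graph for a list of (l, r, w) intervals.

    Returns (positions, edges): positions[i] is the coordinate of
    vertex i, and each edge is (from_vertex, to_vertex, weight).
    Assumes all 2n endpoints are distinct.
    """
    positions = sorted(p for l, r, _ in exons for p in (l, r))
    if len(set(positions)) != 2 * len(exons):
        raise ValueError("interval endpoints must be distinct")
    index = {p: i for i, p in enumerate(positions)}

    # One weighted edge per interval, from its left end to its right end.
    edges = [(index[l], index[r], w) for l, r, w in exons]
    # 2n-1 zero-weight edges between adjacent vertices.
    edges += [(i, i + 1, 0) for i in range(len(positions) - 1)]
    return positions, edges


# Hypothetical intervals, made up for illustration.
positions, edges = build_exon_graph([(1, 4, 3), (2, 6, 5), (5, 8, 2)])
print(positions)
print(edges)
```

Every edge points from a smaller position to a larger one, so the graph is acyclic and a longest path can be found by processing the vertices in sorted order.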

Input: A set of weighted intervals (putative exons)

Output: The total weight of a maximum chain of nonoverlapping intervals from this set

Dynamic Programming Algorithm

ExonChaining(G, n)    // G: graph, n: number of intervals
    for i ← 1 to 2n
        s_i ← 0
    for i ← 2 to 2n
        if vertex v_i in G corresponds to the right end of an interval I
            j ← index of the vertex for the left end of the interval I
            w ← weight of the interval I
            s_i ← max {s_j + w, s_{i-1}}
        else
            s_i ← s_{i-1}
    return s_{2n}
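A minimal Python sketch of this dynamic program follows. It works directly on a list of (l, r, w) tuples rather than on an explicit graph, which is a simplification made here; the function name and the example intervals are hypothetical.

```python
def exon_chaining(exons):
    """Maximum total weight of a chain of nonoverlapping (l, r, w) intervals."""
    # One event per endpoint: (position, is_right_end, interval index).
    # Sorting puts the 2n endpoints in increasing order, like the vertices of G.
    events = []
    for k, (l, r, _) in enumerate(exons):
        events.append((l, 0, k))
        events.append((r, 1, k))
    events.sort()

    s = [0] * (len(events) + 1)   # s[i]: best weight using endpoints 1..i
    left_index = {}               # interval index -> index of its left end
    for i, (_, is_right, k) in enumerate(events, start=1):
        if is_right:
            j = left_index[k]
            w = exons[k][2]
            # Either end a chain with interval k, or skip it.
            s[i] = max(s[j] + w, s[i - 1])
        else:
            left_index[k] = i
            s[i] = s[i - 1]
    return s[-1]


# Hypothetical putative exons with distinct endpoints 1..18.
exons = [(1, 5, 5), (2, 3, 3), (4, 8, 6), (6, 12, 10), (7, 17, 12),
         (9, 10, 1), (11, 15, 7), (13, 14, 0), (16, 18, 4)]
print(exon_chaining(exons))  # 21
```

After the sort, the main loop visits each of the 2n endpoints once, so the running time is dominated by the O(n log n) sort. In this example the best chain is (2, 3), (4, 8), (9, 10), (11, 15), (16, 18), with weights 3 + 6 + 1 + 7 + 4 = 21.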

Shortcomings

A large number of short exons will decrease the efficacy of our method for finding putative exons.

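Exons may be out of order: the chain is not checked against the target, so consecutive exons need not match consecutive parts of the target protein.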

Any Questions?

Jones, Neil C., and Pavel A. Pevzner. An Introduction to Bioinformatics Algorithms. Cambridge: MIT Press, 2004, pp. 200-203.