![Page 1: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/1.jpg)
HMM Sampling and Applications to Gene Finding and Alignment
European Conference on Computational Biology 2003
Simon Cawley* and Lior Pachter+
and thanks to Eli Rusman
* Affymetrix
+ UC Berkeley Mathematics Dept
![Page 2: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/2.jpg)
Conservation of alternative splicing between human and mouse
• Modrek and Lee: 40-60% of human genes have alternative splice forms. Nature Genetics 2002.
• Nurtdinov et al.: 75% of human alternative splice forms are conserved in mouse. Human Molecular Genetics 2003.
Can we develop ab-initio methods for detecting conserved alternative splice sites?
![Page 3: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/3.jpg)
Sequence Alignment
[Figure: alignment grid with one DNA sequence along each axis]
![Page 4: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/4.jpg)
Finding the optimal alignment
[Figure: alignment grid with one DNA sequence along each axis, annotated "max"]
![Page 5: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/5.jpg)
Sampling to find alternative alignments
$a_{i,j} = w\,a_{i-1,j} + w\,a_{i,j-1} + s_{i,j}\,a_{i-1,j-1}$
• $a_{i,j}$: alignment forward variables for positions [1,i] and [1,j] in each sequence
• $s_{i,j}$: match/mismatch probabilities for positions i,j in each sequence
• $w$: gap probabilities
[Figure: alignment grid with one DNA sequence along each axis]
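
The forward recurrence on this slide can be turned into a sampler by filling the table once and then tracing back from the final cell, choosing each step with probability proportional to its term in the recurrence. The sketch below is a minimal illustration under assumptions not stated on the slide: the boundary condition a[0][0] = 1, a single match weight and a single mismatch weight standing in for s, and toy sequences. The function names are hypothetical and the weights are unnormalised.

```python
import random

def forward_table(x, y, w, match, mismatch):
    """Fill a[i][j] = w*a[i-1][j] + w*a[i][j-1] + s[i][j]*a[i-1][j-1].

    Assumes a[0][0] = 1 and treats out-of-range cells as 0.
    """
    T, U = len(x), len(y)
    a = [[0.0] * (U + 1) for _ in range(T + 1)]
    a[0][0] = 1.0
    for i in range(T + 1):
        for j in range(U + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0:
                total += w * a[i - 1][j]          # gap in y
            if j > 0:
                total += w * a[i][j - 1]          # gap in x
            if i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                total += s * a[i - 1][j - 1]      # aligned pair
            a[i][j] = total
    return a

def sample_alignment(x, y, a, w, match, mismatch, rng):
    """Trace back from (T, U), picking each move in proportion to its term."""
    i, j = len(x), len(y)
    top, bottom = [], []
    while i > 0 or j > 0:
        moves, weights = [], []
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            moves.append("diag")
            weights.append(s * a[i - 1][j - 1])
        if i > 0:
            moves.append("up")
            weights.append(w * a[i - 1][j])
        if j > 0:
            moves.append("left")
            weights.append(w * a[i][j - 1])
        move = rng.choices(moves, weights=weights)[0]
        if move == "diag":
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            top.append(x[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(y[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    x, y = "GATTACA", "GATCACA"            # toy sequences, not from the talk
    w, match, mismatch = 0.1, 0.6, 0.1     # illustrative weights only
    table = forward_table(x, y, w, match, mismatch)
    rng = random.Random(0)
    for _ in range(3):
        print(*sample_alignment(x, y, table, w, match, mismatch, rng), sep="\n")
        print()
```

Filling the table costs O(TU) time; each sample costs O(T+U), because the traceback only ever moves up, left, or diagonally. This version stores the whole table, so it uses O(TU) memory.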
![Page 6: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/6.jpg)
Linear Space Sampling
Sequences of length T, U
To obtain k samples:
• Time complexity: O(TU + k(T+U))
• Memory requirements: O(T+U)
Hirschberg’s divide and conquer algorithm:
• Time complexity: O(TU)
• Memory requirements: O(T+U)
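
To see where the linear memory bound can come from, note that each row of forward variables depends only on the row above it. The sketch below computes the final forward value while keeping two rows at a time. It covers the forward pass only and is not the linear-space sampling algorithm itself, whose details are not given on the slide. It assumes the recurrence a[i][j] = w*a[i-1][j] + w*a[i][j-1] + s[i][j]*a[i-1][j-1] with a[0][0] = 1, plus hypothetical match and mismatch weights standing in for s.

```python
def forward_total_linear_space(x, y, w, match, mismatch):
    """Return a[T][U] using O(U) memory by keeping only two rows."""
    # Row 0: only gap moves are possible, so a[0][j] = w**j.
    prev = [w ** j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        # Column 0: only gap moves are possible, so a[i][0] = w * a[i-1][0].
        curr = [w * prev[0]] + [0.0] * len(y)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr[j] = w * prev[j] + w * curr[j - 1] + s * prev[j - 1]
        prev = curr
    return prev[-1]

if __name__ == "__main__":
    # Toy sequences and weights, for illustration only.
    print(forward_total_linear_space("GATTACA", "GATCACA", 0.1, 0.6, 0.1))
```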
![Page 7: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/7.jpg)
Alternative Splicing in Mammalian Genomes
[Figure: pre-mRNA → SPLICING → TRANSLATION → Protein I; pre-mRNA → ALTERNATIVE SPLICING → TRANSLATION → Protein II]
![Page 8: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/8.jpg)
Cross-species simultaneous gene finding and alignment
M. Alexandersson, S. Cawley, L. Pachter. SLAM: Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Research 13 (2003), pp. 496-502.
![Page 9: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/9.jpg)
Modeling gene features
[Figure: gene structure from 5’ to 3’ (Exon 1, Intron 1, Exon 2, Intron 2, Exon 3) with three conserved non-coding sequence (CNS) regions, shown for human and mouse]
![Page 10: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/10.jpg)
The SLAM hidden Markov model
![Page 11: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/11.jpg)
SLAM components
• Splice site detector
  – VLMM
• Intron and intergenic regions
  – 2nd order Markov chain
  – independent geometric lengths
• Coding sequence
  – PHMM on protein level
  – generalized length distribution
• Conserved non-coding sequence
  – PHMM on DNA level
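
As a rough illustration of the intron and intergenic component, the sketch below scores a sequence under a second-order Markov chain combined with a geometric length distribution. It is not SLAM's implementation: the training routine, the pseudocount, the `p_stop` parameter, and the choice to condition on the first two bases rather than model them are all assumptions made for this example.

```python
import math

def train_second_order(seqs, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(base | previous two bases) from example sequences."""
    counts = {}
    for seq in seqs:
        for t in range(2, len(seq)):
            row = counts.setdefault(seq[t - 2:t], {})
            row[seq[t]] = row.get(seq[t], 0.0) + 1.0

    def prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(alphabet)
        return (row.get(base, 0.0) + pseudocount) / total

    return prob

def log_prob(seq, prob, p_stop):
    """Log-probability of a non-empty seq: geometric length plus chain terms.

    Length term: P(L) = (1 - p_stop)**(L - 1) * p_stop.
    The first two bases are conditioned on, not scored.
    """
    lp = (len(seq) - 1) * math.log(1.0 - p_stop) + math.log(p_stop)
    for t in range(2, len(seq)):
        lp += math.log(prob(seq[t - 2:t], seq[t]))
    return lp

if __name__ == "__main__":
    # Toy training data, for illustration only.
    model = train_second_order(["ACGTACGTACGT", "TTTTACGTTTTT"])
    print(log_prob("ACGTAC", model, p_stop=0.01))
```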
![Page 12: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/12.jpg)
SLAM input and output
• Input:
  – Pair of homologous sequences.
• Output:
  – CDS and CNS predictions in both sequences.
  – Protein predictions.
  – Protein and CNS alignment.
![Page 13: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/13.jpg)
http://bio.math.berkeley.edu/slam/
![Page 14: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/14.jpg)
Input:
![Page 15: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/15.jpg)
Output:
![Page 16: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/16.jpg)
![Page 17: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/17.jpg)
![Page 18: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/18.jpg)
Methodology for identifying alternative splice sites
• Compiled SLAM gene predictions for the human, mouse and rat genomes.
• Identified a set of 3400 human/mouse/rat gene triples with consistent predictions from the human/mouse (hs/mm) and human/rat (hs/rn) analyses.
• For each triple, sampled sub-optimal parses from the hs/mm and hs/rn runs.
• Collected alternative exons (non-Viterbi exons) that appeared in both the hs/mm and hs/rn runs.
• Examined overlap with RefSeq genes, mRNAs and ESTs.
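
The collection step can be sketched as a set operation. The example below is one possible reading of the steps above, not the authors' pipeline. It assumes each parse is a list of exons given as hypothetical `(start, end)` tuples in human coordinates, so that exons from the hs/mm and hs/rn runs can be compared directly.

```python
def alternative_exons(sampled_parses, viterbi_parse):
    """Exons seen in any sampled parse that are absent from the Viterbi parse."""
    viterbi = set(viterbi_parse)
    alt = set()
    for parse in sampled_parses:
        alt.update(exon for exon in parse if exon not in viterbi)
    return alt

def conserved_alternative_exons(hs_mm_samples, hs_mm_viterbi,
                                hs_rn_samples, hs_rn_viterbi):
    """Alternative exons that appear in both the hs/mm and hs/rn runs."""
    return (alternative_exons(hs_mm_samples, hs_mm_viterbi)
            & alternative_exons(hs_rn_samples, hs_rn_viterbi))

if __name__ == "__main__":
    # Toy coordinates, for illustration only.
    viterbi = [(100, 200), (400, 500)]
    hs_mm = [[(100, 200), (400, 500)], [(100, 200), (300, 350), (400, 500)]]
    hs_rn = [[(100, 200), (300, 350), (400, 500)], [(100, 210), (400, 500)]]
    print(conserved_alternative_exons(hs_mm, viterbi, hs_rn, viterbi))
```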
![Page 19: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/19.jpg)
SLAM whole genome predictions
• Built a whole genome homology map (Colin Dewey)
  http://baboon.math.berkeley.edu/~cdewey/homologyMaps/
• Pre-aligned the homologous blocks to reduce the SLAM search space (Nicolas Bray using AVID)
  http://baboon.math.berkeley.edu/mavid/
  http://hanuman.math.berkeley.edu/kbrowser/
• Ran SLAM on the resulting blocks
  http://bio.math.berkeley.edu/slam/mouse/
  http://bio.math.berkeley.edu/slam/rat/
![Page 20: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/20.jpg)
[human]
[mouse]
[rat]
![Page 21: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/21.jpg)
![Page 22: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/22.jpg)
![Page 23: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/23.jpg)
![Page 24: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/24.jpg)
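Comparing predicted alternative exons to ESTs and mRNAs

| | human/mouse/rat alternative exons: EST/mRNA | human/mouse/rat alternative exons: No EST/mRNA | human/mouse alternative exons: EST/mRNA | human/mouse alternative exons: No EST/mRNA |
|---|---|---|---|---|
| Gene count | 29 | 344 | 461 | 3296 |
| Alt. Exon count | 29 | 441 | 557 | 7240 |
| Shifties | 28 | 209 | 262 | 2227 |
| Newbies | 1 | 232 | 295 | 5013 |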
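
The counts in this table can be checked and summarised with a few lines of arithmetic. The sketch below copies the numbers from the slide, confirms that Shifties plus Newbies equals the alternative exon count in every column, and computes the share of predicted alternative exons with EST/mRNA support. The dictionary layout and function names are assumptions made for this example.

```python
# Counts copied from the slide; keys are (comparison, evidence column).
COUNTS = {
    ("human/mouse/rat", "EST/mRNA"):
        {"genes": 29, "alt_exons": 29, "shifties": 28, "newbies": 1},
    ("human/mouse/rat", "No EST/mRNA"):
        {"genes": 344, "alt_exons": 441, "shifties": 209, "newbies": 232},
    ("human/mouse", "EST/mRNA"):
        {"genes": 461, "alt_exons": 557, "shifties": 262, "newbies": 295},
    ("human/mouse", "No EST/mRNA"):
        {"genes": 3296, "alt_exons": 7240, "shifties": 2227, "newbies": 5013},
}

def rows_are_consistent(counts):
    """True if Shifties + Newbies equals the alternative exon count everywhere."""
    return all(c["shifties"] + c["newbies"] == c["alt_exons"]
               for c in counts.values())

def supported_fraction(counts, comparison):
    """Share of predicted alternative exons that overlap an EST or mRNA."""
    yes = counts[(comparison, "EST/mRNA")]["alt_exons"]
    no = counts[(comparison, "No EST/mRNA")]["alt_exons"]
    return yes / (yes + no)

if __name__ == "__main__":
    print("consistent:", rows_are_consistent(COUNTS))
    for comparison in ("human/mouse/rat", "human/mouse"):
        print(comparison, f"{supported_fraction(COUNTS, comparison):.3f}")
```

By these counts, roughly 6% of the human/mouse/rat alternative exons and roughly 7% of the human/mouse alternative exons overlap an EST or mRNA.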
![Page 25: and thanks to Eli Rusman](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813836550346895d9fe319/html5/thumbnails/25.jpg)
Conclusions
• Sampling is memory efficient, fast, and should be used routinely for alignment applications.
• Conserved alternative splice forms can be detected ab-initio.
• The extent of alternative splicing conservation is currently unclear. Sampling provides an alternative approach for investigating this problem, one that is not sensitive to biases in EST data.
• Problem: design effective and scalable validation strategies for alternative splice sites.