Robust Alignment of Drosophila Genomes. Lior Pachter, EECS Joint Colloquium, October 5th 2005
![Page 1: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/1.jpg)
Robust Alignment of Drosophila Genomes
Lior Pachter
EECS Joint Colloquium, October 5th 2005
![Page 2: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/2.jpg)
What is genomics?
. . . GGGCTGGGCGAGTATCTCTTCGAAAGGCTCACTCTCAAGCACGACTAAGAGCCTTCTGAGC . . .
![Page 3: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/3.jpg)
What is genomics?
. . . GGGCTGGGCGAGTATCTCTTCGAAAGGCTCACTCTCAAGCACGACTAAGAGCCTTCTGAGC . . .
. . . GLGEYLFERLTLKHD* . . .
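The protein string on this slide is what the standard genetic code gives for the first 48 bases of the DNA string. A minimal Python sketch, assuming the standard codon table and reading frame 0 (the names `CODON_TABLE` and `translate` are illustrative and not from the talk):

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate DNA in frame 0, stopping after the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)

dna = "GGGCTGGGCGAGTATCTCTTCGAAAGGCTCACTCTCAAGCACGACTAAGAGCCTTCTGAGC"
print(translate(dna))  # GLGEYLFERLTLKHD*
```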
![Page 4: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/4.jpg)
What is genomics?
TTCCTTAGACTCTTAGAAAGTACCTCAAAAACGAAATGCGAACAC . . . . . . . . .
![Page 5: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/5.jpg)
What is genomics?
TTCCTTAGACTCTTAGAAAGTACCTCAAAAACGAAATGCGAACAC . . . . . . .
![Page 6: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/6.jpg)
What is genomics?
TTCCTTAGACTCTTAGAAAGTACCTCAAAAACGAAATGCGAACAC . . .
ATGGAGT . . . microRNA
![Page 7: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/7.jpg)
What is comparative genomics?
TTCCCTAG--------CAAGTACCTCA------------------
TTCCCTAG--------CAAGTACCTCA------------------
TTCCCTAG--------CAAGTACCTCA------------------
TTCCTTAGACTCTTAGCAAGTACCTCA------------------
TTCCTTAGACTCTTAGAAAGTACCTCAAAAACGAAATGCGAACAC
GACTCT----TTTTAGCAAGTACCTCAAAATATTTAATTAAA-AC
ACTCTT----TTTTAGCAAGTACCTCAAGAATTACAATTAAATAT

TTCCTTAGACTCTTAGAAAGTACCTCAAAAACGAAATGCGAACAC
ATGGAGT . . . let-7

Grun et al., microRNA target predictions across seven Drosophila species and comparison to mammalian targets, PLoS Computational Biology, June 2005.
Lall et al., A genome wide map of conserved microRNA targets in C. elegans, submitted to Cell, 2005.
![Page 8: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/8.jpg)
The Drosophila Genome Project
• 1911 Genetic Mapping in Drosophila
  • Sturtevant and Morgan
• . . .
• 2000 Drosophila melanogaster genome sequenced
  • Celera and LBNL publish Drosophila genome in Science
• 2003 Proposal for Drosophila as a model system for comparative genomics
  • Clark, Gibson, Kaufman, McAllister, Myers, O’Grady
• 2005 Twelve Drosophila genomes sequenced
  • Consortium involving Agencourt, Broad Institute, Baylor College of Medicine, Washington University St. Louis and the Venter Institute.
![Page 9: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/9.jpg)
Sequencing & Assembly Status

| Status | Sequencing Center |
| --- | --- |
| ~3-fold WGS of w501 strain & 1-fold coverage of 6 other strains complete (2 assemblies currently available; deeper coverage of w501 strain expected Fall ‘05) | Washington Univ. (WUGSC) |
| ~3-fold WGS complete (assembly to be released by Sept 1) | Broad Institute |
| Release 4.2: 118.4 Mb with 23 gaps remaining (Release 5 in Fall 2005) | Celera/BDGP |
| ~6-fold WGS complete (assembly in GenBank) (additional coverage - automated sequence improvement expected Fall ‘05) | Washington Univ. (WUGSC) |
| ~12-fold WGS complete & assembled | Agencourt |
| ~8-fold WGS complete & assembled | Agencourt |
| ~9-fold WGS complete & assembled | Baylor College of Medicine (BCM) |
| ~4-fold WGS complete & assembled | Broad Institute |
| ~6-fold WGS (BAC paired ends currently being sequenced; assembly to be released by Sept 15) | Venter Institute (JCVI) |
| ~8-fold WGS complete & assembled | Agencourt |
| ~9-fold WGS complete & assembled | Agencourt |
| ~8-fold WGS complete (assembly to be released by Sept 15) | Agencourt |
![Page 10: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/10.jpg)
Drosophila Projects
• Transposable Element Annotation
  • A. Caspi and L. Pachter, Identification of transposable elements using multiple alignments of related genomes, Genome Research, in press.
• Multiple Sequence Alignment
  • C. Dewey and L. Pachter, Whole Genome Mapping, in preparation.
  • A.S. Schwartz, E.W. Myers and L. Pachter, Alignment metric accuracy, submitted.
  • N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004), p 693--699.
• Gene Finding
  • S. Chatterji and L. Pachter, Multiple organism gene finding by collapsed Gibbs sampling, Journal of Computational Biology, 12 (2005), p 599--608.
  • S. Chatterji and L. Pachter, GeneMapper: Evidence based multiple organism gene finding, in preparation.
![Page 11: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/11.jpg)
Drosophila Projects
• Transposable Element Annotation
  • A. Caspi and L. Pachter, Identification of transposable elements using multiple alignments of related genomes, Genome Research, in press.
• Multiple Sequence Alignment
  • C. Dewey and L. Pachter, Whole Genome Mapping, in preparation.
  • A.S. Schwartz, E.W. Myers and L. Pachter, Alignment metric accuracy, submitted.
  • N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004), p 693--699.
  • Parametric alignment
• Gene Finding
  • S. Chatterji and L. Pachter, Multiple organism gene finding by collapsed Gibbs sampling, Journal of Computational Biology, 12 (2005), p 599--608.
  • S. Chatterji and L. Pachter, GeneMapper: Evidence based multiple organism gene finding, in preparation.
![Page 12: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/12.jpg)
Available Drosophila whole genome multiple alignments
• MAVID
  • http://hanuman.math.berkeley.edu/kbrowser
• MULTIZ
  • http://genome.ucsc.edu/ (currently no D. erecta)
![Page 13: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/13.jpg)
Alignment of an exon

DroAna_20041206_ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_        GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                 ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

Alignment of an intron

DroAna_20041206_ CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_ CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_        CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                 ***    * *         *

DroAna_20041206_ AATC-----ACTTAC
DroMel_4_        ATTCTATGGACTCAC
DroMoj_20041206_ ----TATTTACTCAC
DroPse_1_        ------TGTACTTAC
DroSim_20040829_ ATTCTATGGACTCAC
DroVir_20041029_ ----TATTTACTCAC
DroYak_1_        ATTTCATAAACTCAC
                          *** **
![Page 14: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/14.jpg)
Alignment of an exon

DroAna_20041206_ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_        GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                 ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

Alignment of an intron

droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-----------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAA----AGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA------CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAA----AGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA--------TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAAT----AGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
dm2.chr2L           -----------------------------------------TATGGACTCAC
droMoj1.contig_2959 -------------------------AAATATTT--------TATTGACTCAC
dp3.chr4_group3     -----------------------------------------TGT--ACTTAC
droSim1.chr2L       -----------------------------------------TATGGACTCAC
droVir1.scaffold_6  ---------------------------------AAATATTTGGTCCACTCAC
droYak1.chr2L       -----------------------------------------CATAAACTCAC
                                                                  *** **
![Page 15: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/15.jpg)
How is an alignment made from the sequences?

>dm2.chr2L
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC
>dp3.chr4_group3
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC

dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC

Given two sequences of lengths n, m (n=50, m=62):
Note that the length of an alignment is at least max(n,m) and at most n+m.
![Page 16: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/16.jpg)
dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC

DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----

DroMel_4_ ATTCTATGGACTCAC
DroPse_1_ ------TGTACTTAC

Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S).
![Page 17: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/17.jpg)
dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC

#M=31, #X=22, #G=3, #S=12

DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----

DroMel_4_ ATTCTATGGACTCAC
DroPse_1_ ------TGTACTTAC

#M=27, #X=18, #G=3, #S=28

Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S).

Every match or mismatch column uses one character from each sequence and every space uses one character, so 2(#M+#X)+#S = n+m = 112. Hence #X, #G and #S suffice to specify a summary.

This notation follows Chapter 7 (Parametric Sequence Alignment) by Colin Dewey and Kevin Woods in the book Algebraic Statistics for Computational Biology.
![Page 18: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/18.jpg)
The summary of an alignment is a point in 3 dimensional space. For example, the two alignments just shown correspond to the points:
(22,3,12) (18,3,28)
In the example of our two sequences there are 379522884096444556699773447791552717765633 different alignments, but only 53890 different summaries. So we don’t need to plot that many points.
But 53890 is still quite a large number. Fortunately, there are only 69 vertices on the convex hull of the 53890 points.
That is something we can draw…
![Page 19: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/19.jpg)
For the sequences:

>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC

the alignment polytope is:

Vertex 49: #X=24, #S=10, #G=2

There are eight alignments that have this summary.
![Page 20: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/20.jpg)
mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC
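The summary of any one of these alignments can be recomputed directly from its two gapped rows. A minimal Python sketch, assuming `-` is the gap character and that a gap is a maximal run of spaces within one row (the function name `summarize` is illustrative); the rows used are the first mel/pse alignment listed above:

```python
def summarize(row_a, row_b, gap="-"):
    """Return (#M, #X, #G, #S) for two equal-length gapped alignment rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    matches = mismatches = gaps = spaces = 0
    prev_gap_row = None  # row that held a gap character in the previous column
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            raise ValueError("a column cannot consist of two gap characters")
        if a == gap or b == gap:
            gap_row = "a" if a == gap else "b"
            spaces += 1
            if gap_row != prev_gap_row:  # a new run of spaces opens a new gap
                gaps += 1
            prev_gap_row = gap_row
        else:
            if a == b:
                matches += 1
            else:
                mismatches += 1
            prev_gap_row = None
    return matches, mismatches, gaps, spaces

mel = "CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC"
pse = "CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC"
m, x, g, s = summarize(mel, pse)
print(m, x, g, s)                  # 27 24 2 10
assert 2 * (m + x) + s == 52 + 60  # sum of the two sequence lengths
```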
![Page 21: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/21.jpg)
Consensus at a vertex

mel CTGCGGGATTAGGGGTCATTAGAGT===------===GCCGAAAAGCGAGTTTATTCTA=TGGAC
pse CTGGAAGAGTTTTGATTAGTAG===GGGATCCATGGGGGCGAGGAGAGGCCATCATC==GTGTAC
![Page 22: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/22.jpg)
The vertices of the polytope have special significance.
Given parameters for a model, e.g. the default parameters for MULTIZ:
M = 100, X = -100, S = -30, G = -400
the summary is the result of maximizing the linear form
-200*(#X)-400*(#G)-80*(#S)
over the polytope.
Thus, the vertices of the polytope correspond to optimal alignments.
Vertex 49: #X=24, #S=10, #G=2
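The linear form follows from the score M·#M + X·#X + S·#S + G·#G after substituting #M = (n+m-#S)/2 - #X and dropping the constant term. A minimal Python sketch of the maximization, restricted to the vertices numbered 40 to 53 in the vertex table shown later in the talk (the full polytope has more vertices than this partial list; `VERTICES` and `linear_form` are illustrative names):

```python
# (#X, #S, #G) summaries copied from the talk's partial vertex table.
VERTICES = {
    40: (15, 16, 16), 41: (17, 30, 2), 42: (18, 14, 5), 43: (18, 16, 4),
    44: (20, 10, 6), 45: (20, 10, 7), 46: (23, 8, 6), 47: (23, 8, 8),
    48: (24, 8, 3), 49: (24, 10, 2), 50: (25, 8, 2), 51: (25, 62, 3),
    52: (28, 48, 2), 53: (29, 8, 1),
}

def linear_form(x, s, g, M=100, X=-100, S=-30, G=-400):
    """Alignment score minus the constant M*(n+m)/2, as a function of the summary.

    With the default parameters this is -200*x - 80*s - 400*g.
    """
    return (X - M) * x + (S - M / 2) * s + G * g

best = max(VERTICES, key=lambda v: linear_form(*VERTICES[v]))
print(best, VERTICES[best], linear_form(*VERTICES[best]))  # 49 (24, 10, 2) -6400.0
```

Among the listed vertices the maximum is attained at vertex 49, which agrees with the vertex shown on this slide.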
![Page 23: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/23.jpg)
Needleman-Wunsch Alignment

What is usually done is that a single set of parameters is specified (M = 100, X = -100, S = -30, G = -400 is a standard default) and the optimal vertex is then identified using dynamic programming. An alignment optimal for the vertex is then selected. The running time of the algorithm is O(nm) [Needleman-Wunsch 1970; Smith-Waterman 1981] and it requires O(n+m) space [Hirschberg 1975].

Standard scoring schemes are:

| Parameters | Model |
| --- | --- |
| M, X, S | Jukes-Cantor with linear gap penalty |
| M, X, S, G | Jukes-Cantor with affine gap penalty |
| M, XTS, XTV, S, G | Kimura 2-parameter with affine gap penalty |
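For the simplest scheme in the table (M, X, S with a linear gap penalty) the dynamic program fits in a few lines. A minimal Python sketch that returns only the optimal score, keeping one row of the table at a time; it omits the gap-open parameter G and the traceback, so it is not the affine model used by MULTIZ, and the function name is illustrative:

```python
def needleman_wunsch_score(a, b, M=100, X=-100, S=-30):
    """Optimal global alignment score with match M, mismatch X, space S.

    Runs in O(len(a) * len(b)) time and O(len(b)) memory.
    """
    # Row 0: the empty prefix of `a` aligned against prefixes of `b`.
    prev = [j * S for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * S]  # prefix of `a` aligned against the empty prefix of `b`
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (M if a[i - 1] == b[j - 1] else X)
            space_in_b = prev[j] + S      # a[i-1] opposite a space
            space_in_a = cur[j - 1] + S   # b[j-1] opposite a space
            cur.append(max(diagonal, space_in_b, space_in_a))
        prev = cur
    return prev[-1]

print(needleman_wunsch_score("ACGT", "ACGT"))  # 400
print(needleman_wunsch_score("AC", "AG"))      # 40
```

In the second call two spaces (2 × -30) beat one mismatch (-100), so the optimal alignment avoids pairing C with G.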
![Page 24: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/24.jpg)
Available Drosophila whole genome multiple alignments
• MAVID
  • http://hanuman.math.berkeley.edu/kbrowser
• MULTIZ
  • http://genome.ucsc.edu/ (currently no D. erecta)
![Page 25: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/25.jpg)
MAVID

DroAna_20041206_ CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_ CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_        CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                 ***    * *         *

DroAna_20041206_ AATC-----ACTTAC
DroMel_4_        ATTCTATGGACTCAC
DroMoj_20041206_ ----TATTTACTCAC
DroPse_1_        ------TGTACTTAC
DroSim_20040829_ ATTCTATGGACTCAC
DroVir_20041029_ ----TATTTACTCAC
DroYak_1_        ATTTCATAAACTCAC
                          *** **

N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004), p 693--699.
![Page 26: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/26.jpg)
Needleman-Wunsch
![Page 27: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/27.jpg)
MULTIZ

droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-----------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAA----AGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA------CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAA----AGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA--------TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAAT----AGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
dm2.chr2L           -----------------------------------------TATGGACTCAC
droMoj1.contig_2959 -------------------------AAATATTT--------TATTGACTCAC
dp3.chr4_group3     -----------------------------------------TGT--ACTTAC
droSim1.chr2L       -----------------------------------------TATGGACTCAC
droVir1.scaffold_6  ---------------------------------AAATATTTGGTCCACTCAC
droYak1.chr2L       -----------------------------------------CATAAACTCAC
                                                                  *** **

Blanchette et al., Aligning multiple sequences with the threaded blockset aligner, Genome Research 14 (2004), p 708--715.
![Page 28: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/28.jpg)
Needleman-Wunsch
![Page 29: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/29.jpg)
droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-----------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAA----AGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA------CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAA----AGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA--------TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAAT----AGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     -----ACTTAC
dm2.chr2L           TATGGACTCAC
droMoj1.contig_2959 TATTGACTCAC
dp3.chr4_group3     TGT--ACTTAC
droSim1.chr2L       TATGGACTCAC
droVir1.scaffold_6  GGTCCACTCAC
droYak1.chr2L       CATAAACTCAC
                         *** **
![Page 30: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/30.jpg)
[MAVID alignment of the intron, repeated from Page 25]
![Page 31: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/31.jpg)
[MULTIZ alignment of the intron, repeated from Page 27]
![Page 32: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/32.jpg)
[MAVID alignment of the intron, repeated from Page 25]
![Page 33: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/33.jpg)
One (possibly wrong) alignment is not enough: the history of parametric inference

• 1992: Waterman, M., Eggert, M. & Lander, E.
  • Parametric sequence comparisons, Proc. Natl. Acad. Sci. USA 89, 6090-6093.
• 1994: Gusfield, D., Balasubramanian, K. & Naor, D.
  • Parametric optimization of sequence alignment, Algorithmica 12, 312-326.
• 2003: Wang, L., Zhao, J.
  • Parametric alignment of ordered trees, Bioinformatics, 19 2237-2245.
• 2004: Fernández-Baca, D., Seppäläinen, T. & Slutzki, G.
  • Parametric Multiple Sequence Alignment and Phylogeny Construction, Journal of Discrete Algorithms, 2 271-287.

XPARAL by Kristian Stevens and Dan Gusfield
![Page 34: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/34.jpg)
Whole Genome Parametric Alignment
Colin Dewey, Peter Huggins, Lior Pachter, Bernd Sturmfels and Kevin Woods

Mathematics and Computer Science

• Parametric alignment in higher dimensions.
• Faster new algorithms.
• Deeper understanding of alignment polytopes.

Biology

• Whole genome parametric alignment.
• Biological implications of alignment parameters.
• Alignment with biology rather than for biology.
![Page 35: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/35.jpg)
Whole Genome Parametric Alignment
Colin Dewey, Peter Huggins, Lior Pachter, Bernd Sturmfels and Kevin Woods

Mathematics and Computer Science

• Parametric alignment in higher dimensions.
• Faster new algorithms.
• Deeper understanding of alignment polytopes.

Biology

• Whole genome parametric alignment.
• Biological implications of alignment parameters.
•

CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT

analysis
![Page 36: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/36.jpg)
![Page 37: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/37.jpg)
computational geometry
![Page 38: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/38.jpg)
| Vertex | (#X, #S, #G) | #alignments |
| --- | --- | --- |
| 40 | (15,16,16) | 1080 |
| 41 | (17,30,2) | 4 |
| 42 | (18,14,5) | 4 |
| 43 | (18,16,4) | 56 |
| 44 | (20,10,6) | 16 |
| 45 | (20,10,7) | 24 |
| 46 | (23,8,6) | 6 |
| 47 | (23,8,8) | 165 |
| 48 | (24,8,3) | 38 |
| 49 | (24,10,2) | 8 |
| 50 | (25,8,2) | 24 |
| 51 | (25,62,3) | 2 |
| 52 | (28,48,2) | 1 |
| 53 | (29,8,1) | 6 |

Finding the polytope is called parametric inference. This polytope took 3 seconds to compute using the beneath-beyond method [Grünbaum, Convex Polytopes, 1967].
![Page 39: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/39.jpg)
![Page 40: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/40.jpg)
![Page 41: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/41.jpg)
![Page 42: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/42.jpg)
![Page 43: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/43.jpg)
![Page 44: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/44.jpg)
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC

Associated to every pair of sequences is a polynomial built from the “summaries” of the alignments.

For example, vertex 49 (#X=24, #S=10, #G=2) corresponds to the monomial $8X^{24}S^{10}G^{2}$.

How do we build the polytope?
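For short sequences the polynomial can be computed exactly by dynamic programming over summaries. A minimal Python sketch, assuming a gap is a maximal run of spaces within one row, so that a run in one row followed directly by a run in the other counts as two gaps (that convention, the state names and `alignment_polynomial` are assumptions made for illustration, not details given in the talk):

```python
from collections import Counter

def alignment_polynomial(a, b):
    """Map each exponent vector (#X, #S, #G) to the number of alignments with that summary."""
    states = ("pair", "gap_in_a", "gap_in_b")  # type of the last alignment column
    n, m = len(a), len(b)
    table = [[{st: Counter() for st in states} for _ in range(m + 1)]
             for _ in range(n + 1)]
    table[0][0]["pair"][(0, 0, 0)] = 1  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            cell = table[i][j]
            if i > 0 and j > 0:  # a[i-1] paired with b[j-1]
                dx = int(a[i - 1] != b[j - 1])
                for prev in table[i - 1][j - 1].values():
                    for (x, s, g), count in prev.items():
                        cell["pair"][(x + dx, s, g)] += count
            if j > 0:  # b[j-1] opposite a space in row a
                for state, prev in table[i][j - 1].items():
                    opened = int(state != "gap_in_a")
                    for (x, s, g), count in prev.items():
                        cell["gap_in_a"][(x, s + 1, g + opened)] += count
            if i > 0:  # a[i-1] opposite a space in row b
                for state, prev in table[i - 1][j].items():
                    opened = int(state != "gap_in_b")
                    for (x, s, g), count in prev.items():
                        cell["gap_in_b"][(x, s + 1, g + opened)] += count
    total = Counter()
    for poly in table[n][m].values():
        total.update(poly)
    return total

print(dict(alignment_polynomial("A", "A")))  # {(0, 0, 0): 1, (0, 2, 2): 2}
```

For the pair "A", "A" this is the polynomial 1 + 2·S²G²: one alignment pairs the two letters, and two alignments put each letter opposite a space. The coefficients always sum to the total number of alignments.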
![Page 45: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/45.jpg)
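Polytope propagation

$NP_{i,j} = S \cdot NP_{i-1,j} + S \cdot NP_{i,j-1} + (X \text{ or } M) \cdot NP_{i-1,j-1}$

Here $NP_{i,j}$ is the Newton polytope for positions [1,i] and [1,j] in each sequence, the sum is the convex hull of the union, and the product is the Minkowski sum.

[Figure: dynamic programming grid for the example sequences AACATTAG and AAGATTACCACA]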
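The recursion can be run directly on polytopes instead of numbers. A minimal Python sketch for the two-dimensional case, tracking only (#X, #S) and ignoring #G so that each polytope is a polygon; multiplying by a monomial becomes a translation and the sum becomes a convex hull of a union (Andrew's monotone chain is used for the hull here as a stand-in, and all function names are illustrative):

```python
def cross(o, a, b):
    """Twice the signed area of the triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of the convex hull (monotone chain, collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half_hull(sequence):
        chain = []
        for p in sequence:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain
    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return lower[:-1] + upper[:-1]

def shift(polygon, vector):
    """Minkowski sum of a polygon with a single point (a translation)."""
    return [(p[0] + vector[0], p[1] + vector[1]) for p in polygon]

def alignment_polygon(a, b):
    """Hull of all (#X, #S) summaries of alignments of a and b."""
    n, m = len(a), len(b)
    P = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                P[i][j] = [(0, 0)]
                continue
            pts = []
            if i > 0:
                pts += shift(P[i - 1][j], (0, 1))   # space opposite a[i-1]
            if j > 0:
                pts += shift(P[i][j - 1], (0, 1))   # space opposite b[j-1]
            if i > 0 and j > 0:                     # match or mismatch
                pts += shift(P[i - 1][j - 1], (int(a[i - 1] != b[j - 1]), 0))
            P[i][j] = convex_hull(pts)
    return P[n][m]

print(sorted(alignment_polygon("AC", "AG")))  # [(0, 2), (0, 4), (1, 0), (1, 2)]
```

Only hull vertices are stored in each cell, which is what keeps the tables small compared with enumerating every summary.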
![Page 46: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/46.jpg)
Complexity of polytope propagation
Theorem: The number of vertices of an alignment polytope for two sequences of lengths n and m is $O((n+m)^{d(d-1)/(d+1)})$, where d is the number of free parameters in the scoring scheme.
Examples:

| Parameters | Model | Vertices |
| --- | --- | --- |
| M, X, S | Jukes-Cantor with linear gap penalty | $O((n+m)^{2/3})$ |
| M, X_TS, X_TV, S, G | K2P with affine gap penalty | $O((n+m)^{12/5})$ |
| M, X, S, G | Jukes-Cantor with affine gap penalty | $O((n+m)^{3/2})$ |
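The three exponents can be checked against the theorem. The check assumes that d is one less than the number of listed parameters. This is consistent with the identity 2(#M + #X) + #S = n + m, which lets the match count be eliminated, but the slide does not spell out the convention.

$$
\begin{aligned}
d = 2 \ (M, X, S):&\quad \frac{d(d-1)}{d+1} = \frac{2 \cdot 1}{3} = \frac{2}{3} \\
d = 3 \ (M, X, S, G):&\quad \frac{d(d-1)}{d+1} = \frac{3 \cdot 2}{4} = \frac{3}{2} \\
d = 4 \ (M, X_{TS}, X_{TV}, S, G):&\quad \frac{d(d-1)}{d+1} = \frac{4 \cdot 3}{5} = \frac{12}{5}
\end{aligned}
$$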
L. Pachter and B. Sturmfels, Parametric inference for biological sequence analysis, Proceedings of the National Academy of Sciences, Volume 101, Number 46 (2004), pp. 16138-16143.
L. Pachter and B. Sturmfels, Tropical geometry of statistical models, Proceedings of the National Academy of Sciences, Volume 101, Number 46 (2004), pp. 16132-16137.
![Page 47: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/47.jpg)
Inference functions
Definition: Given two integers n and m and a scoring scheme for sequence alignment, an inference function assigns to every pair of sequences of lengths n and m respectively, an (optimal) alignment.
Remark: The number of inference functions could, in principle, be doubly exponential in n+m. This is because the number of alignments is the Delannoy number D(n,m), which is exponential in n+m, and the number of sequence pairs is $4^{n+m}$.
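The counting in the remark can be sketched as follows. The function names are hypothetical, and the naive bound simply allows any alignment to be chosen independently for each of the $4^{n+m}$ sequence pairs.

```python
def delannoy(n, m):
    """Delannoy number D(n, m): lattice paths from (0,0) to (n,m)
    using steps (1,0), (0,1) and (1,1)."""
    D = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = D[i - 1][j] + D[i][j - 1] + D[i - 1][j - 1]
    return D[n][m]


def naive_inference_function_bound(n, m):
    """Number of maps from sequence pairs to alignments, ignoring any
    constraint from the scoring scheme: D(n, m) ** (4 ** (n + m))."""
    return delannoy(n, m) ** (4 ** (n + m))
```

For n = m = 1 there are D(1,1) = 3 alignments and 16 sequence pairs, so the naive count is already 3^16 = 43046721.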
![Page 48: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/48.jpg)
Few inference functions theorem
Theorem (S. Elizalde 2005): The number of inference functions for the two-parameter alignment model with two sequences of length n is $\Theta(n^2)$.
Proof (outline):
1. The number of inference functions is the number of vertices of the Minkowski sum of the Newton polytopes of the observations.
2. The Newton polytopes are all lattice polytopes, and therefore have few non-parallel edges.
3. The number of vertices of the Minkowski sum is at most
where m is the number of non-parallel edges and d is the dimension of the polytopes.
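The bound displayed in step 3 did not survive in this transcript. One standard bound with exactly these ingredients is due to Gritzmann and Sturmfels; that the slide showed this expression is an assumption.

$$
\#\text{vertices} \;\le\; 2 \sum_{j=0}^{d-1} \binom{m-1}{j}
$$

For fixed d this is polynomial in m, of order $m^{d-1}$. In dimension d = 2 it reduces to $2\left(1 + (m-1)\right) = 2m$.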
![Page 49: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/49.jpg)
Algebraic Statistics: a language for unifying and developing many of the algorithms for biological sequence analysis
• The few inference functions theorem
• Polytope propagation
• Phylogenetic tree reconstruction
• Evolutionary models
• Maximum likelihood estimation
• Mutagenic tree models
![Page 50: Robust Alignment of Drosophila Genomes Lior Pachter EECS Joint Colloquium, October 5th 2005](https://reader036.vdocuments.site/reader036/viewer/2022062301/56813d27550346895da6ecd9/html5/thumbnails/50.jpg)
ATCCAGAAGTCTAGTATACATCTCAAAATTCATGCATCTGGCCGGGCACAGTGGCTCACACCTGCAATCCCAGCACTTTGGGAGGCCGAGGTGGGTGGATTACCTGAGGTCAGGAGTTTAAGACCAGCCTGGCCAACATGGTAAAACCCCATCTCTACTAAAAATACAAGTATTAGCCAGGCATTGTGGCAGGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAAAATCACTTGAACCGGGAGGCGGAGGTTGGAGTGAGCTGAGATCGTGCTACCGCACTCCATGCACTCTAGCCTGGGCAACAGAACGAGATGCTGTCACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAAATTCTCACATCTAAAACAGAGTTCCTGGTTCCATTCCTGCTTCCTGCCTTTCCCACTCCCCCATATTCCCTACCATGCCTTCTTCATCTAATTTAATATTACTAACAAGATCTATTGTTCAAGCCAAAACCCAAGTGTCACTCCTTCAATTTCTCTTTACCTTATCCTCCAAATTTAATCCATTAGCAAGTCCTCTCTTCAAACCCATCCCAAACCAACCTTGTTTTTAACCATCTCCACACCACCAATTACCACAAGGATAAAATCTGAATTCCTTACCACCAAATACTATGTGATCTGGCCCTCATCTATGACCTTCTCCCATTCCTTGTGTAATCTCTGCCTCCACACATAATTTGCAAATTACTCCAGCTACACTGGCCTATTATTATTATTATTATTATTTTTGAGACGGAGTCTTGCTCTTTCGCCCAGCCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAATCTCCGCCTCCTGGGTTCAAGCGATTCTCCTGCCCCAGCCTCCCAAGTAGCTGTGATTACAGGCACATGCCACCATTCCCAGCTAATTTTTTTTTGTTTTTGAGATGGAGTTTCACTCTTGTTGCCCAGGCTGGAGTGCAATGGTGCGATCTCAGCTCACCACAACCTCCACCTCCCGGGTTGATGAAGTGATTCTCTTGTCTCAGCCTCCCGTGTAGCTGGGATTAGAGGCACGCGCCACCACGCTGGGCAAATTTTTGTATTTTTAGTAGAGACAGGGTTTCTACCTCAGTGATCTGTCCGCCTTGACCTCCCAAAGTGCTGGGATTACAGGAATGAGCCACCACACCCAGCCGTGCCCAGCTAATTTTTGCATTTTTTAGTAGAGATGGGGTTTTGCCACGTTGGCCAGGCTGGTCTCAAACTCCTGACCTCAGGGGATCTGCCTGCCTCGGCCTCCTAGAGTGCTGGAATTACAGGTGTGAGCCACTGTGCCCGAACCTTTTATCATTATTATTTCTTGAGACAGGAGTCTTGCTCTGTCGTTCAGGCTGGAGTGCAGTGATGCGATCTTGGCTCACTGTAACTCCTACCTTTCGGTTCAAGTGATTCTCCTGCCTCAGCCTCTGGAGTAGCTGGGATTACAGGCACTGGGATTACAGGCACACACCACCACACCATGCTAGTTTTTTGTATTTTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCAAGTGATTTGCCTGCCTTGGCTTCCCAAAGTGCTGGGATTATAGGCACGAGCCACCACACACGACCAACATTGGCCTATCTTTTAAAAAATAAACCAAGCTCTGGCCGGGCACAGTGGCTCACACCTGTGATCCCAGCACTTTGGGAGGTTGAGGTGGTTGGATCACTTGAGTTCAGGAGTTTGAGACCAGCCTGACCAACGTGGTAAAACCCCATCTCTACTAAAAATAAAAACTAGTCGGGTGTGGTAGCACGCGTGCCTGTAATACCAGCTACTCAGGAGGCCAAGGCAGGAGAATTGCTTGAACCCAGGAGACAGAGTTTGCAGTGAGCCAAGATTGTGCCACTGCACTCCAGCCTGGGGGATAGAGGGAGACACCATCTCAAAAAAACCAAAATACA
GAAATCAAAAAACCACACTCATTATTACCTCAAGACCTTTATGTTTGCTATTCCTCTGCCTATAAGATGCATTCCCTTCATTTTTCAAGGACAATTATTTCTTGTTATTTAGGTCTCAGCTCAATTTTTTCAGAAAGGCTTTCCCTGGCCTCCTTAAACGAAAGTAATCAACAACCTTTGACAGCTAATACTATTCCACTGTTCTGTATATTTCTCCATAGCATTTATTGTTATCTTAAATTCATCTTTATTGTGTATCTCCCCTCGACAGAACCTGAATCCTACCAGGGACTTAGTTAGTCTTATTTACTGTTGCATTCCTAGTGCCCAGAACACAGTAGGCTCCCAATAAATAGCCACTGAATAAAAGTTAAAACCAACAAAAATAATCATTTAATTAATTATGAATACATCGAATTGTGCACAATAGTTTATAAAATTACTTTTTTTTTTTTTTTAAGACAGGGTCTCATTCTGTCTCACAGGCTGGAGTGCAGTGGTGCAATCTAGGCTCACTGCAACCTCCGCCTCCCGGGTTCAAGTGATTCTCCTGCCTCAGCCTCCCCAGCAGCTAGGATTACAGGCACATGCCACCACGCTCGACTAATTTTTTTGTGTTTTTAGTAGAGACAAGGTTTCACCATGTTGACCAGGCTGGTCTCGAACTCCTGACCTCAAGTGATCCACCTGCCTTGGCCACTCAAAGTGCTGGGATTATAGGCATGAGCCACCACGCCTGGCCTATAAAATTACTTTCACATTTCATTTTGCCTGATCTGTTGTCACAGAAGTTCTCAGATGGCTGTTCTGAAATTATTCCTCCTCCTACACTCTATCTTATTTACTTCTCACTGTTCTCAGTATCATAAAGTGCAACATCTTTTTGAAGCAATCTGAATTATAAACAGATACATTTGCATGTATATATATGTATATATGCATATGCACACACACACTTTTTTTTTTTTAAGAGACAGGGTCTTGCTCTGTGCAAGTGCAAGAGTGCAATGGTATGATCATAGCTCACTGCAGCCTTGAACTCCTGGGCTCAAGTGATTCTTCTGGCTTAGCTTCCTCAGTAGCTAAGACTACAGAAGCACACTGCCATGCCCGGCTAATTAAAAAAAAATTTTGTGGAGACAGAGTCTCACTATGTTGCCCAGGCTGGTTTCAAACTCCTGGCCTCAAGTAATCTTCCTGTCTCAGCCTCCCAAAGGGCTGAGATTATAAGTGTGAGCCACTGCATCTGGACTGCATATTAATATGAAGAGCTTTTCTTCAACAACAGTGAACAGTTTTCTACAAAGGTATATGCAAGTGGGCCCACTTCTTGTTCTTATGAATCTTTTCTTTCCTTTTATAAAACTCCTTTTCCTTTCTCTTTTCCCCAAAGAAAGGACTGTTTCTTTTGAAATCTAGAACAAATGAGAACAGAGGATATCCTGGTTTGCGCTGCAAAATTTTTTTTTTTTTTAAGACGGAGTCTCGCTCTGTTGCCAGGTTGGAGTGCAGTGGCACGATCTTGGCTCATTGCAACCTCCACCTCCCGGGTTCAAGAGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGAACTAAAGGCGCATGCCACCACGCTGAGTAATTTTTTGTATTTTAGTAGAGACAGGGTTTCACCATGTTGCCCAGGCTGATCTCGAACTCCTGAGCTCAGGCAATCTGCCTGTCTTGGCCTCCCACAGTGTTAGGATTACAGGCATGAGCCACTGCACCCGATTTTTTTTTTCTTTTGATGGAGTTTTGCTCTTGTTGCCCAGGTTAGAGTGCAATGATGCGATCTCAGCTCACTGCAACCCCCGCCTCCCAGGTTCAAGTGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGAATTACAGGCAAGTGCCACCAAGCCCGGCTAATTTTGTATTTTTAGTAGAAACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTTGAACTCCCGACATCAGGTGATCCAAGCGCCTCA
GCCTCCCAAAGCGCTGGGATTATAGGTATGAGCCACAGTGCAGGCCTGCATAATTCTTGATGATCCTCATTATCATGGAAAATTTGTGCATTGTTAAGGAAAGTGGTGCATTGATGGAAGGAAGCAAATACATTTTTAACTATATGACTGAATGAATATCTCTGGTTAGTTTGTAACATCAAGTACTTACCTCATTCAGCATTTTTCTTTCTTTAATAGACTGGGTCACCCCTAAAGAGATCATAGAAAAGACAGGTTACATACAGCAGAAGAACGTGCTCTTTTCACGGAGATAGAGAGGTCAGCGATTCACAAAAGAGCACAGGAAGAATGACAGAGGAGAGGTCCTTCCCTCTAAAGCCACAGCCCTTTAATAAGGCTTGTAGCAGCAGTTTCCTTCTGGAGACAGAGTTGATGTTTAATTTAAACATTATAAGTTTGCCTGCTGCACATGGATTCCTGCCGACTATTAAATAAATCCCTAGCTCATATGCTAACATTGCTAGGAGCAGATTAGGTCCTATTAGTTATAAAAGAGACCCATTTTCCCAGCATCACCAGCTTATCTGAACAAAGTGATATTAAAGATAAAAGTAGTTTAGTATTACAATTAAAGACCTTTTGGTAACTCAGACTCAGCATCAGCAAAAACCTTAGGTGTTAAACGTTAGGTGTAAAAATGCAATTCTGAGGTGTTAAAGGGAGGAGGGGAGAAATAGTATTATACTTACAGAAATAGCTAACTACCCATTTTCCTCCCGCAATTCCTAGAAAATATTTCAGTGTCCGTTCACACACAAACTCAGCATCTGCAGAATGAAAAACACTCAAAGGATTAGAAGTTGAAAACAAAATCAGGAAGTGCTGTCCTAAGAAGCTAAAGAGCCTCAGTTTTTTACACTCCCAAGATCAATCTGGATTTATGATTCTAAAACCCCTGGTGACAGAATCAGAGGCTGAAAACACCACTAATTATAACCAGCAGGTATGGATATTTGGAAGTCTAGGGGAGGCTGATATGAAGTTAAGACCAGAGGAAATATCTGTCCACTCCCTCTTCTCAACACCCATCTTCTAGACGCCAAGGCTAGCTATAGATCTCCATTATAGTGTTCAAGGAATTAGGAATTATCCATGTCAATAGTTTTGATTAATGTGGACGGAGAACATCTATATTACTAGATGGCAATATGTGAAAGAAGAAAACAGTATTGTTGAAAACCTAAATCTGAAATGTCAATGTAATGACAAATTTTCACCCCTAGAATGTCTACCTGGGGAGTCCTAACCCTCTAATATTCCCCTGAGAGGGATGGGAGAATACAGTGCAGAGCTTTTATATAAGTATTTCAGAAAGCAGTAGCTAAAGAATCACTTGTTTATTTCCCAGTGTTTCAAAGGCCCTTCTGAAGAACTAAGCAAACTAAGGAAAGACCATTTAGTTTTAAACAGGAGAAATGTATTTAACTAAATCCTAAACACAGCAGGCTATCTGCAAGCAGCAGCAGCAGCAGCAGCCATGCTCCCTCACAGAATCCTTACAATTTTTGAAGTTTTTTGTTTAACTGCTACAAAAGCCGATTTAGTAACATTTATTACACTTAAAAACTTCAGTTCATTTGTAGTTCAAAGCAAATGTATTGGCTTTGAGTTTAAAGACTGAACTACTTTAGATTTGATTTGCATTTTTTTTTTTTTTTTTTTTTGAGATGCAGTCTTGCTCTGTCAGCCAGGCTGGAGTGCAGTGGCTGGATCTCAGCTCACGGCAAGCTCTGCCTCCTGGGTTCATGCCATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTACAGATGCCCGCCACCATGCCCGGCTAATTTTTTGTATTTTTACTAGAGATGGGGTTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCTGCCCGCCTTGGCCCCCCAAAGCGCTGGGATTACAGGCCTGAGCCACCACGCTTGGCATCTTTTTACCTTT
CATTAACTTTGATGCAAACCTATAGCTTAAGGTATCTTAAACTTTAATGACATTTTTCTCTAAAATAGTAGTTTGTAATAACTTGTTCTGGCACCTGGCTCCAATGAACACTACCCTCTGACCCTGTGGTATAATTTTCATGAGTAAGTGGAAACCTAAGATCTTAGAAGTTCAACGGCAATGTGTCCAAGGGGTTTAGATCCTCTCCTTAAGTGCCTGTATCTCTGTGAAAAGAATCATCATAGGCTAGGCGCGATGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGTAGGTGGATCACCTGAGGTCGGGAGTCCAAGACCAGCCTGACTGACATGGAAAAACCCTGTCTCTACTAAAAATACAAAATTAGGTATGGTGGTGCATTCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGGAGGGGGAGGTTGCAGCAAGCCAAGATCGTGCCATTGCACTCCAGCAGCCTGGGCAACAAGAGTGAAAAACTACACCTCAAAAACAAAAACAAAAACAAAAGAATCATCATCAAGTGAACTGGAACACATCCAGAGAACTAATTTTGTTAGAAAGATTTTAGAGTTGAGCCACACAATCTGCATCTTCTGCGTCCTCCATGCACTCGTCTGCTTTCTGGAGCCCCATGAGTGAGTCTTAATCCTGTTCCAGATAACAGTTCTCTTCCGGGTAACGGTTCTTCAGATACTTGAAGACAGTGTCTTATTTCCTTAAATCTTCTCATTTCTTCTTCAAAAGACAGTATTTCAAGTTACTTTTATGTATCTTTACCATCTACCTCTGGATAAACACTCTCCAATTTGTCAGTGACCATGTTAAAAACCAAGCACGGTGCTTAAAACTGACATCATCTTTCAGGCAATCACTCCATTGGAGAATACAGTGGGGCTCTGGATCTGTACTTCACTTGCTCCAGAGCCTCTGCTTGTGTTAATACGGCCCAGTTTCAAATAAGCATTTTTAGCAGCCCTGAAATGTGTACTCAGATTTAGTTTATAGTCAACTAAAAACACCCAGAGGTCTCCTGTATTACACAAGTTATAATTAAAACCTTAAAAGAGAAAGGTATAGGACAAATGATCTGTCTCCTCCCTTTTTTGCTTTTTCATATGTTAAGACTATCTCGGAGCTGTTATCAGACTT