Vorlesung Grundlagen der Bioinformatik http://gobics.de/lectures/ss07/grundlagen

Upload: maya-bishop

Post on 28-Mar-2015

Page 1: Vorlesung Grundlagen der Bioinformatik

Lecture

Foundations of Bioinformatics (Grundlagen der Bioinformatik)

http://gobics.de/lectures/ss07/grundlagen

Page 2: Vorlesung Grundlagen der Bioinformatik

Information from a Single Sequence Alone

Sequence alignment in molecular data analysis:

Page 3: Vorlesung Grundlagen der Bioinformatik

Information from a Single Sequence Alone

Multi-Organism High-Quality Sequences

Sequence alignment in molecular data analysis:

(M. Brudno)

Page 4: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

seq1 T Y I M R E A Q Y E

seq2 T C I V M R E A Y E

seq3 Y I M Q E V Q Q E

seq4 Y I A M R E Q Y E

Page 5: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 Y - I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Page 9: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 Y - I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Functionally important regions more conserved than non-functional regions

Page 10: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 Y - I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Functionally important regions more conserved than non-functional regions

Local sequence conservation indicates functionality!

Page 11: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 - Y I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Astronomical Number of possible alignments!

Page 12: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V - M R E A Y E

seq3 - Y I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Astronomical Number of possible alignments!
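To get a feeling for how quickly the number of possible alignments grows, the following sketch counts them by dynamic programming. It assumes that two alignments are different whenever their sequences of columns differ, and that columns consisting only of gaps are not allowed. The function name count_alignments is illustrative and not part of the lecture.

```python
from functools import lru_cache
from itertools import product


def count_alignments(lengths):
    """Count global alignments of sequences with the given lengths.

    Every column consumes one residue from a non-empty subset of the
    sequences, so each alignment is a path through a lattice whose steps
    are the non-zero 0/1 vectors.
    """
    steps = [s for s in product((0, 1), repeat=len(lengths)) if any(s)]

    @lru_cache(maxsize=None)
    def count(remaining):
        if not any(remaining):
            return 1  # nothing left to place: exactly one (empty) continuation
        total = 0
        for step in steps:
            rest = tuple(r - s for r, s in zip(remaining, step))
            if min(rest) >= 0:
                total += count(rest)
        return total

    return count(tuple(lengths))


# The four example sequences have 10, 10, 9 and 9 residues.
print(count_alignments((10, 10, 9, 9)))
```

For two sequences this reproduces the Delannoy numbers (3 alignments for lengths 1 and 1, 13 for lengths 2 and 2), and the count grows very quickly with both sequence length and number of sequences.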

Page 13: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 - Y I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Which one is the best?

Page 14: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

Questions in development of alignment programs:

(1) What is a good alignment?

→ objective function ('score')

(2) How to find a good alignment?

→ optimization algorithm

The first question is far more important!

Page 15: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

Most important scoring scheme for multiple alignment:

Sum-of-pairs score for global alignment.
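A minimal sketch of the sum-of-pairs score: every column is scored by summing a pairwise score over all pairs of rows. The scoring values used here (match +1, mismatch -1, residue against gap -2, gap against gap 0) are assumptions chosen for illustration; real programs use substitution matrices and more elaborate gap costs.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols taken from the same alignment column."""
    if a == "-" and b == "-":
        return 0          # two gaps facing each other are not penalised
    if a == "-" or b == "-":
        return gap        # residue aligned with a gap
    return match if a == b else mismatch


def sum_of_pairs(rows, match=1, mismatch=-1, gap=-2):
    """Sum the pair scores over all columns and all pairs of rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_score(a, b, match, mismatch, gap)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )


# The example alignment from the preceding slides, one string per row.
alignment = [
    "TYI-MREAQYE",
    "TCIVMREA-YE",
    "Y-I-MQEVQQE",
    "Y-IAMRE-QYE",
]
print(sum_of_pairs(alignment))
```

Comparing the value for different gap placements of the same four sequences shows how the objective function ranks candidate alignments.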

Page 16: Vorlesung Grundlagen der Bioinformatik

Divide-and-Conquer Alignment (DCA)

J. Stoye, A. Dress (Bielefeld)

Approximate optimal global multiple alignment

Divide sequences into small sub-sequences

Use MSA to calculate optimal alignment for sub-sequences

Concatenate sub-alignments

Page 17: Vorlesung Grundlagen der Bioinformatik

Divide-and-Conquer Alignment (DCA)

Page 19: Vorlesung Grundlagen der Bioinformatik

Tools for multiple sequence alignment

Problems with traditional approach:

Results depend on gap penalty

Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction

Algorithm produces global alignments.

Page 20: Vorlesung Grundlagen der Bioinformatik

First step in sequence comparison: alignment

global alignment (Needleman and Wunsch, 1970; Clustal W)

atctaatagttaatactcgtccaagtat
atctgtattactaaacaactggtgctacta

Page 21: Vorlesung Grundlagen der Bioinformatik

First step in sequence comparison: alignment

global alignment (Needleman and Wunsch, 1970; Clustal W)

atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-
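Global alignment in the sense of Needleman and Wunsch can be sketched with a small dynamic-programming routine. The scoring values (match +1, mismatch -1, linear gap cost -1) are assumptions for illustration only and are not the parameters behind the alignment shown on the slide.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) of one optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # prefix of a aligned to gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # prefix of b aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


total, top, bottom = needleman_wunsch("atctaatagttaatactcgtccaagtat",
                                      "atctgtattactaaacaactggtgctacta")
print(total)
print(top)
print(bottom)
```

With other scoring parameters the gaps can end up in different places, which is one reason why results of the traditional approach depend on the gap penalty.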

Page 22: Vorlesung Grundlagen der Bioinformatik

First step in sequence comparison: alignment

global alignment (Needleman and Wunsch, 1970; Clustal W)

atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-

local alignment (Smith and Waterman, 1981)

atctaatagttaatactcgtccaagtat
gcgtgtattactaaacggttcaatctaacat

Page 24: Vorlesung Grundlagen der Bioinformatik

First step in sequence comparison: alignment

global alignment (Needleman and Wunsch, 1970; Clustal W)

atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-

local alignment (Smith and Waterman, 1981)

atc--taatagttaatactcgtccaagtat
     || || | ||
gcgtgtattact-aaacggttcaatctaacat
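Local alignment in the sense of Smith and Waterman differs from the global case in two points: cell values never drop below zero, and the traceback starts at the best-scoring cell and stops at the first zero. The sketch below uses assumed scoring values (match +2, mismatch -1, linear gap cost -2) that are not taken from the lecture.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) of one best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                      # start a new local alignment
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    row_a, row_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


local_score, local_top, local_bottom = smith_waterman(
    "atctaatagttaatactcgtccaagtat", "gcgtgtattactaaacggttcaatctaacat")
print(local_score)
print(local_top)
print(local_bottom)
```

Only the single best region is reported, so further regions of similarity in the same pair of sequences are missed.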

Page 25: Vorlesung Grundlagen der Bioinformatik

New question: sequence families with multiple local similarities

Neither local nor global methods applicable

Page 26: Vorlesung Grundlagen der Bioinformatik

New question: sequence families with multiple local similarities

Alignment possible if order conserved

Page 27: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103

Combination of global and local methods

Assemble multiple alignment from gap-free local pair-wise alignments ("fragments")

Page 28: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 34: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 35: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 36: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 37: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 38: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Consistency!

Page 39: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------TAATAGTTAaactccccCGTGC-TTag

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa

Page 40: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Score of an alignment:

Define score of fragment f:

l(f) = length of f

s(f) = sum of matches (similarity values)

P(f) = probability to find a fragment with length l(f) and at least s(f) matches in random sequences that have the same length as the input sequences.

Score w(f) = -ln P(f)
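The weight w(f) = -ln P(f) can be illustrated for DNA with a simple random model. The sketch assumes that each position matches independently with probability 0.25, and it approximates P(f) by multiplying the probability for one fixed start position by the number of possible start positions (len1 * len2), capped at 1. Both the model and the function names are assumptions for illustration and need not coincide with what DIALIGN implements.

```python
import math


def tail_probability(length, matches, p_match=0.25):
    """Probability of at least `matches` matches in a gap-free fragment
    of the given length at one fixed pair of start positions."""
    return sum(
        math.comb(length, k) * p_match**k * (1 - p_match)**(length - k)
        for k in range(matches, length + 1)
    )


def fragment_weight(length, matches, len1, len2, p_match=0.25):
    """w(f) = -ln P(f) under the assumed random model."""
    p_single = tail_probability(length, matches, p_match)
    # Rough approximation: the fragment may start at any of len1 * len2 positions.
    p_any = min(1.0, len1 * len2 * p_single)
    return max(0.0, -math.log(p_any))


# A perfect 10-mer scores much higher in short sequences than in long ones.
print(fragment_weight(10, 10, 100, 100))
print(fragment_weight(10, 10, 1000, 1000))
```

Because P(f) depends on the sequence lengths, the same fragment receives a lower weight in longer sequences, where it is more likely to occur by chance.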

Page 41: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Score of an alignment:

Define score of fragment f:

Define score of alignment as sum of scores of involved fragments

No gap penalty!

Page 42: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Score of an alignment:

Goal in fragment-based alignment approach: find a consistent collection of fragments with maximum sum of weight scores

Page 43: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaaccccctcgtgcttagagatccaaac
cagtgcgtgtattactaacggttcaatcgcgcacatccgc

Pair-wise alignment:

Page 44: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaaccccctcgtgcttagagatccaaac
cagtgcgtgtattactaacggttcaatcgcgcacatccgc

Pair-wise alignment:

Recursive algorithm finds optimal chain of fragments.

Page 45: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

------atctaatagttaaaccccctcgtgcttag-------agatccaaac
cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

Pair-wise alignment:

Recursive algorithm finds optimal chain of fragments.

Page 46: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

------atctaatagttaaaccccctcgtgcttag-------agatccaaac
cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

Optimal pairwise alignment: chain of fragments with maximum sum of weights found by dynamic programming:

Standard fragment-chaining algorithm

Space-efficient algorithm
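The chaining step can be sketched as a quadratic-time dynamic program over a given list of candidate fragments. This is a plain textbook-style sketch, not the space-efficient algorithm mentioned above. The Fragment type and the function name best_chain are hypothetical; weights are assumed to be non-negative and fragment lengths at least 1.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    start1: int    # start position in the first sequence (0-based)
    start2: int    # start position in the second sequence (0-based)
    length: int    # number of aligned residue pairs, at least 1
    weight: float  # w(f), for example -ln P(f)


def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Return the maximum total weight and one chain achieving it.

    A fragment may follow another one only if it starts after the other
    has ended in BOTH sequences.
    """
    frags = sorted(fragments, key=lambda f: f.start1 + f.length)
    if not frags:
        return 0.0, []
    best = [f.weight for f in frags]   # best chain weight ending in fragment j
    prev = [-1] * len(frags)           # predecessor index for the traceback
    for j, fj in enumerate(frags):
        for i in range(j):
            fi = frags[i]
            follows = (fi.start1 + fi.length <= fj.start1
                       and fi.start2 + fi.length <= fj.start2)
            if follows and best[i] + fj.weight > best[j]:
                best[j] = best[i] + fj.weight
                prev[j] = i
    end = max(range(len(frags)), key=best.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(frags[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain


a = Fragment(0, 0, 3, 2.0)
b = Fragment(3, 5, 4, 3.0)
c = Fragment(2, 2, 5, 4.0)   # overlaps a and cannot be followed by b
print(best_chain([c, b, a]))
```

In the example the single heaviest fragment c loses against the chain of a and b, because the two lighter fragments are compatible and together weigh more.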

Page 47: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 48: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

(1) Calculate all optimal pair-wise alignments

Page 51: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Fragments from optimal pair-wise alignments might be inconsistent

Page 52: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 55: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 56: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 58: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Fragments from optimal pair-wise alignments might be inconsistent

1. Sort fragments according to scores

2. Include them one-by-one into growing multiple alignment – as long as they are consistent

(greedy algorithm, comparable to the knapsack problem)
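The greedy procedure can be sketched as follows. The consistency test here is a naive one that rebuilds everything from scratch: aligned sites are merged into columns, and the set of fragments is accepted only if no column holds two sites of the same sequence and the columns can be put into a linear order. It is meant to illustrate the definition, not the efficient bookkeeping used in practice. The tuple layout of a fragment and all function names are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def is_consistent(seq_lengths, fragments):
    """fragments: tuples (seq_a, start_a, seq_b, start_b, length), 0-based."""
    parent = {}

    def find(site):
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]   # path halving
            site = parent[site]
        return site

    # Merge the sites that each fragment aligns with each other.
    for seq_a, start_a, seq_b, start_b, length in fragments:
        for k in range(length):
            root_a, root_b = find((seq_a, start_a + k)), find((seq_b, start_b + k))
            if root_a != root_b:
                parent[root_a] = root_b

    # A column may contain at most one site per sequence.
    seen = set()
    for i, n in enumerate(seq_lengths):
        for p in range(n):
            key = (find((i, p)), i)
            if key in seen:
                return False
            seen.add(key)

    # The column of position p must come before the column of position p + 1.
    predecessors = {}
    for i, n in enumerate(seq_lengths):
        for p in range(n - 1):
            predecessors.setdefault(find((i, p + 1)), set()).add(find((i, p)))
    try:
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


def greedy_alignment(seq_lengths, weighted_fragments):
    """Accept fragments by decreasing weight as long as they stay consistent."""
    accepted = []
    for _, frag in sorted(weighted_fragments, key=lambda wf: wf[0], reverse=True):
        if is_consistent(seq_lengths, accepted + [frag]):
            accepted.append(frag)
    return accepted


f1 = (0, 0, 1, 2, 2)   # sequence 0, positions 0-1, with sequence 1, positions 2-3
f2 = (0, 3, 1, 0, 2)   # crosses f1, so both cannot be in one alignment
f3 = (1, 4, 2, 0, 1)
print(greedy_alignment([5, 5, 5], [(3.0, f2), (5.0, f1), (1.0, f3)]))
```

Since each accepted fragment is never removed again, the result is not guaranteed to have the maximum possible total weight.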

Page 59: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 63: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Consistency problem

Page 65: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Page 66: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Page 67: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagt taaactcccccgtgcttag

cagtgcgtgtattact aacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Page 68: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taata-----gttaaactcccccgtgcttag

cagtgcgtgtatta-----ctaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Page 69: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

site x = [i,p] (sequence i, position p)

Page 70: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Calculate lower bound bl(x,i) and upper bound bu(x,i) for each site x and sequence i

Page 71: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

bl(x,i) and bu(x,i) updated for each new fragment in alignment

Page 72: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Consistency bounds must be updated for each new fragment that is included in the growing alignment

Efficient algorithm

(Abdeddaim and Morgenstern, 2002)

Page 73: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

Advantages of segment-based approach:

Program can produce global and local alignments!

Sequence families alignable that cannot be aligned with standard methods

Page 74: Vorlesung Grundlagen der Bioinformatik

Program input

Program usage:

> dialign2-2 [options] <input_file>

<input_file> = multi-sequence file in FASTA-format
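A small helper for preparing the input might look as follows. It writes a multi-sequence FASTA file and builds the command line in the form shown above. The function names, the file name example.fa and the line width of 60 are assumptions. The program is only started if an executable named dialign2-2 is found on the search path.

```python
import shutil
import subprocess


def write_fasta(records, path):
    """Write (name, sequence) pairs to a FASTA file, 60 residues per line."""
    with open(path, "w") as handle:
        for name, sequence in records:
            handle.write(f">{name}\n")
            for start in range(0, len(sequence), 60):
                handle.write(sequence[start:start + 60] + "\n")


def dialign_command(input_file, options=(), executable="dialign2-2"):
    """Build the argument list: executable, options, input file."""
    return [executable, *options, str(input_file)]


def run_dialign(input_file, options=(), executable="dialign2-2"):
    """Run the program; raises if the executable is not installed."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    return subprocess.run(dialign_command(input_file, options, executable),
                          check=True)


records = [
    ("seq1", "atctaatagttaaactcccccgtgcttag"),
    ("seq2", "cagtgcgtgtattactaacggttcaatcgcg"),
    ("seq3", "caaagagtatcacccctgaattgaataa"),
]
print(dialign_command("example.fa", ["-nt"]))
```

Calling write_fasta(records, "example.fa") followed by run_dialign("example.fa", ["-nt"]) would then start the alignment; the -nt option is copied from the program call shown in the sample output of the lecture.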

Page 75: Vorlesung Grundlagen der Bioinformatik

Program output

DIALIGN 2.2.1
*************

Program code written by Burkhard Morgenstern and Said Abdeddaim
e-mail contact: [email protected]

Published research assisted by DIALIGN 2 should cite:

Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211 - 218.

For more information, please visit the DIALIGN home page at

http://bibiserv.techfak.uni-bielefeld.de/dialign/

program call: ./dialign2-2 -nt -anc s

Aligned sequences:    length:
==================    =======

1) dog_il4            300
2) bla                200
3) blu                200

Average seq. length: 233.3

Please note that only upper-case letters are considered to be aligned.

Page 76: Vorlesung Grundlagen der Bioinformatik

Program output

Alignment (DIALIGN format):
===========================

dog_il4      1  cagg------ ----GTTTGA atctgataca ttgc------ ----------
bla          1  ctga------ ---------- ---------- --------GC CAAGTGGGAA
blu          1  ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG
                0000000000 0000000000 0000000000 0000000011 1111111111

dog_il4     25  ---------- --ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC
bla         17  ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc----
blu         51  ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC
                0000000000 0000000000 0000000000 0000000000 0000000000

dog_il4     63  GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT
bla         63  ---------- ---------- ---TTTCCCA TGTGCTCCAT GGTGGAATGG
blu        101  CAGGGACTCA GAGAGAATGG AGTATAGGGG TCAGGGCat- ----------
                0000000000 0000000000 0009999999 9999999888 8888888888

dog_il4    113  TCCGCCCCTT CCCAGCACca gcattatcct ---GGGATTG GAGAAGGGGG
bla         90  ACCACTCCTT CTCAGCACaa caaagcccaa gaaGGTGTTG CGTTCTAGAC
blu        140  ---------- ---------- ---------- ---GGGGTGG CCTTAGGCTC
                8888888888 8888888800 0000000000 0007777777 7777777777

Page 77: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 83: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 84: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 85: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 86: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaac----------ggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 87: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 88: Vorlesung Grundlagen der Bioinformatik

The DIALIGN approach

atc------TAATAGTTAaactccccCGTGC-TTag------

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa--

Page 90: Vorlesung Grundlagen der Bioinformatik

Alignment of large genomic sequences

Fragment-based alignment approach useful for alignment of genomic sequences.

Possible applications:

Detection of regulatory elements

Identification of pathogenic microorganisms

Gene prediction

Page 91: Vorlesung Grundlagen der Bioinformatik

DIALIGN alignment of human and murine genomic sequences

Page 92: Vorlesung Grundlagen der Bioinformatik

DIALIGN alignment of tomato and Arabidopsis thaliana genomic sequences

Page 93: Vorlesung Grundlagen der Bioinformatik

Alignment of large genomic sequences

Gene-regulatory sites identified by multiple sequence alignment (phylogenetic footprinting)

Page 94: Vorlesung Grundlagen der Bioinformatik

Alignment of large genomic sequences

Page 95: Vorlesung Grundlagen der Bioinformatik

Performance of long-range alignment programs for exon discovery (human - mouse comparison)

Page 96: Vorlesung Grundlagen der Bioinformatik

Performance of long-range alignment programs for exon discovery (Arabidopsis thaliana - tomato comparison)