Pairwise alignment


Upload: leilani-davidson

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Pairwise alignment

Pairwise alignment

Now we know how to do it: how do we get a multiple alignment (three or more sequences)?

Multiple alignment: much greater combinatorial explosion than with pairwise alignment...

Page 2: Pairwise alignment

Multi-dimensional dynamic programming (Murata et al. 1985)

Page 3: Pairwise alignment

Simultaneous multiple alignment

Multi-dimensional dynamic programming

MSA (Lipman et al., 1989, PNAS 86, 4412)

extremely slow and memory intensive; practical only up to 8-9 sequences of ~250 residues

DCA (Stoye et al., 1997, CABIOS 13, 625)

still very slow
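To make the combinatorial explosion concrete, the sketch below counts the work of a full multi-dimensional dynamic programming matrix. It assumes the textbook formulation, in which N sequences need one cell per combination of prefix lengths and each cell looks back at 2^N - 1 predecessor cells; the function name `dp_cost` is made up for this illustration, and the numbers say nothing about how MSA or DCA prune the search space.

```python
from math import prod


def dp_cost(lengths):
    """Return (cells, predecessors_per_cell) for a full N-dimensional DP matrix.

    lengths: list of sequence lengths, one per sequence.
    """
    # One cell per combination of prefix lengths (0..n for each sequence).
    cells = prod(n + 1 for n in lengths)
    # Each cell is reached by advancing any non-empty subset of the sequences.
    predecessors = 2 ** len(lengths) - 1
    return cells, predecessors


pair_cells, pair_preds = dp_cost([250, 250])
eight_cells, eight_preds = dp_cost([250] * 8)
print(f"2 sequences: {pair_cells} cells, {pair_preds} predecessors each")
print(f"8 sequences: {eight_cells:.3e} cells, {eight_preds} predecessors each")
```

Going from two to eight sequences of 250 residues raises the cell count from 251^2 to 251^8, which is why the simultaneous methods stop at a handful of sequences.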

Page 4: Pairwise alignment

Alternative multiple alignment methods

Biopat (first method ever)

MULTAL (Taylor 1987)

DIALIGN (Morgenstern 1996)

PRRP (Gotoh 1996)

Clustal (Thompson, Higgins, Gibson 1994)

Praline (Heringa 1999)

T-Coffee (Notredame 2000)

HMMER (Eddy 1998) [Hidden Markov Models]

SAGA (Notredame 1996) [Genetic algorithms]

Page 5: Pairwise alignment

Progressive multiple alignment general principles

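[Figure: sequences 1 to 5 are compared pairwise (Score 1-2, Score 1-3, ..., Score 4-5); the scores fill a 5×5 similarity matrix; scores are converted to distances to build the guide tree; the multiple alignment follows the guide tree, with iteration possibilities.]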
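The step from pairwise scores to a guide tree can be sketched as follows. This is only an illustration under stated assumptions: similarities are taken to lie in [0, 1] and are turned into distances with d = 1 - s, and the tree is built by greedy average-linkage clustering, which is a stand-in and not the tree method of any particular program. The matrix values and the names `scores_to_distances` and `guide_tree` are invented for the example.

```python
def scores_to_distances(sim):
    """Convert similarities in [0, 1] to distances (assumed rule: d = 1 - s)."""
    n = len(sim)
    return [[0.0 if i == j else 1.0 - sim[i][j] for j in range(n)]
            for i in range(n)]


def guide_tree(dist):
    """Greedy average-linkage clustering.

    Returns the tree as nested tuples of 1-based sequence numbers.
    """
    # cluster id -> (member sequence numbers, subtree built so far)
    clusters = {i: ((i + 1,), i + 1) for i in range(len(dist))}

    def average_distance(a, b):
        members_a, members_b = clusters[a][0], clusters[b][0]
        total = sum(dist[x - 1][y - 1] for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    next_id = len(dist)
    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(p, q) for i, p in enumerate(ids) for q in ids[i + 1:]]
        a, b = min(pairs, key=lambda pq: average_distance(*pq))
        members_a, tree_a = clusters.pop(a)
        members_b, tree_b = clusters.pop(b)
        clusters[next_id] = (members_a + members_b, (tree_a, tree_b))
        next_id += 1
    return clusters[next_id - 1][1]


# Hypothetical 5x5 similarity matrix for sequences 1..5.
sim = [
    [1.0, 0.3, 0.9, 0.3, 0.3],
    [0.3, 1.0, 0.3, 0.6, 0.8],
    [0.9, 0.3, 1.0, 0.3, 0.3],
    [0.3, 0.6, 0.3, 1.0, 0.6],
    [0.3, 0.8, 0.3, 0.6, 1.0],
]
tree = guide_tree(scores_to_distances(sim))
print(tree)
```

With these made-up scores, sequences 1 and 3 are joined first, then 2 and 5, then 4 joins the (2, 5) group, and the two groups meet at the root. A progressive aligner would then align in exactly that order.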

Page 6: Pairwise alignment

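General progressive multiple alignment technique (follow generated tree)

[Figure: guide tree over sequences 1 to 5 with branch length d and root. Labels recoverable from the figure: groups 1-3, 2-5 and 2-5-4, built up by following the tree.]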

Page 7: Pairwise alignment

Progressive multiple alignment

Problem: accuracy is very important. Errors are propagated into the progressive steps.

"Once a gap, always a gap" (Feng & Doolittle, 1987)

Page 8: Pairwise alignment

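Multiple alignment profiles (Gribskov et al. 1987)

[Figure: a profile with one column per alignment position i and one row per residue type (A, C, D, ..., W, Y). The example column holds the values 0.3, 0.1, 0, 0.3, 0.3; a separate gap penalties row holds values such as 0.5 and 1.0.]

Position dependent gap penalties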
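A profile of this kind can be sketched as a list of per-column residue frequencies. The snippet below is a minimal illustration, not the Gribskov et al. construction: it ignores sequence weighting and the position dependent gap penalties, and the toy alignment and the name `build_profile` are assumptions made for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, alphabet=AMINO_ACIDS):
    """Return one dict per alignment column mapping residue -> frequency.

    Gap characters (anything outside the alphabet) are ignored.
    """
    profile = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r in alphabet)
        total = sum(counts.values())
        profile.append({r: c / total for r, c in counts.items()} if total else {})
    return profile


# Hypothetical three-sequence alignment; '-' marks a gap.
alignment = ["ACDWY", "ACEWY", "A-DWF"]
profile = build_profile(alignment)
for i, column in enumerate(profile, start=1):
    print(i, column)
```

Each column sums to 1 over the residues observed at that position, in the same way the example column in the figure does.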

Page 9: Pairwise alignment

Profile-sequence alignment

[Figure: a sequence aligned against a profile with rows A, C, D, ..., V, W, Y.]
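When a sequence is aligned to a profile, each cell of the dynamic programming matrix needs a score for placing one residue against one profile column. A common choice, assumed here, is the frequency-weighted average of substitution scores. The match/mismatch scoring function below is a toy stand-in for a real exchange matrix, and all names are hypothetical.

```python
def toy_score(a, b):
    """Toy exchange 'matrix': +1 for identical residues, -1 otherwise."""
    return 1.0 if a == b else -1.0


def column_score(column_freqs, residue, score=toy_score):
    """Score one residue against one profile column.

    column_freqs maps residue -> frequency in that column.
    """
    return sum(freq * score(a, residue) for a, freq in column_freqs.items())


def ungapped_score(profile, sequence, score=toy_score):
    """Sum of column scores for a sequence laid on the profile without gaps."""
    return sum(column_score(col, res, score) for col, res in zip(profile, sequence))


# Hypothetical two-column profile.
profile = [{"A": 1.0}, {"D": 0.5, "E": 0.5}]
print(column_score(profile[1], "D"))   # half match, half mismatch
print(ungapped_score(profile, "AD"))
```

In a full profile-sequence alignment, `column_score` would replace the residue-residue lookup inside the usual dynamic programming recursion; profile-profile alignment needs the analogous column-against-column score.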

Page 10: Pairwise alignment

Profile-profile alignment

[Figure: two profiles aligned against each other, one with rows A, C, D, ..., Y and one with rows A, C, D, ..., V, W, Y.]

Page 11: Pairwise alignment

Clustal, ClustalW, ClustalX

CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct the guide tree.

Sequence blocks are represented by profiles, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.

Further carefully crafted heuristics include:

(i) local gap penalties

(ii) automatic selection of the amino acid substitution matrix

(iii) automatic gap penalty adjustment

(iv) a mechanism to delay alignment of sequences that appear to be distant at the time they are considered

CLUSTAL (W/X) does not allow iteration (cf. Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002).
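The branch-length weighting can be sketched roughly as follows. The assumption here is the commonly described scheme in which every branch length is shared equally among the sequences below that branch, so that a sequence's weight is the sum of its shares along the path from root to leaf. The tree encoding, the branch lengths and the function names are invented for this illustration, and any final normalisation is left out.

```python
def leaves(node):
    """Return the leaf names below a node; a node is (label, branch_length)."""
    label, _ = node
    if isinstance(label, str):
        return [label]
    return [name for child in label for name in leaves(child)]


def sequence_weights(node, inherited=0.0, weights=None):
    """Weight each leaf by the branch lengths on its path from the root.

    Each branch length is divided by the number of leaves that share it.
    """
    if weights is None:
        weights = {}
    label, length = node
    share = inherited + length / len(leaves(node))
    if isinstance(label, str):
        weights[label] = share
    else:
        for child in label:
            sequence_weights(child, share, weights)
    return weights


# Hypothetical rooted tree: (seq1, seq3) are close relatives, seq4 is distant.
tree = ([
    ([("seq1", 0.1), ("seq3", 0.1)], 0.2),
    ("seq4", 0.4),
], 0.0)
weights = sequence_weights(tree)
print(weights)
```

The two closely related sequences split their shared branch and end up with lower weights than the distant one, which keeps a cluster of near-identical sequences from dominating the profile.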

Page 12: Pairwise alignment

Strategies for multiple sequence alignment

Profile pre-processing

Secondary structure-induced alignment

Globalised local alignment

Matrix extension

Objective: try to avoid (early) errors

Page 13: Pairwise alignment

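Pre-profile generation

[Figure: sequences 1 to 5 are compared pairwise (Score 1-2, Score 1-3, ..., Score 4-5). For each sequence a pre-alignment is built from the other sequences that pass a cut-off (for example sequence 1 with 2, 3, 4, 5 and sequence 2 with 1, 3, 4, 5), and each pre-alignment is turned into a pre-profile with rows A, C, D, ..., Y.]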
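The cut-off step can be illustrated with a few lines of code. This is a sketch under assumptions: pairwise scores are taken to be "higher is better", a sequence enters another sequence's pre-alignment when their pairwise score reaches the cut-off, and the scores, the threshold and the name `pre_alignment_members` are invented. How the selected sequences are then stacked onto the master sequence is not shown.

```python
def pre_alignment_members(scores, n_sequences, cutoff):
    """For each sequence, list the other sequences scoring >= cutoff against it.

    scores maps (i, j) with i < j (1-based sequence numbers) to a pairwise score.
    """
    members = {i: [] for i in range(1, n_sequences + 1)}
    for (i, j), score in scores.items():
        if score >= cutoff:
            members[i].append(j)
            members[j].append(i)
    return {i: sorted(m) for i, m in members.items()}


# Hypothetical pairwise scores for four sequences.
scores = {
    (1, 2): 0.8, (1, 3): 0.9, (1, 4): 0.2,
    (2, 3): 0.4, (2, 4): 0.7, (3, 4): 0.1,
}
members = pre_alignment_members(scores, n_sequences=4, cutoff=0.5)
print(members)
```

Lowering the cut-off pulls more sequences into each pre-profile; raising it leaves each sequence closer to being represented by itself alone.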

Page 14: Pairwise alignment

Strategies for multiple sequence alignment

Profile pre-processing

Secondary structure-induced alignment

Globalised local alignment

Matrix extension

Objective: try to avoid (early) errors

Page 15: Pairwise alignment

Protein structure hierarchical levels

PRIMARY STRUCTURE (amino acid sequence)

VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

SECONDARY STRUCTURE (helices, strands)

TERTIARY STRUCTURE (fold)

QUATERNARY STRUCTURE (oligomers)

Page 16: Pairwise alignment

Strategies for multiple sequence alignment

Profile pre-processing

Secondary structure-induced alignment

Globalised local alignment

Matrix extension

Objective: try to avoid (early) errors

Page 17: Pairwise alignment

Globalised local alignment

1. Local (SW, Smith-Waterman) alignment (uses exchange matrix M and gap open/extension penalties Po,e)

2. Global (NW, Needleman-Wunsch) alignment (no M or Po,e)

Double dynamic programming

Page 18: Pairwise alignment

Strategies for multiple sequence alignment

Profile pre-processing

Secondary structure-induced alignment

Globalised local alignment

Matrix extension

Objective: try to avoid (early) errors

Page 19: Pairwise alignment

Matrix extension: T-Coffee

[Figure: the six pairwise alignments of four sequences (1-2, 1-3, 1-4, 2-3, 2-4, 3-4).]
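The idea behind matrix (library) extension can be sketched in a few lines. The assumption is the triplet rule described by Notredame et al. (2000): the weight of a residue pair is its direct pairwise weight plus, for every residue of a third sequence aligned to both, the smaller of the two connecting weights. The data layout, the weights and the name `extend_library` are invented for this illustration and do not mirror the T-Coffee code.

```python
from collections import defaultdict


def extend_library(primary):
    """Extend a primary library of pairwise residue weights via triplets.

    primary maps frozenset({res_a, res_b}) -> weight, where a residue is a
    (sequence_number, position) tuple and the two residues come from
    different sequences.
    """
    neighbours = defaultdict(dict)
    for pair, weight in primary.items():
        a, b = tuple(pair)
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    extended = dict(primary)
    # Every residue z acts as the intermediate for each pair of its partners.
    for z, linked in neighbours.items():
        partners = list(linked.items())
        for i, (x, weight_xz) in enumerate(partners):
            for y, weight_zy in partners[i + 1:]:
                if x[0] == y[0]:
                    continue  # same sequence: not an alignable pair
                key = frozenset((x, y))
                extended[key] = extended.get(key, 0) + min(weight_xz, weight_zy)
    return extended


# Hypothetical primary library for three sequences.
primary = {
    frozenset({(1, 0), (2, 0)}): 88,
    frozenset({(1, 0), (3, 0)}): 77,
    frozenset({(2, 0), (3, 0)}): 100,
    frozenset({(1, 1), (3, 1)}): 50,
    frozenset({(2, 1), (3, 1)}): 60,
}
extended = extend_library(primary)
print(extended[frozenset({(1, 0), (2, 0)})])
print(extended[frozenset({(1, 1), (2, 1)})])
```

Pairs that are consistent with the other pairwise alignments gain weight, and a pair that was never aligned directly (residue 1 of sequences 1 and 2 here) still receives support through sequence 3.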

Page 20: Pairwise alignment

Summary

Weighting schemes simulating simultaneous multiple alignment: profile pre-processing (global/local); matrix extension (well balanced scheme)

Smoothing alignment signals: globalised local alignment

Using additional information: secondary structure driven alignment

Schemes strike balance between speed and sensitivity

Page 21: Pairwise alignment

References

Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem. 23, 341-364.

Notredame, C., Higgins, D.G., Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217.

Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem. 26(5), 459-477.

Page 22: Pairwise alignment

Where to find this: http://www.cs.vu.nl/~ibivu/teaching