1-month Practical Course
Genome Analysis 2008

Lecture 3: Profiles: representing sequence alignment

Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, The Netherlands
ibivu.nl, heringa@cs.vu.nl

Alignment input parameters
Scoring alignments

Amino Acid Exchange Matrix

Gap penalties (open, extension)

A number of different schemes have been developed to compile residue exchange matrices

However, there are no formal concepts to calculate corresponding gap penalties

Empirically determined values are recommended for PAM250, BLOSUM62, etc.
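To make the two gap parameters concrete, here is a minimal sketch of an affine gap penalty. It assumes the convention that the first gap position costs the opening penalty and each further position costs the extension penalty; some programs charge the extension for every position instead. The function name and the numbers in the usage lines are illustrative, not values recommended for any particular matrix.

```python
def affine_gap_penalty(length: int, gap_open: float, gap_extend: float) -> float:
    """Penalty for one gap of `length` positions.

    Assumed convention: gap_open for the first position,
    gap_extend for each additional position.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


# Illustrative values only: one long gap is cheaper than three short ones.
one_long_gap = affine_gap_penalty(3, gap_open=10.0, gap_extend=0.5)
three_short_gaps = 3 * affine_gap_penalty(1, gap_open=10.0, gap_extend=0.5)
print(one_long_gap, three_short_gaps)
```

With an opening penalty much larger than the extension penalty, the alignment prefers a few long gaps over many scattered ones.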

But how can we align blocks of sequences?

The dynamic programming algorithm performs well for pairwise alignment (two axes).

So we should try to treat the blocks as a “single” sequence …
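The idea of treating a block as a "single" sequence can be sketched with a dynamic programming routine that never looks inside its items: it only asks a scoring function how well item i matches item j. The version below is a simplified global alignment (score only, one linear gap penalty rather than open and extension), and the names `global_alignment_score` and `identity_score` are hypothetical. The items can be residues, or, given a suitable scoring function, whole alignment columns.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def global_alignment_score(
    xs: Sequence[X],
    ys: Sequence[Y],
    score: Callable[[X, Y], float],
    gap: float = -2.0,
) -> float:
    """Best global alignment score of xs against ys (two axes).

    Simplification: a single linear gap penalty per gap position.
    """
    n, m = len(xs), len(ys)
    # Row 0: aligning nothing from xs against the first j items of ys.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = max(
                prev[j - 1] + score(xs[i - 1], ys[j - 1]),  # match the two items
                prev[j] + gap,                              # gap in ys
                curr[j - 1] + gap,                          # gap in xs
            )
        prev = curr
    return prev[m]


def identity_score(a: str, b: str) -> float:
    """Toy residue scoring, for illustration only."""
    return 1.0 if a == b else -1.0


print(global_alignment_score("ACD", "AD", identity_score))
```

Swapping `identity_score` for a function that compares a residue with a profile column, or two profile columns, leaves the recursion unchanged.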

How to represent a block of sequences

Historically: consensus sequence, a single sequence that best represents the amino acids observed at each alignment position.

Modern methods: alignment profile, a representation that retains the information about the frequencies of amino acids observed at each alignment position.

Consensus sequence

Problem: loss of information

For larger blocks of sequences it “punishes” more distant members

Sequence 1  F A T N M G T S D P P T H T R L R K L V S Q
Sequence 2  F V T N M N N S D G P T H T K L R K L V S T
Consensus   F * T N M * * S D * P T H T * L R K L V S *
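The consensus line above can be reproduced with a few lines of code. This is a minimal sketch that assumes the rule visible in the example, namely that a position keeps its residue only when all sequences agree and becomes `*` otherwise; real consensus schemes often use a majority threshold instead. The function name `strict_consensus` is hypothetical.

```python
def strict_consensus(sequences: list[str], unknown: str = "*") -> str:
    """Consensus that keeps a residue only where every sequence agrees."""
    if not sequences:
        return ""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*sequences):
        # A column with a single distinct residue is fully conserved.
        consensus.append(column[0] if len(set(column)) == 1 else unknown)
    return "".join(consensus)


seq1 = "FATNMGTSDPPTHTRLRKLVSQ"
seq2 = "FVTNMNNSDGPTHTKLRKLVST"
print(strict_consensus([seq1, seq2]))  # F*TNM**SD*PTHT*LRKLVS*
```

Every `*` discards which residues were actually observed at that position, which is the loss of information the slide refers to.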

Alignment profiles

Advantage: full representation of the sequence alignment (more information retained)

Not only used in alignment methods, but also in sequence-database searching (to detect distant homologues)

Also called PSSM in BLAST (Position-specific scoring matrix)

Multiple alignment profiles

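Figure: a profile drawn as a matrix over the alignment positions. Each position holds the amino acid frequencies (fA, fC, fD, ..., fW, fY) and its own gap penalties (gapo, gapx). Core regions and a gapped region are marked along the alignment.

Position-dependent gap penalties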

Profile building

Example: each amino acid is represented as a frequency and gap penalties as weights.

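Figure: example profile around alignment position i. Frequency values shown: 0.3, 0.1, 0, 0.3, 0.3; 0.5, 0, 0, 0, 0.5; 0, 0.5, 0.2, 0.1, 0.2. Position-dependent gap penalties shown: 0.5 and 1.0.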
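Profile building can be sketched as counting residues per alignment column and dividing by the column total. The sketch below makes two assumptions that the slides do not specify: gap characters are left out of the frequency calculation, and no sequence weighting is applied. It does not derive position-dependent gap penalties. The name `build_profile` and the toy alignment are hypothetical.

```python
from collections import Counter


def build_profile(alignment: list[str], gap_char: str = "-") -> list[dict[str, float]]:
    """Return one {amino acid: frequency} dict per alignment column.

    Assumption: gaps are ignored, so frequencies are relative to the
    number of residues actually present in the column.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap_char)
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


# Toy alignment: five sequences, two columns.
toy_alignment = ["AG", "AG", "LG", "VS", "V-"]
for position, column in enumerate(build_profile(toy_alignment), start=1):
    print(position, column)
```

Unlike a consensus sequence, each column keeps every observed residue together with how often it occurs.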

Profile-sequence alignment

Figure: a profile (rows A, C, D, ..., V, W, Y) aligned against a single sequence.

Sequence to profile alignment

Score of amino acid L in a sequence that is aligned against this profile position (frequencies: A 0.4, L 0.2, V 0.4):

Score = 0.4 * s(L, A) + 0.2 * s(L, L) + 0.4 * s(L, V)
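The weighted sum above can be evaluated directly. The sketch below assumes the exchange values s(L,A) = -1, s(L,L) = 4 and s(L,V) = 1, which are intended to match the BLOSUM62 entries and should be checked against a reference copy of the matrix. The names `EXCHANGE`, `exchange_value` and `sequence_profile_score` are hypothetical.

```python
# Small excerpt of an exchange matrix; values assumed to match BLOSUM62.
EXCHANGE = {("L", "A"): -1, ("L", "L"): 4, ("L", "V"): 1}


def exchange_value(x: str, y: str) -> int:
    """Symmetric lookup s(x, y) in the excerpt above."""
    if (x, y) in EXCHANGE:
        return EXCHANGE[(x, y)]
    return EXCHANGE[(y, x)]


def sequence_profile_score(residue: str, column: dict[str, float]) -> float:
    """Frequency-weighted exchange score of one residue against one profile column."""
    return sum(freq * exchange_value(residue, aa) for aa, freq in column.items())


column = {"A": 0.4, "L": 0.2, "V": 0.4}
score = sequence_profile_score("L", column)
print(round(score, 2))  # 0.8 under the assumed exchange values
```

The residue is scored against every amino acid seen in the column, each contribution scaled by how often that amino acid occurs there.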

Profile-profile alignment

Figure: a profile (rows A, C, D, ..., Y) aligned against a second profile (rows A, C, D, ..., V, W, Y).

General function for profile-profile scoring

At each position (column) we have different residue frequencies for each amino acid (rows)

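For pairwise alignment the score of two aligned residues is S = s(aa1, aa2). For comparing two profile positions (a column of profile 1 against a column of profile 2) we instead take:

$$S = \sum_{i} \sum_{j} f_{aa_i} \, f_{aa_j} \, s(aa_i, aa_j)$$

where f_{aa_i} is the frequency of amino acid aa_i in the column of profile 1, f_{aa_j} is the frequency of amino acid aa_j in the column of profile 2, and s is the amino acid exchange matrix.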
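In code, the score of two profile positions is a double loop over the residues of both columns: each pair contributes its exchange value weighted by the product of the two frequencies. The sketch below is one straightforward reading of that sum, not necessarily how any particular alignment program implements it. `profile_column_score` and `toy_exchange` are hypothetical names, and the toy exchange values are not taken from a real matrix.

```python
from typing import Callable


def profile_column_score(
    col1: dict[str, float],
    col2: dict[str, float],
    exchange: Callable[[str, str], float],
) -> float:
    """Sum over all residue pairs of f1 * f2 * s(aa1, aa2)."""
    return sum(
        f1 * f2 * exchange(aa1, aa2)
        for aa1, f1 in col1.items()
        for aa2, f2 in col2.items()
    )


def toy_exchange(x: str, y: str) -> float:
    """Made-up exchange values, for illustration only."""
    return 2.0 if x == y else -1.0


# Two single-residue columns reduce to the pairwise case S = s(aa1, aa2).
print(profile_column_score({"A": 1.0}, {"G": 1.0}, toy_exchange))
# A mixed column is scored as a frequency-weighted average.
print(profile_column_score({"A": 0.5, "G": 0.5}, {"A": 1.0}, toy_exchange))
```

When both columns contain a single residue with frequency 1, the sum has one term, so ordinary pairwise scoring is a special case of the profile-profile function.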

Profile to profile alignment

Match score of two alignment columns using the amino acid frequencies at the corresponding profile positions. One column holds A (0.4), L (0.2) and V (0.4); the other holds G (0.75) and S (0.25):

Score = 0.4*0.75*s(A,G) + 0.2*0.75*s(L,G) + 0.4*0.75*s(V,G)
      + 0.4*0.25*s(A,S) + 0.2*0.25*s(L,S) + 0.4*0.25*s(V,S)

s(x,y) is the value in the amino acid exchange matrix (e.g. PAM250, BLOSUM62) for amino acid pair (x,y).
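As a worked instance of the match score for the columns A (0.4), L (0.2), V (0.4) and G (0.75), S (0.25), assume the following BLOSUM62 entries: s(A,G) = 0, s(L,G) = -4, s(V,G) = -3, s(A,S) = 1, s(L,S) = -2, s(V,S) = -2. These values are an assumption of this example and should be verified against a reference copy of the matrix.

```latex
\begin{aligned}
\text{Score} &= 0.30 \cdot 0 + 0.15 \cdot (-4) + 0.30 \cdot (-3)
              + 0.10 \cdot 1 + 0.05 \cdot (-2) + 0.10 \cdot (-2) \\
             &= 0 - 0.60 - 0.90 + 0.10 - 0.10 - 0.20 \\
             &= -1.70
\end{aligned}
```

Under these assumed values the total is negative, so the two columns would count as a poor match.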
