Inferring function by homology

Upload: silvester-brown

Post on 25-Dec-2015

TRANSCRIPT

Page 1: Inferring function by homology The fact that functionally important aspects of sequences are conserved across evolutionary time allows us to find, by homology

Inferring function by homology

The fact that functionally important aspects of sequences are conserved across evolutionary time allows us to find, by homology searching, the equivalent genes in one species to those known to be important in other model species.

Logic: if the linear alignment of a pair of sequences is similar, then we can infer that the 3-dimensional structure is similar; if the 3-D structure is similar then there is a good chance that the function is similar.

Page 2

BASIC LOCAL ALIGNMENT SEARCH TOOL (BLAST)

BLAST programs (there are several) compare a query sequence to all the sequences in a database in a pairwise manner.

BLAST breaks the query and database sequences into fragments known as "words", and seeks matches between them.

It attempts to align query words of length "W" to words in the database such that the alignment scores at least a threshold value, "T". These matches are known as High-Scoring Segment Pairs (HSPs).

HSPs are then extended in either direction in an attempt to generate an alignment with a score exceeding another threshold, "S", known as a Maximal-Scoring Segment Pair (MSP).
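The word-matching and extension steps above can be sketched in a few lines of Python. This is a toy illustration, not the BLAST implementation: it assumes exact word matches stand in for the threshold "T", it scores +1 per match and -1 per mismatch, and it stops extending once the running score falls `x_drop` below the best score seen. The function names and default values are made up for this example.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where length-w words match exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk from (qi, sj) in direction step; return (best_gain, best_length)."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        gain += match if query[qi] == subject[sj] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain >= x_drop:
            break  # score has dropped too far below the best: give up
        qi += step
        sj += step
    return best_gain, best_len


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=3):
    """Extend a seed without gaps in both directions.

    Returns (score, query_start, subject_start, length).
    """
    right_gain, right_len = _extend(query, subject, i + w, j + w, +1,
                                    match, mismatch, x_drop)
    left_gain, left_len = _extend(query, subject, i - 1, j - 1, -1,
                                  match, mismatch, x_drop)
    score = w * match + left_gain + right_gain
    return score, i - left_len, j - left_len, left_len + w + right_len


if __name__ == "__main__":
    q, s = "GARFIELDTHECAT", "GARFIELDTHERAT"
    for i, j in find_seeds(q, s):
        print((i, j), extend_seed(q, s, i, j))
```

A hit would then be reported only if its extended score reaches the second threshold "S". Protein searches additionally accept similar (not just identical) words, which this sketch leaves out.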

Page 3

2 sequence alignment

To align GARFIELDTHECAT with GARFIELDTHERAT is easy

GARFIELDTHECAT
||||||||||| ||
GARFIELDTHERAT

Page 4

Gaps

Sometimes, you can get a better overall alignment if you insert gaps:

GARFIELDTHECAT
||||||||   |||
GARFIELDA--CAT

is better (scores higher) than

GARFIELDTHECAT
||||||||
GARFIELDACAT

Page 5

No gap penalty

But there has to be some sort of a gap penalty, otherwise you can align ANY two sequences:

G-R--E------AT
| |  |      ||
GARFIELDTHECAT

Page 6

Affine gap penalty

Could set a single score for each indel.

Usually use an affine penalty (open + extend): one cost for opening a gap, plus a smaller cost for each position that extends it.

Example: open -10, extend -0.05.
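To see what the penalty does to the GARFIELD alignments, here is a minimal Python sketch that scores an already-aligned pair. Several things are assumptions made for illustration: matches score +1 and mismatches 0, the first column of a gap costs the open penalty and every further column costs the extend penalty (other conventions exist), and the function name is invented.

```python
def score_alignment(top, bottom, match=1.0, mismatch=0.0,
                    gap_open=-10.0, gap_extend=-0.05):
    """Score two aligned strings of equal length ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0.0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            # first gap column pays the open cost, later ones the extend cost
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


if __name__ == "__main__":
    ref = "GARFIELDTHECAT"
    for aligned in ("GARFIELDA--CAT", "G-R--E------AT"):
        free = score_alignment(ref, aligned, gap_open=0.0, gap_extend=0.0)
        affine = score_alignment(ref, aligned)
        print(aligned, "no penalty:", free, "affine:", round(affine, 2))
```

With free gaps the scattered alignment G-R--E------AT still collects a positive score; with the affine penalty its three separate gaps each pay the open cost and the score becomes strongly negative, while the single two-position gap in GARFIELDA--CAT pays the open cost only once.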

Page 7

2+ similar sequences

When doing a similarity search against a database you are trying to decide which of many sequences is the CLOSEST match to your search sequence.

Which of the following alignment pairs is better?

Page 8

Scoring Alignments

GARFIELDTHECAT
||||   |||||||
GARFRIEDTHECAT

GARFIELDTHECAT
||| |||  |||||
GARWIELESHECAT

GARFIELDTHECAT
||  ||||||| ||
GAVGIELDTHEMAT
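Counting identical positions is the simplest possible score, and a short Python check shows why it is not enough for the three pairs above. The helper name is made up for this illustration.

```python
def count_identities(a, b):
    """Number of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    query = "GARFIELDTHECAT"
    for hit in ("GARFRIEDTHECAT", "GARWIELESHECAT", "GAVGIELDTHEMAT"):
        print(hit, count_identities(query, hit), "/", len(query))
```

All three hits share the same number of identical positions with the query, so ranking them requires a score that also says how acceptable each mismatch is.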

Page 9

Willie Taylor's AA Venn Diagram

Page 10

Substitution matrices

#BLOSUM 90
    A   R   N   D   C   Q   E   G   H   I   L
A   5  -2  -2  -3  -1  -1  -1   0  -2  -2  -2
R  -2   6  -1  -3  -5   1  -1  -3   0  -4  -3
N  -2  -1   7   1  -4   0  -1  -1   0  -4  -4
D  -3  -3   1   7  -5  -1   1  -2  -2  -5  -5
C  -1  -5  -4  -5   9  -4  -6  -4  -5  -2  -2
Q  -1   1   0  -1  -4   7   2  -3   1  -4  -3
E  -1  -1  -1   1  -6   2   6  -3  -1  -4  -4
G   0  -3  -1  -2  -4  -3  -3   6  -3  -5  -5
H  -2   0   0  -2  -5   1  -1  -3   8  -4  -4
I  -2  -4  -4  -5  -2  -4  -4  -5  -4   5   1
L  -2  -3  -4  -5  -2  -3  -4  -5  -4   1   5
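A substitution matrix is used as a lookup table: each aligned pair of residues contributes its matrix entry to the total score. The Python sketch below loads only the eleven residues shown in the excerpt above (so it cannot score sequences containing other amino acids), and the function names are illustrative.

```python
ALPHABET = "ARNDCQEGHIL"

# The BLOSUM 90 excerpt from the slide, one row per residue.
ROWS = """
A  5 -2 -2 -3 -1 -1 -1  0 -2 -2 -2
R -2  6 -1 -3 -5  1 -1 -3  0 -4 -3
N -2 -1  7  1 -4  0 -1 -1  0 -4 -4
D -3 -3  1  7 -5 -1  1 -2 -2 -5 -5
C -1 -5 -4 -5  9 -4 -6 -4 -5 -2 -2
Q -1  1  0 -1 -4  7  2 -3  1 -4 -3
E -1 -1 -1  1 -6  2  6 -3 -1 -4 -4
G  0 -3 -1 -2 -4 -3 -3  6 -3 -5 -5
H -2  0  0 -2 -5  1 -1 -3  8 -4 -4
I -2 -4 -4 -5 -2 -4 -4 -5 -4  5  1
L -2 -3 -4 -5 -2 -3 -4 -5 -4  1  5
"""


def load_matrix(rows=ROWS, alphabet=ALPHABET):
    """Parse the text table into a {(row_residue, col_residue): score} dict."""
    matrix = {}
    for line in rows.strip().splitlines():
        parts = line.split()
        for col, value in zip(alphabet, parts[1:]):
            matrix[(parts[0], col)] = int(value)
    return matrix


def score_ungapped(a, b, matrix):
    """Sum the substitution scores of an ungapped, equal-length alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(x, y)] for x, y in zip(a, b))


if __name__ == "__main__":
    m = load_matrix()
    print(score_ungapped("DEIL", "EDLI", m))  # conservative swaps
    print(score_ungapped("DEIL", "CCCC", m))  # unrelated residues
```

Swapping D with E or I with L scores slightly positive, whereas replacing the same residues with cysteine is heavily penalised, which is how the matrix separates alignments that have the same number of identities.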

Page 11

Low Complexity Masking

Some sequences are similar even if they have no recent common ancestor.

Huntington's disease is caused by poly-CAG tracts in the DNA, which result in polyglutamine (Gln, Q) tracts in the protein.

If you do a homology search with QQQQQQQQQQ you get hits to other proteins that have a lot of glutamines but have totally different function.
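The idea of masking can be illustrated with a very crude filter that replaces long single-residue runs with X before searching. This is only a sketch under stated assumptions: real search tools use more general composition-based filters rather than a run-length rule, and the function name and the `min_run` threshold of 5 are invented for the example.

```python
import re


def mask_homopolymers(seq, min_run=5, mask_char="X"):
    """Replace every run of min_run or more identical residues with mask_char."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)


if __name__ == "__main__":
    # Start of huntingtin as shown in this deck, with spaces removed.
    huntingtin = ("MATLEKLMKA" "FESLKSFQQQ" "QQQQQQQQQQ"
                  "QQQQQQQQQQ" "PPPPPPPPPP" "PQLPQPPPQA")
    print(mask_homopolymers(huntingtin))
```

After masking, the poly-Q and poly-P tracts no longer contribute word matches, so hits have to be supported by the rest of the sequence.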

Page 12

2 sequence alignment

Huntingtin:
MATLEKLMKA FESLKSFQQQ QQQQQQQQQQ
QQQQQQQQQQ PPPPPPPPPP PQLPQPPPQA

hits

>MM16_MOUSE MATRIX METALLOPROTEINASE-16
Score = 34.4 bits (78), Expect = 0.18
Identities = 21/65 (32%), Positives = 25/65 (38%), Gaps = 2/65 (3%):

FQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQ--AQPLLPQPQPPPPPP
F Q  +    +      Q  Q+   PP   PPP   LP  PP     P  P+    P PP
FYQYMETDNFKLPNDDLQGIQKIYGPPDKIPPPTRPLPTVPPHRSVPPADPRRHDRPKPP

But not because it is involved in microtubule mediated transport!

Page 13

E values

An E-value is the number of hits with a score at least as good as the one observed that you would expect to occur by chance.

It depends on the size of the query sequence and of the database.

The lower the E-value, the more confidence you can have that a hit is a true homologue (a sequence related by common descent).
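The dependence on query and database size can be made concrete with the standard relation between a bit score S' and the E-value, E = m * n * 2^(-S'), where m is the query length and n is the total length of the database. The Python sketch below applies that relation directly; it ignores the effective-length corrections that search programs apply, and the function and parameter names are illustrative.

```python
def e_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same bit score, databases of different sizes (made-up lengths).
    for db_len in (1_000_000, 100_000_000):
        print(db_len, e_value(34.4, 60, db_len))
```

The same bit score gives a proportionally larger E-value in a larger database, which is why E-values from searches against different databases are not directly comparable.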

Page 14

Dotplot theory

  A T G A T A T T C T T
A . . . . . . . . . . .
T . . . . . . . . . . .
T . . . . . . . . . . .
G . . . . . . . . . . .
T . . . . . . . . . . .
T . . . . . . . . . . .
C . . . . . . . . . . .

Task: align ATGATATTCTT and ATTGTTC

Another way of comparing 2 sequences

Page 15

  A T G A T A T T C T T
A . . . . . . . . . . .
T . + . . + . + . . + .
T . . . . . . . . . . .
G . . . . . . . . . . .
T . . . . . . . . . . .
T . . . . . . . . . . .
C . . . . . . . . . . .

Go along the first sequence, inserting a + wherever 2 out of 3 bases in a moving window match. The first sequence is compared to ATT (the first 3 bases in the vertical sequence).

Page 16

  A T G A T A T T C T T
A . . . . . . . . . . .
T . + . . + . + . . + .
T . + . . . . . + . . .
G . . . . . . . . . . .
T . . . . . . . . . . .
T . . . . . . . . . . .
C . . . . . . . . . . .

Then go along the first sequence again, inserting a + wherever 2 out of 3 bases in a moving window match. This time the first sequence is compared to TTG (the next 3 bases in the vertical sequence).

Page 17

  A T G A T A T T C T T
A . . . . . . . . . . .
T . + . . + . + . . + .
T . + . . . . . + . . .
G . . + . . . . . + . .
T . . . + . . . . . + .
T . . . . . . . + . . .
C . . . . . . . . . . .

Iterate until

Page 18

  A T G A T A T T C T T
A
T   +     +   +     +
T   +           +
G     +           +
T       +           +
T               +
C

The human eye is particularly good at picking up structure from the pattern of dots. You might see a hint of a duplicated region in the horizontal sequence that is not so clear from the sequence itself.
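The moving-window procedure from the preceding slides can be sketched in Python as follows. It assumes a window of 3 with at least 2 matching bases, and it places the + at the centre of the window, which is how the marks appear to be positioned in the grids above; the function name and parameters are illustrative.

```python
def dotplot(horizontal, vertical, window=3, min_matches=2):
    """Return a grid (list of strings) with '+' where the windows centred on
    that cell share at least min_matches bases. window must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre")
    half = window // 2
    grid = [["."] * len(horizontal) for _ in vertical]
    for r in range(half, len(vertical) - half):
        for c in range(half, len(horizontal) - half):
            matches = sum(horizontal[c + k] == vertical[r + k]
                          for k in range(-half, half + 1))
            if matches >= min_matches:
                grid[r][c] = "+"
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    top, side = "ATGATATTCTT", "ATTGTTC"
    print("  " + " ".join(top))
    for base, row in zip(side, dotplot(top, side)):
        print(base + " " + " ".join(row))
```

Applied strictly to every window, the rule marks a few more cells than the hand-drawn grids show (for example a third + in the G row), so the output is close to, but not identical with, the slides.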
