Pattern databases in protein analysis

Arthur Gruber, Instituto de Ciências Biomédicas, Universidade de São Paulo (AG-ICB-USP)


Post on 11-Feb-2016

Page 1: Pattern databases in protein analysis

Pattern databases in protein analysis

Arthur Gruber

Instituto de Ciências Biomédicas

Universidade de São Paulo

AG-ICB-USP

Page 2: Pattern databases in protein analysis

Protein databases

• GenPept – protein sequence database translated from GenBank

• UniProtKB/TrEMBL – a computer-annotated protein sequence database complementing the UniProtKB/Swiss-Prot Protein Knowledgebase

• UniProtKB/Swiss-Prot – a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases

Page 3: Pattern databases in protein analysis

How to assign protein functions?

• Similar proteins may share common functions, but… proteins that share common domains may have evolved to perform distinct functions

• Proteins that exert similar functions may share common domains, but… domain sequences are not always very similar – methods more refined than simple similarity searches are required

• Proteins may share common domains but have different architectures – no single domain is necessarily responsible for the protein's function. Many proteins use multiple domains to perform their activities

Page 4: Pattern databases in protein analysis

Some conclusions

• Similarity searches may reveal proteins that share very similar sequences and functions – high similarity over the full length of the query sequence

• An output with no significant hits, or with hits only to unannotated proteins, will not reveal the possible function of the query protein

• Similarity searches do not differentiate orthologues from paralogues

• When matching multidomain proteins, it may not be appropriate to transfer the functional annotation – the context is important!

Page 5: Pattern databases in protein analysis

So what do proteins with similar function have in common?

Page 6: Pattern databases in protein analysis

residues, motifs, domains, architecture…


Page 7: Pattern databases in protein analysis

Pattern databases

• Databases that contain patterns of residue conservation within groups of related sequences

• There are several methods to determine patterns

• There are many different pattern databases

Page 8: Pattern databases in protein analysis

Pattern databases


Page 9: Pattern databases in protein analysis

Common protein pattern databases

• PROSITE patterns – regular expressions

• PROSITE profiles – weight matrices (profiles)

• Pfam – database of protein domain families. Contains curated multiple sequence alignments for each family and the corresponding HMMs

• PRINTS – database of groups of motifs that, taken together, are more powerful for assigning protein function

• ProDom – automatically generated database based on the recursive use of PSI-BLAST similarity searches

• InterPro – an integrated database that combines different protein signature recognition methods in a single resource
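Because PROSITE patterns are essentially regular expressions, a pattern can be translated into the regex syntax of a programming language and used to scan a sequence. The following is a minimal sketch in Python, not the PROSITE scanning tool itself. The function names `prosite_to_regex` and `scan` and the test sequence are hypothetical; the pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern. Only the basic syntax is handled (residues, `[...]`, `{...}`, `x`, repeat counts and the `<` / `>` terminal anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        # '<' anchors the pattern at the N-terminus, '>' at the C-terminus.
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Repeat counts: x(3) -> .{3}, x(2,4) -> .{2,4}
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":               # any residue
            core = "."
        elif element.startswith("{"):    # {P} = any residue except P
            core = "[^" + element[1:-1] + "]"
        else:                            # single residue or [ST]
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(pattern: str, sequence: str):
    """Return (1-based start, matched segment) for every hit, overlaps included."""
    # The lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))          # N[^P][ST][^P]
    # Toy sequence invented for illustration only.
    print(scan("N-{P}-[ST]-{P}", "MKNGSAANPTLNVTQ"))    # [(3, 'NGSA'), (12, 'NVTQ')]
```

Short patterns like this one match often by chance, which is one reason the profile- and HMM-based databases listed above exist alongside regular expressions.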

Page 12: Pattern databases in protein analysis

How to start building a pattern database?

With multiple sequence alignments of functionally related proteins
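To make the idea concrete, the sketch below derives a PROSITE-like consensus expression from the columns of a small alignment. It is a naive illustration only: real database entries are curated and refined against the whole sequence database. The function name `alignment_to_pattern`, the `max_alternatives` threshold and the four aligned sequences are all hypothetical.

```python
def alignment_to_pattern(alignment, max_alternatives=3):
    """Build a PROSITE-like pattern from equal-length aligned sequences.

    Fully conserved column        -> the residue itself
    Up to `max_alternatives`      -> [ABC]
    More variable or gapped ('-') -> x (any residue)
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    elements = []
    for column in zip(*alignment):      # iterate over alignment columns
        residues = sorted(set(column))
        if "-" in residues or len(residues) > max_alternatives:
            elements.append("x")
        elif len(residues) == 1:
            elements.append(residues[0])
        else:
            elements.append("[" + "".join(residues) + "]")
    return "-".join(elements)


if __name__ == "__main__":
    # Toy alignment invented for illustration; not taken from any database.
    toy_alignment = ["CAHGC", "CSHAC", "CTHKC", "CAHRC"]
    print(alignment_to_pattern(toy_alignment))   # C-[AST]-H-x-C
```

The same columns can instead be summarised as frequencies, scores or probabilities, which is where the different pattern databases diverge.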

Page 13: Pattern databases in protein analysis

Some definitions

• Protein motif – a single conserved region

• PROSITE pattern – a consensus expression of a conserved region

• Frequency matrices (PRINTS) – matrices that contain the frequencies with which residues occur in a given motif

• PSSMs – position-specific score (weight) matrices (BLOCKS) – add a scoring scheme to the frequency matrices

• Profile HMMs – probabilistic models derived from alignment profiles

• Protein domain – a part of the protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain
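The step from a frequency matrix to a PSSM can be sketched in a few lines. The example below is an illustration under stated assumptions, not the scoring scheme used by PRINTS or BLOCKS: it assumes a uniform background frequency for the 20 amino acids, a simple additive pseudocount and log2 odds scores. The function names and the toy motif instances are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(motif_sequences):
    """Per-position residue frequencies for ungapped, equal-length motif instances."""
    n = len(motif_sequences)
    return [
        {aa: count / n for aa, count in Counter(column).items()}
        for column in zip(*motif_sequences)
    ]


def pssm(motif_sequences, pseudocount=1.0):
    """Log-odds scores: log2(smoothed frequency / background) for every residue.

    Assumptions: uniform background (1/20) and an additive pseudocount, so that
    residues never seen at a position get a low score rather than minus infinity.
    """
    n = len(motif_sequences)
    background = 1.0 / len(AMINO_ACIDS)
    denominator = n + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for column in zip(*motif_sequences):
        counts = Counter(column)         # missing residues count as 0
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / denominator) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def score(matrix, segment):
    """Sum the position-specific scores of a segment as long as the motif."""
    return sum(column[aa] for column, aa in zip(matrix, segment))


if __name__ == "__main__":
    # Toy motif instances invented for illustration only.
    motifs = ["CAHGC", "CSHAC", "CTHKC", "CAHRC"]
    print(frequency_matrix(motifs)[1])   # frequencies at position 2
    weights = pssm(motifs)
    print(score(weights, "CAHGC"), score(weights, "WWWWW"))
```

A segment resembling the motif scores higher than an unrelated one, and unlike a regular expression the score degrades gradually as mismatches accumulate.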
