Transcript
Page 1: Protein domain/family db
Page 2: Protein domain/family db

Protein domain/family db

• Secondary databases are the fruit of analyses of the sequences found in the primary sequence db

• Either manually curated (e.g. PROSITE, Pfam, etc.) or automatically generated (e.g. ProDom, DOMO)

• Each of them uses a different method to detect if a protein belongs to a particular domain/family (patterns, profiles, HMM)

Page 3: Protein domain/family db

Protein domain/family

• Most proteins have « modular » structures

• Estimation: ~ 3 domains / protein

• Domains (conserved sequences or structures) are identified by multiple sequence alignments

• Domains can be defined by different methods:
  – Pattern (regular expression); used for very conserved domains
  – Profiles (weighted matrices): two-dimensional tables of position-specific match, gap, and insertion scores, derived from aligned sequence families; used for less conserved domains
  – Hidden Markov Model (HMM): probabilistic models; another method to generate profiles.
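The profile idea above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular database: the toy alignment is invented, the background distribution is assumed to be uniform, and only match scores are modelled (the gap and insertion scores of a full profile are left out). The names `ALIGNMENT`, `build_profile`, `score_window` and `best_hit` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy alignment of one short conserved region (not real data).
ALIGNMENT = [
    "GKSTL",
    "GKTTL",
    "GRSTL",
    "GKSSI",
]

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds match scores per alignment column."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        scores = {}
        for aa in ALPHABET:
            freq = (counts.get(aa, 0) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile


def score_window(profile, window):
    """Sum the position-specific scores of an ungapped window."""
    return sum(column[aa] for column, aa in zip(profile, window))


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (score, start) of the best window."""
    width = len(profile)
    hits = [
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


PROFILE = build_profile(ALIGNMENT)
score, start = best_hit(PROFILE, "AAAGKSTLAAA")
print(f"best window starts at {start} with score {score:.2f}")
```

Unlike a pattern, the profile still gives a graded score to a window that deviates from the consensus at one or two positions, which is why profiles suit less conserved domains.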

Page 4: Protein domain/family db

Some statistics

• 15 most common domains for H. sapiens (Incomplete)

Immunoglobulin and major histocompatibility complex domain
Zinc finger, C2H2 type
Eukaryotic protein kinase
Rhodopsin-like GPCR superfamily
Pleckstrin homology (PH) domain
Zinc finger, RING type
Src homology 3 (SH3) domain
RNA-binding region RNP-1 (RNA recognition motif)
EF-hand family
Homeobox domain
Krab box
PDZ domain (also known as DHR or GLGF)
Fibronectin type III domain
EGF-like domain
Cadherin domain
…

http://www.ebi.ac.uk/proteome/HUMAN/interpro/top15d.html

Page 5: Protein domain/family db

Protein domain/family db

PROSITE: Patterns / Profiles
ProDom: Aligned motifs
PRINTS: Aligned motifs
Pfam: HMM (Hidden Markov Models)
SMART: HMM
BLOCKS: Aligned motifs

InterPro

Page 6: Protein domain/family db

Prosite

• Created in 1988 (SIB)

• Contains fully annotated functional domains, based on two methods: patterns and profiles

• Entries are deposited in PROSITE in two distinct files:
  – Patterns/profiles, with the list of all matches in SWISS-PROT
  – Documentation

• Aug 2001: contains 1089 documentation entries that describe 1474 different patterns, rules and profiles/matrices.
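Since a PROSITE pattern is essentially a regular expression in its own notation, it can be translated mechanically. The sketch below is a simplified converter, assuming only the core syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors). The function name `prosite_to_regex` is hypothetical, and the example pattern is written in the style of the C2H2 zinc finger signature; treat it as an illustration rather than a quotation of the PROSITE entry.

```python
import re

# One pattern element: a core, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# Illustrative pattern (assumption, see the note above).
ZNF_PATTERN = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
ZNF_REGEX = re.compile(prosite_to_regex(ZNF_PATTERN))

# Hypothetical test sequence built by hand to satisfy the pattern.
hit = ZNF_REGEX.search("MKCAACAAAFAAAAAAAAHAAAHGG")
print(ZNF_REGEX.pattern, hit.start() if hit else None)
```

A pattern either matches or it does not, which is why this representation works best for very conserved sites.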

Page 7: Protein domain/family db

Diagnostic performance

List of matches

Page 8: Protein domain/family db

Prosite (profile): example

Page 9: Protein domain/family db

PFAM (HMMs): an entry

Page 10: Protein domain/family db

Page 11: Protein domain/family db

PFAM (HMMs): query output

Page 12: Protein domain/family db

HMMs

Page 13: Protein domain/family db

• Most protein families are characterized by several conserved motifs

• Fingerprint: set of motif(s) (simple or composite, such as multidomains) = signature of family membership

• True family members exhibit all elements of the fingerprint, while subfamily members may possess only part of it
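The all-or-part logic of a fingerprint can be sketched as follows. This is a simplification under stated assumptions: the three motifs are invented, they are written as regular expressions rather than as the aligned motifs a fingerprint database stores, and they are required to occur in order along the sequence. The names `FINGERPRINT`, `match_fingerprint` and `classify` are hypothetical.

```python
import re

# Hypothetical fingerprint: three made-up motifs in N- to C-terminal order.
FINGERPRINT = [r"G.GK[ST]", r"D..G", r"NK.D"]


def match_fingerprint(sequence, motifs=FINGERPRINT):
    """Return the start position of each motif, or None where a motif is absent.

    Each motif is searched for after the end of the previous hit, so the
    motifs must appear in the listed order.
    """
    positions = []
    start = 0
    for motif in motifs:
        hit = re.compile(motif).search(sequence, start)
        if hit:
            positions.append(hit.start())
            start = hit.end()
        else:
            positions.append(None)
    return positions


def classify(sequence, motifs=FINGERPRINT):
    """Label a sequence by how much of the fingerprint it carries."""
    found = sum(pos is not None for pos in match_fingerprint(sequence, motifs))
    if found == len(motifs):
        return "full"      # all elements present: candidate family member
    if found > 0:
        return "partial"   # some elements present: possible subfamily member
    return "none"


print(classify("MAGSGKSALLDAAGQENKIDL"))
```

A partial result is kept distinct from a miss because, as noted above, subfamily members may carry only part of the fingerprint.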

Page 14: Protein domain/family db

ProDom

• Consists of an automated compilation of homologous domain alignments.

• August 2001: 390 ProDom families were generated automatically using PSI-BLAST, built from non-fragmentary sequences from SWISS-PROT 39 + TREMBL (May 29th, 2000)

Page 15: Protein domain/family db

ProDom: query output example

Your query

Page 16: Protein domain/family db

Protein domain/family: Composite databases

Example: InterPro

• Unification of PROSITE, PRINTS, Pfam, ProDom and SMART into an integrated resource of protein families, domains and functional sites;

• Single set of documents linked to the various methods;

• Will be used to improve the functional annotation of SWISS-PROT (classification of unknown protein…)

• This release (3.2, July 2001) contains 3939 entries, representing 1009 domains, 2850 families, 65 repeats and 15 post-translational modification sites.

Page 17: Protein domain/family db
Page 18: Protein domain/family db
Page 19: Protein domain/family db
