Protein domain/family db


Upload: tannar

Post on 01-Feb-2016


DESCRIPTION

Protein domain/family db. Secondary databases are the fruit of analyses of the sequences found in the primary sequence databases. They are either manually curated (e.g. PROSITE, Pfam) or automatically generated (e.g. ProDom, DOMO). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Protein domain/family db
Page 2: Protein domain/family db

Protein domain/family db

• Secondary databases are the fruit of analyses of the sequences found in the primary sequence databases

• They are either manually curated (e.g. PROSITE, Pfam) or automatically generated (e.g. ProDom, DOMO)

• Each of them uses a different method to detect whether a protein belongs to a particular domain/family (patterns, profiles, HMMs)

Page 3: Protein domain/family db

Protein domain/family

• Most proteins have « modular » structures

• Estimation: ~ 3 domains / protein

• Domains (conserved sequences or structures) are identified by multiple sequence alignments

• Domains can be defined by different methods:

– Pattern (regular expression); used for very conserved domains

– Profiles (weighted matrices): two-dimensional tables of position-specific match, gap, and insertion scores, derived from aligned sequence families; used for less conserved domains

– Hidden Markov Model (HMM); probabilistic models; another method to generate profiles.
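To make the first two methods concrete, here is a minimal Python sketch, not the actual PROSITE implementation. It assumes the usual PROSITE-style pattern notation (elements separated by "-", "x" for any residue, "[..]" for allowed residues, "{..}" for forbidden residues, "(n)" or "(n,m)" for repeats) and translates it into a regular expression. The profile part is a deliberately simplified weight matrix with match scores only, with no gap or insertion scores. The function names `prosite_to_regex`, `find_matches` and `best_profile_hit`, and the toy profile in the usage note, are illustrative assumptions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: anchors written inside brackets are not handled.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # "[..]" groups and single letters are already valid regex.
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


def best_profile_hit(profile, sequence, default=-1.0):
    """Slide an ungapped weight matrix along the sequence.

    profile: one dict per column mapping residue -> score;
    residues missing from a column get the `default` score.
    Returns (start, score) of the best window, or None if the
    sequence is shorter than the profile.
    """
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, default) for col, res in zip(profile, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

For example, `find_matches("N-{P}-[ST]-{P}.", "AANGSAANPSA")` reports the single hit `NGSA` at position 2 and rejects `NPSA` because of the forbidden proline. With a made-up three-column profile such as `[{"G": 2.0}, {"K": 1.0, "R": 1.0}, {"S": 2.0, "T": 2.0}]`, `best_profile_hit` picks the window `GKT` in `AAGKTAA`. A pattern gives a yes/no answer, while a profile gives a graded score, which is why profiles suit less conserved domains.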

Page 4: Protein domain/family db

Some statistics

• 15 most common domains for H. sapiens (incomplete)

Immunoglobulin and major histocompatibility complex domain
Zinc finger, C2H2 type
Eukaryotic protein kinase
Rhodopsin-like GPCR superfamily
Pleckstrin homology (PH) domain
Zinc finger, RING type
Src homology 3 (SH3) domain
RNA-binding region RNP-1 (RNA recognition motif)
EF-hand family
Homeobox domain
KRAB box
PDZ domain (also known as DHR or GLGF)
Fibronectin type III domain
EGF-like domain
Cadherin domain
…

http://www.ebi.ac.uk/proteome/HUMAN/interpro/top15d.html

Page 5: Protein domain/family db

Protein domain/family db

PROSITE: Patterns / Profiles
ProDom: Aligned motifs
PRINTS: Aligned motifs
Pfam: HMM (Hidden Markov Models)
SMART: HMM
BLOCKS: Aligned motifs

InterPro

Page 6: Protein domain/family db

Prosite

Created in 1988 (SIB)

Contains fully annotated functional domains, based on two methods: patterns and profiles

Entries are deposited in PROSITE in two distinct files:

– Patterns/profiles, with the list of all matches in SWISS-PROT

– Documentation

Aug 2001: contains 1089 documentation entries that describe 1474 different patterns, rules and profiles/matrices.

Page 7: Protein domain/family db

Diagnostic performance

List of matches

Page 8: Protein domain/family db

Prosite (profile): example

Page 9: Protein domain/family db

Pfam (HMMs): an entry

Page 10: Protein domain/family db

Page 11: Protein domain/family db

Pfam (HMMs): query output

Page 12: Protein domain/family db

HMMs

Page 13: Protein domain/family db

Most protein families are characterized by several conserved motifs.

Fingerprint: set of motif(s) (simple or composite, such as multidomains) = signature of family membership.

True family members exhibit all elements of the fingerprint, while subfamily members may possess only part of it.
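The full/partial distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: regular expressions stand in for the aligned motifs that a real fingerprint would use, motif order along the sequence is ignored, and the motif names and sequences are invented for the example.

```python
import re


def fingerprint_match(motifs, sequence):
    """Classify a sequence against a fingerprint.

    motifs: dict mapping a motif name to a regular expression
            (a stand-in for an aligned motif; hypothetical data).
    Returns (level, matched_names), where level is "full" when every
    motif is found, "partial" when only some are, and "none" otherwise.
    """
    matched = [name for name, rx in motifs.items() if re.search(rx, sequence)]
    if len(matched) == len(motifs):
        level = "full"
    elif matched:
        level = "partial"
    else:
        level = "none"
    return level, matched


# Hypothetical three-motif fingerprint.
TOY_FINGERPRINT = {"motif1": "GKT", "motif2": "DE.D", "motif3": "W..W"}
```

With this toy fingerprint, a sequence containing all three motifs is reported as a full match (a candidate true family member), while a sequence containing only `GKT` is reported as partial, as a subfamily member might be.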

Page 14: Protein domain/family db

ProDom

• Consists of an automated compilation of homologous domain alignments.

• August 2001: 390 ProDom families were generated automatically using PSI-BLAST, built from non-fragmentary sequences from SWISS-PROT 39 + TrEMBL (May 29th, 2000)

Page 15: Protein domain/family db

ProDom: query output example

Your query

Page 16: Protein domain/family db

Protein domain/family: Composite databases

Example: InterPro

• Unification of PROSITE, PRINTS, Pfam, ProDom and SMART into an integrated resource of protein families, domains and functional sites;

• Single set of documents linked to the various methods;

• Will be used to improve the functional annotation of SWISS-PROT (classification of unknown proteins…)

• This release (3.2, July 2001) contains 3939 entries, representing 1009 domains, 2850 families, 65 repeats and 15 post-translational modification sites.
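The idea of a single entry linked to several methods can be sketched as a small grouping step. This is a rough Python illustration, not InterPro's actual data model: the `integrate` function and the placeholder identifiers (`ENTRY_1`, `SIG_A`, ...) are assumptions made for the example. The only real figures are the release 3.2 counts quoted above, which the sketch also sums as a consistency check.

```python
from collections import defaultdict

# Entry-type counts quoted for release 3.2 (July 2001).
RELEASE_3_2 = {"domain": 1009, "family": 2850, "repeat": 65, "ptm_site": 15}


def integrate(signatures):
    """Group member-database signatures under their integrated entry.

    signatures: iterable of (member_db, signature_id, entry_id) triples.
    Returns {entry_id: {member_db: [signature_id, ...]}}.
    """
    entries = defaultdict(dict)
    for member_db, signature_id, entry_id in signatures:
        entries[entry_id].setdefault(member_db, []).append(signature_id)
    return {entry_id: dict(by_db) for entry_id, by_db in entries.items()}


# Placeholder identifiers, invented for illustration only.
EXAMPLE = [
    ("PROSITE", "SIG_A", "ENTRY_1"),
    ("Pfam", "SIG_B", "ENTRY_1"),
    ("PRINTS", "SIG_C", "ENTRY_2"),
]

# The four entry types add up to the stated total of 3939 entries.
TOTAL_ENTRIES = sum(RELEASE_3_2.values())
```

Here `integrate(EXAMPLE)` yields one entry documented once but backed by both a PROSITE and a Pfam signature, and a second entry backed by PRINTS alone.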

Page 17: Protein domain/family db
Page 18: Protein domain/family db
Page 19: Protein domain/family db