Protein domain/family db

Post on 01-Feb-2016


DESCRIPTION

Protein domain/family db. Secondary databases are the fruit of analyses of the sequences found in the primary sequence db. They are either manually curated (e.g. PROSITE, Pfam) or automatically generated (e.g. ProDom, DOMO). PowerPoint PPT Presentation

TRANSCRIPT

Protein domain/family db

• Secondary databases are the fruit of analyses of the sequences found in the primary sequence db

• Either manually curated (e.g. PROSITE, Pfam) or automatically generated (e.g. ProDom, DOMO)

• Each of them uses a different method to detect if a protein belongs to a particular domain/family (patterns, profiles, HMM)

Protein domain/family

• Most proteins have « modular » structures

• Estimation: ~ 3 domains / protein

• Domains (conserved sequences or structures) are identified by multiple sequence alignments

• Domains can be defined by different methods:

– Pattern (regular expression): used for very conserved domains

– Profiles (weighted matrices): two-dimensional tables of position-specific match, gap, and insertion scores, derived from aligned sequence families; used for less conserved domains

– Hidden Markov Model (HMM): probabilistic models; another method to generate profiles
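The profile (weighted matrix) idea above can be sketched in a few lines. This is a simplified illustration, not the method of any particular database: it keeps only the position-specific match scores (no gap or insertion scores), and the toy alignment, the pseudocount of 1 and the uniform background frequency are all assumptions made for the example.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(alignment, pseudocount=1.0):
    """Build log-odds match scores, one dict per column of an ungapped alignment."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background frequency
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score_window(pssm, window):
    """Sum the match scores of a window that is as long as the profile."""
    # Assumes the window contains only the 20 standard letters.
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_hit(pssm, sequence):
    """Slide the profile along the sequence; return (score, start) or None."""
    width = len(pssm)
    hits = [(score_window(pssm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits) if hits else None


# Hypothetical four-sequence alignment of a four-residue motif.
toy_alignment = ["GKST", "GKSS", "GKTT", "AKST"]
toy_pssm = build_pssm(toy_alignment)
print(best_hit(toy_pssm, "MAAGKSTLLV"))
```

Because every column tolerates the substitutions seen in the alignment, a profile still scores a divergent family member reasonably well, which is why profiles suit less conserved domains better than a strict pattern.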

Some statistics

• 15 most common domains for H. sapiens (Incomplete)

Immunoglobulin and major histocompatibility complex domain
Zinc finger, C2H2 type
Eukaryotic protein kinase
Rhodopsin-like GPCR superfamily
Pleckstrin homology (PH) domain
Zinc finger, RING type
Src homology 3 (SH3) domain
RNA-binding region RNP-1 (RNA recognition motif)
EF-hand family
Homeobox domain
Krab box
PDZ domain (also known as DHR or GLGF)
Fibronectin type III domain
EGF-like domain
Cadherin domain
…

http://www.ebi.ac.uk/proteome/HUMAN/interpro/top15d.html

Protein domain/family db

PROSITE: Patterns / Profiles
ProDom: Aligned motifs
PRINTS: Aligned motifs
Pfam: HMM (Hidden Markov Models)
SMART: HMM
BLOCKS: Aligned motifs

InterPro

Prosite

• Created in 1988 (SIB)

• Contains fully annotated functional domains, based on two methods: patterns and profiles

• Entries are deposited in PROSITE in two distinct files:

– Patterns/profiles, with the list of all matches in SWISS-PROT

– Documentation

• Aug 2001: contains 1089 documentation entries that describe 1474 different patterns, rules and profiles/matrices.
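Since a PROSITE pattern is essentially a regular expression written in its own notation, a short translation sketch may help. The function below is illustrative only and handles a common subset of the pattern syntax (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) repeats, < and > anchors). The function name and the test sequence are hypothetical; the pattern used is the well-known N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Illustrative subset only; not the official PROSITE scanning code.
    """
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        core, _, repeat = element.partition("(")
        repeat = repeat.rstrip(")")
        if core == "x":                  # any amino acid
            core = "."
        elif core.startswith("{"):       # any amino acid except those listed
            core = "[^" + core[1:-1] + "]"
        # "[...]" and single letters are already valid regex pieces
        if repeat:                       # "3" -> {3}, "2,4" -> {2,4}
            core += "{" + repeat + "}"
        regex.append(prefix + core + suffix)
    return "".join(regex)


n_glyc = prosite_to_regex("N-{P}-[ST]-{P}.")
print(n_glyc)                                # N[^P][ST][^P]
print(re.findall(n_glyc, "MKNGTAANPSA"))     # ['NGTA']
```

In the made-up sequence, the second asparagine is followed by a proline, so only the first site matches. A pattern is all-or-nothing in this way, which is why it is best suited to very conserved sites.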

Diagnostic performance

List of matches

Prosite (profile): example

PFAM (HMMs): an entry

PFAM (HMMs): query output

HMMs

• Most protein families are characterized by several conserved motifs

• Fingerprint: set of motif(s) (simple or composite, such as multidomains) = signature of family membership

• True family members exhibit all elements of the fingerprint, while subfamily members may possess only part of it
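The fingerprint idea (all motifs present for a true family member, only some for a subfamily member) can be illustrated with a small sketch. Several assumptions apply: the motifs are written as regular expressions for brevity, whereas fingerprint databases score aligned motifs; motif order along the sequence is ignored; and the motif names, motif definitions and sequences are hypothetical.

```python
import re


def classify_by_fingerprint(sequence, motifs):
    """Return (level, matched_motif_names) for a sequence.

    level is "full" when every motif is found, "partial" when only
    some are found, and "none" otherwise.
    """
    matched = [name for name, regex in motifs.items()
               if re.search(regex, sequence)]
    if len(matched) == len(motifs):
        level = "full"
    elif matched:
        level = "partial"
    else:
        level = "none"
    return level, matched


# Hypothetical three-motif fingerprint.
toy_fingerprint = {
    "motif1": "G.GK[ST]",
    "motif2": "DE.D",
    "motif3": "W..HP",
}

print(classify_by_fingerprint("AAGAGKSTTTDEADLLWAAHPK", toy_fingerprint))
print(classify_by_fingerprint("AAGAGKSTTTLLLL", toy_fingerprint))
```

The first toy sequence contains all three motifs and is reported as a full match; the second contains only the first motif and is reported as partial.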

ProDom

• Consists of an automated compilation of homologous domain alignments.

• August 2001: 390 ProDom families were generated automatically using PSI-BLAST, built from non-fragmentary sequences from SWISS-PROT 39 + TREMBL (May 29th, 2000).

ProDom: query output example


Protein domain/family: Composite databases

Example: InterPro

• Unification of PROSITE, PRINTS, Pfam, ProDom and SMART into an integrated resource of protein families, domains and functional sites;

• Single set of documents linked to the various methods;

• Will be used to improve the functional annotation of SWISS-PROT (classification of unknown protein…)

• This release (3.2, July 2001) contains 3939 entries, representing 1009 domains, 2850 families, 65 repeats and 15 post-translational modification sites.
