Post on 21-Dec-2015
EBI is an Outstation of the European Molecular Biology Laboratory.
InterPro Database Protein
Functional Analysis
Jennifer McDowall, Ph.D.
Senior InterPro Curator
http://www.ebi.ac.uk/interpro
EBI Sequence Databases
EMBL (GenBank, DDBJ): nucleotide sequence
CGCGCCTGTACGCTGAACGCTCGTGACGTGTAGTGCGCG
translate
UniProtKB/TrEMBL: protein sequence (>7M)
manual annotation
UniProtKB/Swiss-Prot (>400,000)
EBI Sequence Databases: InterPro
InterPro: protein signatures used for protein annotation of UniProtKB sequences
Signatures identify groups of related proteins (same family or shared domains)
UniProtKB
Signature matches: InterPro ~80% protein coverage
UniProt/Swiss-Prot proteins: ~400,000 (InterPro ~370,000)
UniProt/TrEMBL proteins: >7M (InterPro >5.3M)
UniMES metagenomic proteins: >6M (available 2009)
What are protein signatures?
Multiple sequence alignment
• A signature describes the pattern of a set of conserved residues in a group of proteins
- Define a protein family
- Define a protein feature (domain or conserved site)
What value are signatures?
• More sensitive homology searches
- Find more distant homologues than BLAST
• Classification of proteins
- Associate proteins that share: function, domains, sequence, structure
• Annotation of protein sequences
- Define conserved regions of a protein, e.g. location and type of domains, key structural or functional sites
• Transfer additional (automatic) annotation
- Associate TrEMBL proteins with well-annotated Swiss-Prot proteins
- Transfer annotation
Signature methods
• Pattern
• Fingerprint
• Sequence clustering
• HMM
• SAM
Patterns
Pattern/motif in sequence, expressed as a regular expression
Can define important sites:
• Enzyme catalytic site
• Prosthetic group attachment
• Metal ion binding site
• Cysteines for disulphide bonds
• Protein or molecule binding
EXAMPLE: PS00262 Insulin family signature
B chain xxxxxxCxxxxxxxxxxxxCxxxxxxxxx
A chain xxxxxCCxxxCxxxxxxxxCx
Sequence: MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
Pattern match: CCTSICSLYQLENYC
Regular expression: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
Patterns – understanding a regular expression
C - C - {P} - x(2) - C - [STDNEKPI] - x(3) - [LIVMFS] - x(3) - C
• A single letter is a strictly conserved site; only one amino acid is accepted at this position
• Curly brackets denote amino acids that cannot occur at a single position
• x denotes that any amino acid can occur at a single position
• x(2) means that any amino acid can occur at each of the next two positions
• Square brackets denote the range of amino acids that can occur at a single position
• There are dashes between each position
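The pattern rules above can be tried out with a short Python sketch. This is only an illustration, not the PROSITE scanning software: the function name prosite_to_regex is hypothetical, and it handles only the syntax elements shown on the slide (single letters, {…}, […], x and x(n)).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper: supports only the elements described on the slide.
    """
    tokens = []
    for element in pattern.replace(" ", "").split("-"):
        # Split an element such as "x(3)" into its residue part and repeat count.
        parsed = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        residues, count = parsed.group(1), parsed.group(2)
        if residues == "x":
            token = "."                          # any amino acid
        elif residues.startswith("{"):
            token = "[^" + residues[1:-1] + "]"  # amino acids that cannot occur
        else:
            token = residues                     # single letter or [allowed set]
        if count:
            token += "{" + count + "}"           # x(2) becomes .{2}
        tokens.append(token)
    return "".join(tokens)

INSULIN_PATTERN = "C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C"
INSULIN_SEQUENCE = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGG"
    "PGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)

regex = prosite_to_regex(INSULIN_PATTERN)
match = re.search(regex, INSULIN_SEQUENCE)
print(regex)
print(match.group(0) if match else "no match")
```

Running the sketch on the insulin sequence from the example slide should report the same region that the slide highlights as the pattern match.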
Patterns
Sequence alignment
Define pattern: Insulin family motif
Extract pattern sequences: xxxxxxxxxxxxxxxxxxxxxxxx
Build regular expression: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
Pattern signature: PS00000
Fingerprints
Several motifs characterise family
Different combinations of motifs describe subfamilies
Identify small conserved regions in divergent proteins
EXAMPLE: PR00107 Phosphocarrier HPr signature
PTHP_ENTFA: MEKKEFHIVAETGIHARPATLLVQTASKFNSDINLEYKGKSVNLKSIMGVMSLGVGQGSDVTITVDGADEAEGMAAIVETLQKEGLAE
3-motif fingerprint:
1) GIHARPATLLVQTASKF (His phosphorylation site)
2) KGKSVNLKSIMGVMSL (Ser phosphorylation site)
3) LGVGQGSDVTITVDGADE (Conserved site)
Fingerprints
Sequence alignment
Define motifs: His phosphorylation site, Ser phosphorylation site, conserved site
Extract motif sequences (one set per motif): xxxxxxxxxxxxxxxxxxxxxxxx
Fingerprint signature (PR00000): motifs 1, 2, 3 in correct order with correct spacing
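The "correct order" requirement can be illustrated with a minimal Python sketch using the PTHP_ENTFA sequence and the three motifs of PR00107. It assumes exact string matching for simplicity; real fingerprint scanning scores motifs against residue frequency matrices and also checks spacing, and the function names here are hypothetical.

```python
# Sequence and motifs taken from the PR00107 example; exact matching is an
# illustrative simplification, not how PRINTS scores motifs.
PTHP_ENTFA = (
    "MEKKEFHIVAETGIHARPATLLVQTASKFNSDINLEYKGKSVNLK"
    "SIMGVMSLGVGQGSDVTITVDGADEAEGMAAIVETLQKEGLAE"
)
MOTIFS = ["GIHARPATLLVQTASKF", "KGKSVNLKSIMGVMSL", "LGVGQGSDVTITVDGADE"]

def locate_motifs(sequence, motifs):
    """Return the 0-based start of each motif in the sequence, or -1 if absent."""
    return [sequence.find(motif) for motif in motifs]

def matches_fingerprint(sequence, motifs):
    """True if every motif is present and the motifs occur in the given order."""
    positions = locate_motifs(sequence, motifs)
    return all(p >= 0 for p in positions) and positions == sorted(positions)

print(locate_motifs(PTHP_ENTFA, MOTIFS))
print(matches_fingerprint(PTHP_ENTFA, MOTIFS))
```

A sequence that matches only some of the motifs would fail this all-or-nothing check; in practice partial matches are informative too, since different combinations of motifs describe subfamilies.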
Sequence clustering
Automatic clustering of homologous domains
** Rarely covers entire domain (conserved core)
** Signature size can change with release
ProDom:
Known domain families
Recruit homologous domains (PSI-BLAST)
Automatic clustering (MKDOM2)
Align domain families
Hidden Markov Models (HMM)
Can characterise protein over entire length
Models conserved and divergent regions (position-specific scoring)
Models insertions and deletions
Outperform in sensitivity and specificity
More flexible (can use partial alignments)
Sequence alignment (Sequences 1 to 7)
Profile: scoring matrix (residue frequency at each position in alignment)
Hidden Markov Model: Bayesian statistics, probability scoring
M = match state: M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
I = insert state: I1 I2 I3 I4 I5 I6 I7 I8 I9
D = delete state: D2 D3 D4 D5 D6 D7 D8 D9
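The profile step (residue frequency at each position in the alignment) can be sketched in Python. This covers only the match-state scoring idea; it does not model insert or delete states, so it is not a full HMM. The toy alignment, the pseudocount and the uniform background frequency are assumptions made for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Per-column residue frequencies for an ungapped alignment.

    The pseudocount (an assumption) keeps unseen residues from scoring zero.
    """
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile

def log_odds_score(profile, sequence, background=1 / 20):
    """Sum of per-position log-odds against a uniform background (assumed)."""
    return sum(
        math.log(column[aa] / background) for column, aa in zip(profile, sequence)
    )

# Hypothetical toy alignment; not taken from any member database.
toy_alignment = ["GIHARP", "GLHARP", "GIHSRP", "GVHARP"]
profile = build_profile(toy_alignment)
print(log_odds_score(profile, "GIHARP"))  # resembles the alignment
print(log_odds_score(profile, "AAAAAA"))  # unrelated sequence
```

A sequence resembling the alignment scores higher than an unrelated one, which is the position-specific scoring the slide refers to.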
Hidden Markov Models (HMM)
HMM databases:
Families conserved in sequence:
• PIR SUPERFAMILY
• PANTHER
• TIGRFAM
Domains conserved in sequence:
• PFAM
• SMART
Domains conserved in structure:
• SUPERFAMILY
• GENE3D
SAM Profile HMMs
Homologous structural superfamilies
Start with single seed sequence
Proteins in superfamily may have low sequence identity
Few proteins in family have PDB structures
Create 1 model for every protein in superfamily, then combine results
SAM Profile models
T99 script:
1) Single seed sequence (GIHARPATLLVQTASKF)
2) WU-BLASTP search: close homologues and low identity matches
3) Initial model
4) New larger alignment
5) Final model
Signature Methods
• Pattern: describes protein features (active sites, binding sites…)
• Fingerprint: describes families and sibling subfamilies
• Sequence clustering: predicts conserved domains
• HMM and SAM: functional classification of families, functional domain annotation, structural domain annotation
Comprehensive annotation
InterPro removes redundancy
Domain annotation: SWIB/MDM2 domain, RanBP2-type zinc finger, RING-type zinc finger
Annotate features: conserved site within zinc finger
Family classification: Mdm2/Mdm4 family (parent), Mdm4 subfamily (child)
Domain Boundaries
Gene3D (and SSF) determine structural domain boundaries
Pfam trims domains to regions of good sequence conservation
ProDom displays shortest conserved sequence
Fragmented Signatures
1) Signature method
• e.g. PRINTS – discrete motifs
2) Duplicated domains
• e.g. SSF – duplication consisting of 2 domains with same fold
3) Repeated elements
• e.g. Kringle, WD40
4) Non-contiguous domains
• Structural domains can consist of non-contiguous sequence
Complementary Annotation
Sequence-based signature (Pfam) shows that the domain is made up of repeating sequence elements
Beta-propeller repeat
Structural-based signature (SSF) shows boundaries of structural domain
7-blade beta-propeller
Complementary Annotation
PFAM shows domain is composed of two types of repeated sequence motifs
SUPERFAMILY shows the potential domain boundaries
Complementary Annotation
GENE3D shows that these domains share homologous structure
PFAM/SMART show 2 domains from distinct sequence families
Searching InterPro
http://www.ebi.ac.uk/interpro/
Search tools include:
• Text Search
• InterProScan (sequence search)
InterPro Text Search
Text search box
Search using:
• text
• protein ID
• InterPro ID
• GO term
Search results
Direct links to entry
InterProScan Search
Paste in sequence (protein/nucleotide)
Member database search engines
Use ftp site to run multiple sequences simultaneously
InterProScan Search Results
single InterPro entry
Direct links to entry
Direct links to signature databases
InterPro Entry
Groups similar signatures together
Adds extensive annotation
Linked to other databases
Structural information and viewers
Links related signatures
Grouping Signatures Together
1) Same positions, same protein hits: PFAM (100), PROSITE (100); one entry (IPR000001)
2) Same positions, different protein hits: PFAM (100), PROSITE (50); separate entries (IPR000001, IPR000002)
3) Different positions, same protein hits: PFAM (100), PROSITE (100); separate entries (IPR000001, IPR000002)
4) Different positions: PFAM (100), PROSITE (100); separate entries (IPR000001, IPR000002)
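The grouping rule (same positions and same protein hits go into one entry) can be sketched as below. This is an illustrative assumption, not InterPro's curation procedure: the function names, the 0.9 thresholds and the toy protein IDs are all hypothetical.

```python
def overlap_fraction(a, b):
    """Shared residues divided by the total span covered by two (start, end) matches."""
    (a_start, a_end), (b_start, b_end) = a, b
    shared = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    spanned = max(a_end, b_end) - min(a_start, b_start) + 1
    return shared / spanned

def same_entry(sig_a, sig_b, min_proteins=0.9, min_positions=0.9):
    """Decide whether two signatures hit the same proteins at the same positions.

    Each signature is a dict mapping protein ID -> (start, end) of its match.
    Both thresholds are illustrative assumptions.
    """
    shared = sig_a.keys() & sig_b.keys()
    if not shared:
        return False
    protein_overlap = len(shared) / len(sig_a.keys() | sig_b.keys())
    position_overlap = sum(
        overlap_fraction(sig_a[p], sig_b[p]) for p in shared
    ) / len(shared)
    return protein_overlap >= min_proteins and position_overlap >= min_positions

# Hypothetical matches for two proteins.
pfam = {"P1": (10, 100), "P2": (5, 80)}
prosite_same = {"P1": (12, 100), "P2": (5, 80)}         # case 1
prosite_fewer = {"P1": (10, 100)}                        # case 2
prosite_elsewhere = {"P1": (150, 200), "P2": (120, 160)}  # case 3

print(same_entry(pfam, prosite_same))
print(same_entry(pfam, prosite_fewer))
print(same_entry(pfam, prosite_elsewhere))
```

Only the first comparison satisfies both conditions, mirroring case 1 on the slide.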
Extensive Annotation
Annotation Fields in InterPro
• Name and short name (short names appear in UniProt entries)
• List of signatures (links to member databases)
• Entry type (family, domain, site)
• Relationships (links related signatures)
• GO mapping (large scale classification)
• Abstract
• Taxonomy (search/download using taxonomy)
• Examples
• Publications
Entry types:
Family: Full-length signatures grouping related proteins
Domain: Biological units with defined boundaries
Repeat: Signature repeated as a series of short motifs
Site: Protein feature described by a Prosite pattern
Region: Any signature that doesn’t fit the above
Links to Other Databases
Additional annotation from databases:
• Blocks (family alignments)
• IntEnz (enzymes)
• Prosite documents
• COME (bioinorganic motifs)
• CAZy (carbohydrate-active enzymes)
• IUPHAR (GPCR receptors)
• CluSTr (protein clusters)
• Pandit (phylogenetic trees of PFAMs)
• Merops (peptidases & inhibitors)
Links to Structural Databases
• SCOP (structural classification of proteins)
• CATH (structural classification of proteins)
• PDB (protein structure databank)
Links to structural classification
List of proteins with structural data
PDB database of structures
Links to Interaction Databases
• IntAct (protein-protein interactions)
Lists proteins in entry known to be involved in protein-protein interactions
IntAct database of interactions
InterPro Relationships
Parent/Child: Hierarchical subdivision into more closely related groups
Contains/Found in: Domain/subdomain composition
Overlapping: Remaining relationships
Link related signatures - relationships
1) Parent - Child (subgroup of more closely related proteins)
PFAM (100): Protein kinase
SMART (75): Serine kinase
PROSITE (25): Tyrosine kinase
* SMART and PROSITE: no proteins in common
Parent: PFAM Protein kinase (IPR000001)
Children: SMART Serine kinase (IPR000002), PROSITE Tyrosine kinase (IPR000003)
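The parent/child idea can be expressed as a set relationship. The sketch below is an illustration under a simplifying assumption (a child matches a strict subset of the parent's proteins); the protein IDs are made up to reproduce the 100/75/25 counts in the example.

```python
def is_child(child_hits, parent_hits):
    """True if the child's proteins are a proper subset of the parent's proteins."""
    return child_hits < parent_hits

# Hypothetical protein IDs reproducing the counts in the slide example.
pfam_protein_kinase = {f"P{i:03d}" for i in range(100)}
smart_serine_kinase = {f"P{i:03d}" for i in range(75)}
prosite_tyrosine_kinase = {f"P{i:03d}" for i in range(75, 100)}

print(is_child(smart_serine_kinase, pfam_protein_kinase))
print(is_child(prosite_tyrosine_kinase, pfam_protein_kinase))
print(smart_serine_kinase.isdisjoint(prosite_tyrosine_kinase))
```

Both SMART and PROSITE qualify as children of the PFAM signature, while having no proteins in common with each other, so they become sibling entries.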
Relationships – evolutionary context
InterPro relationship, signatures and criteria for signature:
Grandparent: GENE3D (structural family)
Parents: PFAM, PFAM (sequence families)
Children: TIGRFAM, TIGRFAM, TIGRFAM, TIGRFAM (functional families)
Unique to InterPro
Example hierarchy:
IPR011009 Protein kinase-like
IPR000403 PI 3/4 kinase
IPR000719 Protein kinase
IPR001245 Tyr kinase
IPR017442 Ser/Thr kinase-rel
IPR015772 TNK1 kin
IPR015783 ATMRK kin
IPR002575 APH kinase
IPR004147 ABC-1
IPR004166 EF2 kinase
IPR015275 Actin-fragmin kin
IPR015897 CHK kinase
IPR002290 Ser/Thr kin
IPR015515 GCN2
IPR015771 Hrmn Rcpt
IPR015768 Activin Rcpt
IPR015769 TGFb2 Rcpt
IPR015770 BMPRII
IPR015785 MAPK3 kin
IPR015787 IL1 kin
IPR008350 ERK3 MAPK
IPR015732 PSKH kin
IPR015733 Ca-dep kin4
IPR015734 Ca-dep kin1
IPR015739 Leu zip kin
IPR015740 Plant kin
IPR015747 MAPKKK4
IPR015748 MAPKKK3
IPR015749 MAPKKK1
IPR015750 Pak kin
IPR015730 Myosin kin
IPR008351 JNK kin
IPR018934 RIO-like kin
IPR000687 RIO kin
IPR002573 Choline kinase
IPR008349 ERK1 kin
IPR006748 Hydroxyurea kin
IPR009212 MethylTR kin
IPR014093 Thiamine kin
IPR009330 Lipopoly syn
IPR004119 DUF
IPR012877 Put kinase
Parent/child – evolutionary context
Superfamily classification
Most specific subfamily classification
Link related signatures - relationships
2) Contains – Found in (describes domain composition)
PFAM: Receptor family
SMART: N-terminal domain
PROSITE: C-terminal domain
Receptor family: Contains (SMART and PROSITE)
N-terminal domain, C-terminal domain: Found in (PFAM)
Link related signatures - relationships
2) Contains – Found in
Coverage: signature must cover the entire (>90%) sequence of contained signature
PFAM: Contains
SMART: Found in
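The coverage criterion can be written as a small calculation. This sketch assumes each signature match is a single (start, end) range with inclusive 1-based coordinates; the function names and example coordinates are hypothetical, and only the >90% threshold comes from the slide.

```python
def coverage(container, contained):
    """Fraction of the contained match's residues that lie inside the container."""
    c_start, c_end = container
    s_start, s_end = contained
    overlap = max(0, min(c_end, s_end) - max(c_start, s_start) + 1)
    return overlap / (s_end - s_start + 1)

def contains(container, contained, threshold=0.9):
    """Apply the >90% coverage criterion for a Contains / Found in relationship."""
    return coverage(container, contained) > threshold

# Hypothetical coordinates: a family-length match and two domain matches.
family_match = (1, 500)
domain_inside = (20, 120)
domain_overhanging = (450, 600)

print(contains(family_match, domain_inside))
print(contains(family_match, domain_overhanging))
```

The relationship is directional: when contains(a, b) holds, entry a "Contains" b and entry b is "Found in" a.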
Link related signatures - relationships
3) Overlapping
All remaining relationships
PROSITE, SMART: Overlapping
Structural information
Structures: PDB
Classification: CATH, SCOP
Homology Models: Swiss-Model, ModBase
Structural information
CATH and SCOP divide PDB structures into domains
Swiss-Model and ModBase predict structure for regions not covered by PDB
Note that one domain is discontiguous
Sequence-Structure Display
Signatures predictive of protein annotation
Structural data for specific proteins
AstexViewer® for structure
Structure Viewer
Navigate between structure and sequence
Manipulate structures
Other Features – domain architecture
Select data set of these proteins
Each ‘balloon’ represents a linked InterPro domain
Protein Sequence Coverage
InterPro signatures cover:
95% of UniProt/Swiss-Prot proteins
79% of UniProt/TrEMBL proteins
>5 million matches in InterPro
~17,000 InterPro entries
>57,500 signature methods