class 3 2009
Class 3 2009
European Resources
Protein Focused
Protein Databases
EBI – European Bioinformatics Institute
http://www.ebi.ac.uk/
What is the difference between dealing with nucleotide DBs and protein DBs?
Protein information
• Name & description
• Gene encoded from
• Organism
• Function (only one?)
• Enzyme?
• Ligands?
• PTMs?
• Interactions?
• Biological processes.
• Structure.
• Sequence.
• Localization
• More...
Protein DB -short history
Pre-UniProt
Swiss-Prot: created in July 1986; since 1987, a collaboration between the SIB and the EMBL/EBI.
TrEMBL: created at the EBI in 1996 as a computer-annotated protein sequence database supplementing Swiss-Prot.
It was introduced to cope with the increased data flow from genome projects.
PIR
EBI
SIB
The three-layered approach
The UniProt Archive (UniParc)
• UniProtKB + all other protein sequences publicly available
• Completeness
The UniProt Reference Clusters (UniRef)
• Non-redundant views of UniProtKB + selected UniParc sets
• Speed
The UniProt Knowledgebase (UniProtKB)
• Central database of annotated protein sequences and functional information
• UniProtKB/Swiss-Prot + UniProtKB/TrEMBL
Protein DBs
• Swiss-Prot – manually annotated.
• TrEMBL – translated EMBL, automatically annotated.
• UniProtKB – The UniProt Knowledgebase
• UniParc – The UniProt Archive
• PIR – Protein Information Resource
• UniRef – The UniProt Reference Clusters
• PDB – Protein Data Bank – structure
• PRIDE – Resource for experimental proteomics (not in this class)
Databases growth
www.genome.jp/en/db_growth.html
Protein DBs
• Swiss-Prot – manually annotated.
  2005: ~100,000 entries; 2009: ~400,000 entries
• TrEMBL – translated EMBL, automatically annotated.
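To put the two Swiss-Prot figures above in perspective, here is a small sketch of the arithmetic. It assumes growth was a constant yearly multiplier between 2005 and 2009, which is a simplification; the function name is made up for this example.

```python
def annual_growth_factor(start_count, end_count, years):
    """Constant yearly multiplier that takes start_count to end_count."""
    return (end_count / start_count) ** (1.0 / years)


# Approximate Swiss-Prot entry counts quoted on the slide.
START, END = 100_000, 400_000
factor = annual_growth_factor(START, END, 2009 - 2005)

print(f"fold change 2005-2009: {END / START:.1f}x")
print(f"implied yearly growth: {(factor - 1) * 100:.0f}%")
```

Under that assumption, a fourfold increase in four years corresponds to roughly 41% more entries each year.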
Protein Names
Different DBs – different accessions
DB                              Accessions
TrEMBL                          P12345
Swiss-Prot (to be changed..)    MAPK_HUMAN
RefSeq                          NP_123456, XP_123456
UniRef                          UniRef100_P99999, UniRef90_P99999, UniRef50_P99999
Ensembl                         ENSP00000123456
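Because each database uses its own identifier shape, an identifier can often be routed to the right resource just by pattern matching. The sketch below is one possible way to do that for the examples on this slide. The labels and the function name are chosen for this example, and the patterns are simplified assumptions (for instance, the Ensembl pattern only covers the `ENSP` + 11 digits form shown above).

```python
import re

# Ordered list: more specific patterns first. Patterns are simplified
# assumptions that cover the identifier shapes shown on the slide.
PATTERNS = [
    ("UniRef cluster", re.compile(r"UniRef(100|90|50)_[A-Za-z0-9]+")),
    ("RefSeq protein", re.compile(r"[NX]P_\d+(\.\d+)?")),
    ("Ensembl protein", re.compile(r"ENSP\d{11}")),
    ("UniProtKB accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("UniProtKB entry name", re.compile(r"[A-Z0-9]{1,5}_[A-Z0-9]{1,5}")),
]


def classify_identifier(identifier):
    """Return a label for the first pattern matching the whole identifier."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "unknown"


for example in ["P12345", "MAPK_HUMAN", "NP_123456", "XP_123456",
                "UniRef90_P99999", "ENSP00000123456"]:
    print(example, "->", classify_identifier(example))
```

The order of the list matters: an entry-name pattern this loose could also match a short RefSeq-style string, so the stricter patterns are tried first.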
Principles
More in UniProt a complete annotated protein sequence database
UniProt: the Universal Protein Resource for protein sequences.
UniProt Archive: a non-redundant archive containing only protein sequences, extracted from public databases.
UniProt/UniRef: clusters similar sequences to yield a representative subset of sequences, which makes searches much faster.
UniProt/UniMES: a repository developed specifically for metagenomic and environmental data.
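The "non-redundant archive" idea can be illustrated with a toy example: identical sequences coming from different source databases are stored once, and the archive remembers where each one came from. This is only a sketch of the principle, not how UniParc is actually implemented; the database names, identifiers, sequences and the `build_archive` function are all hypothetical.

```python
import hashlib


def build_archive(records):
    """Collapse identical sequences into one entry that lists all sources.

    records: iterable of (source_db, source_id, sequence) tuples.
    """
    archive = {}
    for source_db, source_id, sequence in records:
        normalized = sequence.strip().upper()
        # The checksum of the sequence serves as the archive key.
        key = hashlib.sha256(normalized.encode("ascii")).hexdigest()
        entry = archive.setdefault(key, {"sequence": normalized, "sources": []})
        entry["sources"].append((source_db, source_id))
    return archive


# Hypothetical input: the first two records carry the same sequence.
records = [
    ("DB_A", "A0001", "MKTAYIAKQR"),
    ("DB_B", "B0042", "mktayiakqr"),
    ("DB_A", "A0002", "MSLLTEVETY"),
]

archive = build_archive(records)
for entry in archive.values():
    print(entry["sequence"], entry["sources"])
```

Three input records yield two archive entries here, one of them with two sources. Clustering of merely similar (not identical) sequences, as UniRef does, needs sequence comparison and is beyond this sketch.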
How is it built?
http://beta.uniprot.org/
What’s in UniProt?
EBI interface
PIR – Protein Information Resource
Protein Family Classification System
Integrated Protein Knowledgebase
Integrated Protein Literature, Information and Knowledge
END
If you got lost…(class exercise)
some more slides…
EB-eye search
NCBI - Entrez