Class 3 2009 European Resources Protein Focused

Post on 21-Dec-2015

Class 3 2009

European Resources

Protein Focused

Protein Databases

EBI – European Bioinformatics Institute

http://www.ebi.ac.uk/

What is the difference between dealing with nucleotide DBs and protein DBs?

Protein information

• Name & description

• Gene it is encoded by

• Organism

• Function (only one?)

• Enzyme?

• Ligands?

• PTMs?

• Interactions?

• Biological processes.

• Structure.

• Sequence.

• Localization

• More...
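The kinds of information listed above can be pictured as fields of one record. The following is only a minimal sketch: the class name `ProteinRecord` and all field names are hypothetical and do not reflect the schema of any real database. Fields that the slide marks with a question mark are modelled as lists, since a protein may have several functions, ligands, or PTMs, or none that are known.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinRecord:
    """Hypothetical container for the items on the slide (not a real DB schema)."""
    name: str
    description: str
    gene: str                       # gene the protein is encoded by
    organism: str
    sequence: str                   # amino-acid sequence, one-letter code
    functions: List[str] = field(default_factory=list)      # often more than one
    ligands: List[str] = field(default_factory=list)
    ptms: List[str] = field(default_factory=list)            # post-translational modifications
    interactions: List[str] = field(default_factory=list)
    processes: List[str] = field(default_factory=list)       # biological processes
    localization: Optional[str] = None
    structure_id: Optional[str] = None                       # e.g. a structure entry, if one exists


# Toy example with invented values, for illustration only.
example = ProteinRecord(
    name="Example kinase",
    description="Invented entry for illustration",
    gene="EXMPL1",
    organism="Homo sapiens",
    sequence="MKTAYIAK",
    functions=["kinase activity", "scaffold"],
)
```

Modelling "Function (only one?)" as a list is the design choice here: it is the simplest way to answer that question with "not necessarily".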

Protein DB - short history

Pre-UniProt

Swiss-Prot: created in July 1986; since 1987, a collaboration of the SIB and the EMBL/EBI.

TrEMBL: created at the EBI in 1996 as a computer-annotated protein sequence database supplementing Swiss-Prot.

It was introduced to deal with the increased data flow from genome projects.

PIR

EBI

SIB

The three-layered approach

The UniProt Archive (UniParc)

• UniProtKB + all other publicly available protein sequences

• Completeness

The UniProt Reference Clusters (UniRef)

• Non-redundant views of UniProtKB + selected UniParc sets

• Speed

The UniProt Knowledgebase (UniProtKB)

• Central database of annotated protein sequences and functional information

• UniProtKB/Swiss-Prot + UniProtKB/TrEMBL

Protein DBs

• Swiss-Prot – manually annotated.

• TrEMBL – translated EMBL, automatically annotated.

• UniProtKB – The UniProt Knowledgebase

• UniParc – The UniProt Archive

• PIR – Protein Information Resource

• UniRef – The UniProt Reference Clusters

• PDB – Protein Data Bank – structure

• PRIDE – Resource for experimental proteomics (not in this class)

Databases growth

www.genome.jp/en/db_growth.html

Protein DBs

• Swiss-Prot – manually annotated. 2005: ~100,000; 2009: ~400,000.

• TrEMBL – translated EMBL, automatically annotated.

Protein Names

Different DBs – different accessions

DB                            Accessions

TrEMBL                        P12345

Swiss-Prot (to be changed..)  MAPK_HUMAN

RefSeq                        NP_123456, XP_123456

UniRef                        UniRef100_P99999, UniRef90_P99999, UniRef50_P99999

Ensembl                       ENSP00000123456
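Because each database uses its own identifier format, the source of an identifier can often be guessed from its shape. The sketch below is an illustration, not an official validator. The UniProtKB accession pattern follows the regular expression published in the UniProt documentation; the other patterns are simplified assumptions based on the examples above, and the function name `classify` is hypothetical. Swiss-Prot and TrEMBL entries share the same accession format, so a pattern alone cannot tell them apart.

```python
import re

# Order matters: the specific patterns are tried before the looser
# entry-name pattern, which would otherwise also match some RefSeq IDs.
PATTERNS = [
    ("UniRef cluster", re.compile(r"UniRef(100|90|50)_\S+")),
    ("RefSeq protein", re.compile(r"[NX]P_\d+(\.\d+)?")),       # assumption: NP_/XP_ only
    ("Ensembl protein", re.compile(r"ENS[A-Z]*P\d{11}")),       # assumption: 11 digits
    ("UniProtKB accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("UniProtKB entry name", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
]


def classify(identifier: str) -> str:
    """Return a best-guess label for a protein identifier, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "unknown"


if __name__ == "__main__":
    for ident in ["P12345", "MAPK_HUMAN", "NP_123456", "XP_123456",
                  "UniRef90_P99999", "ENSP00000123456"]:
        print(ident, "->", classify(ident))
```

A guess like this is only a first filter; whether an identifier actually exists still has to be checked against the database itself.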

Principles

More in UniProt: a complete annotated protein sequence database

• UniProt: the Universal Protein Resource for protein sequences.

• UniProt Archive: a non-redundant archive of protein sequences extracted from public databases; it contains only protein sequences.

• UniProt/UniRef: clusters similar sequences to yield a representative subset of sequences, which makes searches very fast.

• UniProt/UniMES: a repository specifically developed for metagenomic and environmental data.
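The idea of a non-redundant archive can be sketched in a few lines: each distinct sequence is stored once, together with a list of the source entries it came from. This is a toy illustration under stated assumptions, not how UniParc is actually implemented. The function name `build_archive`, the record layout, and the example sequences are all invented for the sketch.

```python
def build_archive(records):
    """Collapse (source_db, source_id, sequence) records into a
    non-redundant mapping: sequence -> list of (source_db, source_id)."""
    archive = {}
    for source_db, source_id, sequence in records:
        key = sequence.strip().upper()   # treat case and surrounding whitespace as irrelevant
        archive.setdefault(key, []).append((source_db, source_id))
    return archive


# Invented toy data: the first two records carry the same sequence.
records = [
    ("Swiss-Prot", "P12345", "MKTAYIAK"),
    ("RefSeq", "NP_123456", "mktayiak"),
    ("Ensembl", "ENSP00000123456", "MSLLPV"),
]
archive = build_archive(records)

for sequence, sources in archive.items():
    print(sequence, sources)
```

Three input records yield two archive entries here, because identical sequences from different databases collapse into one entry that keeps cross-references to both sources.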

How is it built?

http://beta.uniprot.org/

What’s in UniProt?

EBI interface

PIR – Protein Information Resource

Protein Family Classification System

Integrated Protein Knowledgebase

Integrated Protein Literature, Information and Knowledge

END

If you got lost…(class exercise)

some more slides…

EB-eye search


NCBI - Entrez