“proteomics & bioinformatics”

“Proteomics & Bioinformatics” MBI, Master's Degree Program in Helsinki, Finland 10 May, 2007 Sophia Kossida, BRF, Academy of Athens, Greece Esa Pitkänen, University of Helsinki, Finland Juho Rousu, University of Helsinki, Finland Lecture 4


Page 1: “Proteomics & Bioinformatics”

“Proteomics & Bioinformatics”

MBI, Master's Degree Program in Helsinki, Finland

10 May, 2007

Sophia Kossida, BRF, Academy of Athens, Greece

Esa Pitkänen, University of Helsinki, Finland

Juho Rousu, University of Helsinki, Finland

Lecture 4

Page 2: “Proteomics & Bioinformatics”

Proteomics and biology/Applications

Proteome Mining

Identifying as many as possible of the proteins in your sample

Protein Expression Profiling

Identification of proteins in a particular sample as a function of a particular state of the organism or cell

Functional proteomics

Post-translational modifications

Identifying how and where the proteins are modified

Protein-protein interactions / Protein-network mapping

Determining how the proteins interact with each other in living systems

Structural Proteomics

Protein quantitation or differential analysis

Page 4: “Proteomics & Bioinformatics”

General workflow of proteomics analysis

Diagram stages: proteins/peptides; digestion and/or separation; 2D gel image acquisition and storage; MALDI, MS/MS (store peak lists and all metadata); identification (PMF, MS/MS); quantification (DIGE, LC-MS & tags); external data sources (taxonomy, ontologies, bibliography…); applications: systems biology (pathways, interactions...), biomarker discovery, drug targets.

Page 5: “Proteomics & Bioinformatics”

General workflow of proteomics analysis

Diagram stages: proteins/peptides; digestion and/or separation; MALDI, MS/MS; identification; quantification.

Sequence databases: EMBL Nucleotide Sequence Database, GenBank, UniProtKB/Swiss-Prot & TrEMBL, Ensembl, EST database, PIR

2D PAGE databases: Swiss-2DPAGE, GelBank, Cornea 2D-PAGE, World-2DPAGE

Make2D-DB

Imaging tools: Melanie, PDQuest, Progenesis, Delta2D

Storing/organising: ProteinScape

MSight

External data sources: KEGG, PDB, DIP, OMIM, Reactome, PROSITE, Pfam, SPIN, BOND, STRING, AmiGO, DAVID, PubMed, MEDLINE

Identification and characterization tools: Mascot, Sequest, Aldente, Popitam, Phenyx, FindMod, Profound, PepFrag, MS-Fit, OMSSA, SearchXLinks, TagIdent

Page 6: “Proteomics & Bioinformatics”

General workflow of proteomics analysis

Proteins/peptides; digestion and/or separation

2D PAGE databases

Make2D-DB

Imaging software: compares two gels (images) and then identifies differentially expressed spots

•Melanie
•PDQuest
•Progenesis
•Delta2D
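The imaging-software step described on this slide (comparing two gels and flagging differentially expressed spots) can be sketched in a few lines. The sketch makes several assumptions. Spots are taken as already detected and matched between the two gels, which the real packages do themselves along with background subtraction and gel warping. Volumes are normalized to each gel's total spot volume. A 2-fold change is used as the cut-off. The function name `differential_spots` is hypothetical.

```python
import math


def differential_spots(gel_a, gel_b, min_fold=2.0):
    """Compare matched spot volumes between two gels (hypothetical helper).

    gel_a, gel_b: dicts mapping a matched spot ID to its raw spot volume.
    Returns (changed, only_a, only_b): log2 ratios (b vs a) for spots whose
    normalized volume changes at least min_fold, plus unmatched spot IDs.
    """
    # Normalize each volume to the total volume of its gel (relative %Vol),
    # so that differences in loading or staining intensity largely cancel out.
    total_a = sum(gel_a.values())
    total_b = sum(gel_b.values())
    norm_a = {s: 100.0 * v / total_a for s, v in gel_a.items()}
    norm_b = {s: 100.0 * v / total_b for s, v in gel_b.items()}

    threshold = math.log2(min_fold)
    changed = {}
    for spot in norm_a.keys() & norm_b.keys():
        log_ratio = math.log2(norm_b[spot] / norm_a[spot])
        if abs(log_ratio) >= threshold:
            changed[spot] = round(log_ratio, 3)

    # Spots seen on only one gel are reported separately, not as ratios.
    only_a = sorted(norm_a.keys() - norm_b.keys())
    only_b = sorted(norm_b.keys() - norm_a.keys())
    return changed, only_a, only_b
```

For example, `differential_spots({"s1": 100, "s2": 100, "s3": 200}, {"s1": 200, "s2": 50, "s3": 150})` flags `s1` (up 2-fold) and `s2` (down 2-fold). Normalization matters here because the raw volumes alone would exaggerate or hide changes whenever the two gels were loaded or stained differently.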

ProteinScape: platform for storing and organizing data

MSight: representation of mass spectra along with data from the separation

2D gel databases: data integration on the web, combining image data and textual information

•Swiss-2DPAGE
•GelBank
•Cornea 2D-PAGE
•World-2DPAGE

Page 7: “Proteomics & Bioinformatics”

2D Gel Databases

Swiss-2DPAGE: www.expasy.ch

GelBank: http://www.gelscape.ualberta.ca:8080/htm/gdbIndex.html

Cornea 2D-PAGE: http://www.cornea-proteomics.com/

World-2DPAGE, index of 2D gel databases: http://ca.expasy.org/ch2d/2d-index.html

Page 8: “Proteomics & Bioinformatics”

Swiss 2D PAGE viewer

Page 9: “Proteomics & Bioinformatics”

GelBank

Page 10: “Proteomics & Bioinformatics”

Cornea

Page 11: “Proteomics & Bioinformatics”

World-2DPAGE

http://ca.expasy.org/ch2d/2d-index.html

Page 12: “Proteomics & Bioinformatics”

Make2D-DB

A software package to create, convert, publish, interconnect and keep up to date 2-DE databases, provided by ExPASy.

It runs on most UNIX-based operating systems (Linux, Solaris/SunOS, IRIX). Being continuously developed, the tool is evolving in concert with the current Proteomics Standards Initiative of the Human Proteome Organization (HUPO).

Data can be marked as public, or as fully or partially private.

A highly secured administration web interface makes external data integration, data export, data privacy control, database publication and version control easy to perform.

The database is queryable by description, accession number or spot clicking; searches can be made via clickable images or keywords.

Cross-references are provided to other federated 2D PAGE database entries, MEDLINE and SWISS-PROT.

Entries are linked to images showing the experimentally determined and theoretical protein locations.
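The "theoretical protein location" on a 2D gel is given by the protein's computed isoelectric point (first dimension) and molecular weight (second dimension). Below is a minimal sketch of both calculations, assuming standard average residue masses. It also uses an EMBOSS-style pKa set, which is an assumption: ExPASy's own Compute pI/Mw uses a different (Bjellqvist) scale, so its pI values will differ somewhat. The function names are hypothetical.

```python
# Average residue masses (Da): amino-acid masses minus one water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVG_WATER = 18.01524

# Assumed pKa values (EMBOSS-style); not the scale ExPASy uses.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def average_mw(seq):
    """Average molecular weight of an unmodified polypeptide chain."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + AVG_WATER


def net_charge(seq, ph):
    """Net charge at a given pH via the Henderson-Hasselbalch equation."""
    counts = {aa: seq.count(aa) for aa in set(seq)}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus each
    pos = sum(counts.get(g, 0) / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    neg = sum(counts.get(g, 0) / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """Bisect for the pH at which the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2
```

Bisection works here because the net charge decreases monotonically as the pH rises. For a single glycine, the pI lands halfway between the two terminal pKa values (6.1 with this set).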

Page 13: “Proteomics & Bioinformatics”

Federated database

A collection of databases that are treated as one entity and viewed through a single user interface (PCMag.com).

Limitations of current databases:

•They do not contain strict/detailed descriptions of the protocol (buffers, sample volume, staining techniques: all important information for gel comparisons).
•They were designed as 2D (and not proteomics) databases and are therefore not readily expandable to incorporate other proteomics data, e.g. MS, MDLC.
•They were designed for reference gels, not ongoing projects.

Robustness, consistency, maintenance of the database, data quality

Page 14: “Proteomics & Bioinformatics”

Guidelines for building a federated 2-DE database

http://ca.expasy.org/ch2d/fed-rules.html

Individual entries in the database must be accessible by a keyword search. Other methods are possible but not required.

The database must be linked to other databases by active hypertext cross-references, linking together all related databases. Database entries must be at least linked to the main index.

A main index has to be supplied that provides a means of querying all databases through one unique query point.

Individual protein entries must be available through clickable images.

2-DE analysis software designed for use with federated databases must be able to access individual entries in any federated 2-DE database.

For a complete reference, see Appel et al., Electrophoresis 17 (1996) 540-546.
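The rules above require two things: keyword search within each member database, and a main index that queries all databases through one query point. The sketch below illustrates that architecture with in-memory stand-ins. The class `Gel2DDatabase`, the function `main_index_search`, and the URL layout are all hypothetical. Real federated 2-DE databases are separate web servers linked by hypertext, not Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Gel2DDatabase:
    """Hypothetical member database: entries keyed by accession."""
    name: str
    base_url: str
    entries: dict = field(default_factory=dict)  # accession -> description

    def keyword_search(self, keyword):
        """Rule 1: individual entries are reachable by keyword search."""
        kw = keyword.lower()
        return [acc for acc, desc in self.entries.items()
                if kw in desc.lower() or kw == acc.lower()]

    def entry_url(self, accession):
        """Each hit is returned as an active link to the member's entry."""
        return f"{self.base_url}/{accession}"


def main_index_search(databases, keyword):
    """Main index: one query point that fans out to every member database."""
    hits = []
    for db in databases:
        for acc in db.keyword_search(keyword):
            hits.append((db.name, acc, db.entry_url(acc)))
    return sorted(hits)
```

Usage: `main_index_search([db_a, db_b], "actin")` returns one `(database, accession, url)` tuple per hit across all members. Each member therefore only needs to implement keyword search, and the main index never has to store the entries itself.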

Page 15: “Proteomics & Bioinformatics”

Image analysis software

ImageMaster 2D / Melanie

PDQuest (Bio-Rad, USA)

Progenesis (Nonlinear, UK)

Delta2D (Decodon, Germany)

Page 20: “Proteomics & Bioinformatics”

Delta 2D

http://www.decodon.com/Solutions/Delta2D/

Page 21: “Proteomics & Bioinformatics”

ProteinScape

Platform for storing, organizing and analyzing data generated during the proteomics workflow.

• Hierarchy: Project → Sample → Gel → Spots → MS Data → Search Events
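The ProteinScape hierarchy is a containment tree: each project holds samples, each sample holds gels, and so on down to search events. Below is a minimal sketch of such a data model using dataclasses. All class and field names are hypothetical and are not ProteinScape's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchEvent:
    engine: str    # search engine used, e.g. "Mascot"
    top_hit: str   # hypothetical field: best-scoring protein accession


@dataclass
class MSData:
    peak_list: List[float]
    searches: List[SearchEvent] = field(default_factory=list)


@dataclass
class Spot:
    spot_id: str
    ms_data: List[MSData] = field(default_factory=list)


@dataclass
class Gel:
    gel_id: str
    spots: List[Spot] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    gels: List[Gel] = field(default_factory=list)


@dataclass
class Project:
    name: str
    samples: List[Sample] = field(default_factory=list)

    def search_events(self):
        """Walk the hierarchy, yielding (sample, gel, spot, search) tuples."""
        for sample in self.samples:
            for gel in sample.gels:
                for spot in gel.spots:
                    for ms in spot.ms_data:
                        for search in ms.searches:
                            yield sample.name, gel.gel_id, spot.spot_id, search
```

Keeping every search event attached to its spot, gel and sample means any identification can be traced back to the exact separation it came from.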

Page 22: “Proteomics & Bioinformatics”

MSight: specifically developed for the representation of mass spectra along with data from the separation

http://www.expasy.org/MSight

Page 23: “Proteomics & Bioinformatics”

General workflow of proteomics analysis

Diagram stages: proteins/peptides; digestion and/or separation; 2D gel image acquisition and storage; MALDI, MS/MS (store peak lists and all metadata); identification (PMF, MS/MS); quantification (DIGE, LC-MS & tags).

Sequence databases: EMBL Nucleotide Sequence Database, GenBank, UniProtKB/Swiss-Prot & TrEMBL, Ensembl, EST database, PIR

Page 24: “Proteomics & Bioinformatics”

EMBL Nucleotide Sequence Database

A collaboration between GenBank (USA), the DNA Data Bank of Japan (DDBJ) and the EBI.

Newly collected sequence data are exchanged, and each database is updated daily.

Page 25: “Proteomics & Bioinformatics”

EBI

Page 26: “Proteomics & Bioinformatics”

GenBank

GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

Each entry includes a concise description of the sequence, the scientific name and the taxonomy of the source organism, and a table of features that identifies coding regions and other sites of biological significance, such as transcription units, sites of mutations or modifications, and repeats.

Protein translations for coding regions are included in the feature table.

Bibliographic references are included, along with a link to the MEDLINE unique identifier for all published sequences.

http://www.psc.edu/general/software/packages/genbank/genbank.html

GenBank is available for searching at NCBI.
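The protein translations in the GenBank feature table come from translating each coding region (CDS) codon by codon. Below is a minimal sketch using the standard genetic code (NCBI translation table 1). It assumes a contiguous CDS that starts in frame. It ignores alternative genetic codes, the `codon_start` offset and `join(...)` locations, all of which the real annotation pipeline handles. The `/translation` qualifier in the entry remains the authoritative protein sequence.

```python
BASES = "TCAG"
# Amino acids of the standard genetic code (NCBI table 1), listed with the
# 1st, 2nd and 3rd codon positions each running in TCAG order; '*' = stop.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds, to_stop=True):
    """Translate a coding sequence with the standard genetic code.

    Ambiguous codons (e.g. containing N) become 'X'; a trailing partial
    codon is ignored. With to_stop=True, translation ends at the first stop.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate_cds("ATGGCCTAAGGG")` stops at the `TAA` stop codon and returns `"MA"`.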

Page 28: “Proteomics & Bioinformatics”

DDBJ

Page 29: “Proteomics & Bioinformatics”

INSDC

Page 30: “Proteomics & Bioinformatics”

UniProt: Universal Protein Resource

UniProt joins the information contained in UniProtKB/Swiss-Prot, UniProtKB/TrEMBL and PIR.

It consists of three components:

•UniProt Knowledgebase (curated protein information, including function, classification and cross-references)

•UniProt Reference Clusters (combine closely related sequences into a single record to speed up searches)

•UniProt Archive (a repository reflecting the history of all protein sequences)
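To illustrate how reference clusters reduce redundancy, here is a toy sketch in the spirit of the 100%-identity level. Identical sequences, and exact sub-fragments of a longer sequence, are merged into the cluster of that longer representative. The 11-residue minimum fragment length follows UniProt's description of UniRef100 but should be treated as an assumption here. The real UniRef clusters (including the 90% and 50% identity levels) are built with far more machinery. The function name is hypothetical.

```python
def cluster_identical(sequences, min_fragment_len=11):
    """Toy UniRef100-style clustering (hypothetical helper).

    sequences: dict of accession -> sequence.
    Returns a dict of representative accession -> member accessions.
    """
    clusters = {}   # representative accession -> list of member accessions
    reps = []       # (accession, sequence) of representatives seen so far
    # Longest first, so a fragment always meets its full-length sequence first.
    for acc, seq in sorted(sequences.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        for rep_acc, rep_seq in reps:
            identical = seq == rep_seq
            fragment = len(seq) >= min_fragment_len and seq in rep_seq
            if identical or fragment:
                clusters[rep_acc].append(acc)
                break
        else:
            # No existing representative contains it: start a new cluster.
            reps.append((acc, seq))
            clusters[acc] = [acc]
    return clusters
```

Searching only the representatives is what makes the clustered set faster to search than the full archive.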

Page 31: “Proteomics & Bioinformatics”

ExPASy Proteomics Server

Expert Protein Analysis System

The proteomics server of the Swiss Institute of Bioinformatics (SIB), dedicated to the analysis of protein sequences and structures as well as 2D-PAGE.

http://ca.expasy.org/

http://www.isb-sib.ch/

Page 32: “Proteomics & Bioinformatics”

UniProtKB/Swiss-Prot

The UniProtKB/Swiss-Prot Protein Knowledgebase is an annotated protein sequence database established in 1986. It is maintained collaboratively by the SIB (Swiss Institute of Bioinformatics) and the European Bioinformatics Institute (EBI).

http://ca.expasy.org/sprot/

Page 33: “Proteomics & Bioinformatics”

Swiss-Prot

Page 34: “Proteomics & Bioinformatics”

TrEMBL

•UniProtKB/TrEMBL is a computer-annotated protein sequence database complementing the UniProtKB/Swiss-Prot Protein Knowledgebase.

•It contains the translations of all coding sequences (CDS) present in the EMBL/GenBank/DDBJ Nucleotide Sequence Databases and also protein sequences extracted from the literature or submitted to UniProtKB/Swiss-Prot.

•The database is enriched with automated classification and annotation.

Page 36: “Proteomics & Bioinformatics”

ESTdb

An expressed sequence tag (EST) is a unique DNA sequence within a coding region of a gene that is useful for identifying full-length genes and serves as a landmark for mapping.

dbEST is a division of GenBank that contains sequence data and other information on “single-pass” cDNA sequences from a number of organisms.

http://www.ncbi.nlm.nih.gov/dbEST/

Page 37: “Proteomics & Bioinformatics”

Ensembl

http://www.ebi.ac.uk/ensembl/

Ensembl is a joint project between the EMBL-EBI and the Wellcome Trust Sanger Institute that aims to develop a system that maintains automatic annotation of large eukaryotic genomes. Access to all the software and data is free and without constraints of any kind.

Page 38: “Proteomics & Bioinformatics”

IPI: International Protein Index

Page 39: “Proteomics & Bioinformatics”

General workflow of proteomics analysis

Diagram stages: proteins/peptides; digestion and/or separation; 2D gel image acquisition and storage; MALDI, MS/MS (store peak lists and all metadata); identification (PMF, MS/MS); quantification (DIGE, LC-MS & tags).

Identification and characterization tools: Mascot, Sequest, Aldente, Popitam, Phenyx, FindMod, Profound, PepFrag, MS-Fit, OMSSA, SearchXLinks, TagIdent

Page 42: “Proteomics & Bioinformatics”

PROWL

Page 43: “Proteomics & Bioinformatics”

Identification and Characterization Tools

PMF data:

Mascot (Matrix Science)

Aldente (ExPASy)

Profound (Rockefeller University)

MS-Fit (Prospector; UCSF)

MS/MS data:

Sequest

Mascot

OMSSA

X!Hunter
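The PMF tools listed above share a common core. They digest candidate sequences in silico, compute the theoretical peptide masses, and compare them with the observed peak list. Below is a minimal sketch of that core. It assumes trypsin cleavage with no missed cleavages, singly protonated monoisotopic masses, no modifications, and a fixed tolerance in Da. It counts matches rather than computing the probability-based scores that Mascot, Aldente, ProFound or MS-Fit actually report. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da).
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def trypsin_digest(protein):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER + PROTON


def pmf_matches(protein, peak_list, tol_da=0.1):
    """Return (observed m/z, peptide) pairs that match within tol_da."""
    theoretical = [(mh_plus(p), p) for p in trypsin_digest(protein)]
    return [(mz, pep) for mz in peak_list
            for mass, pep in theoretical if abs(mz - mass) <= tol_da]
```

Run against every database entry, the protein explaining the most peaks is the naive best hit. The real tools refine this by weighting matches by how likely they are to occur by chance.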

Page 44: “Proteomics & Bioinformatics”

Identification and Characterization Tools

Popitam (ExPASy, SIB)

Phenyx (GeneBio, Switzerland)

PepFrag (Rockefeller University, USA)

SearchXLinks (Caesar, Germany)

Page 45: “Proteomics & Bioinformatics”

Popitam

Popitam is designed to characterize peptides with unexpected modifications (e.g. post-translational modifications or mutations) by tandem mass spectrometry (ExPASy, SIB).

http://expasy.org/cgi-bin/popitam/help.pl

Page 46: “Proteomics & Bioinformatics”

Popitam results

Page 47: “Proteomics & Bioinformatics”

Phenyx

Phenyx is a software platform for the identification and characterization of proteins and peptides from mass spectrometry data.

Developed by GeneBio in collaboration with SIB

http://www.phenyx-ms.com/about/about_phenyx.html

Page 48: “Proteomics & Bioinformatics”

PEPFRAG

http://prowl.rockefeller.edu/

Searches known protein sequences with peptide fragment mass information
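Searching with peptide fragment mass information means predicting the fragment ions each candidate peptide would produce in MS/MS and comparing them with the observed spectrum. Below is a minimal sketch for singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments) of an unmodified peptide. It scores candidates simply by the number of observed peaks explained. That scoring is an assumption for illustration, not PepFrag's actual algorithm, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da).
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def fragment_ions(peptide):
    """Singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [MONO_RESIDUE[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # first i residues
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # last i residues
    return b_ions, y_ions


def count_matched_fragments(peptide, observed, tol_da=0.5):
    """Number of observed peaks explained by a b or y ion within tol_da."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(any(abs(mz - t) <= tol_da for t in theoretical) for mz in observed)
```

The y ions carry an extra water because they contain the peptide's free C-terminus. The b ions do not.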

Page 49: “Proteomics & Bioinformatics”

SearchXLinks

http://www.searchxlinks.de/

Analysis of mass spectra of modified, cross-linked and digested proteins whose amino acid sequence is known

Page 50: “Proteomics & Bioinformatics”

Identification and Characterization Tools

FindMod predicts potential protein post-translational modifications (PTM) and finds potential single amino acid substitutions in peptides.

FindPept identifies peptides that result from unspecific cleavage of proteins from their experimental masses, taking into account artefactual chemical modifications, post-translational modifications (PTM) and protease autolytic cleavage.

GlycoMod predicts possible oligosaccharide structures that occur on proteins from their experimentally determined masses.

http://au.expasy.org/tools/findmod/
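The basic idea behind predicting PTMs from mass data is to compare an experimental peptide mass with its theoretical mass and ask which known modification shifts could explain the difference. Below is a minimal sketch with a small table of standard monoisotopic mass shifts. It is not FindMod's actual algorithm. In particular, it does not check whether the peptide contains residues able to carry each modification, and the tolerance and the `max_sites` limit are assumptions. The function name is hypothetical.

```python
# Monoisotopic mass shifts (Da) of a few common modifications.
PTM_DELTAS = {
    "Methylation": 14.01565,
    "Oxidation": 15.99491,
    "Acetylation": 42.01057,
    "Phosphorylation": 79.96633,
}


def explain_mass_difference(observed_mh, theoretical_mh, tol_da=0.02, max_sites=2):
    """List (modification, count) pairs whose total shift matches the delta."""
    delta = observed_mh - theoretical_mh
    candidates = []
    for name, shift in PTM_DELTAS.items():
        for n in range(1, max_sites + 1):
            if abs(delta - n * shift) <= tol_da:
                candidates.append((name, n))
    return candidates
```

For example, a peptide observed 79.966 Da heavier than predicted is reported as a single phosphorylation. Tighter tolerances matter because several modification combinations can have similar masses.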

AACompIdent achieves identification with amino acid composition

TagIdent identifies proteins using isoelectric point (pI), molecular weight (Mw) and a sequence tag, generating a list of proteins close to a given pI and Mw.

MultiIdent achieves cross-species identification with multiple parameters (pI, Mw, sequence tag and peptide mass fingerprinting data).
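Generating a list of proteins close to a given pI and Mw, optionally constrained by a sequence tag, amounts to a window filter over precomputed values. Below is a minimal sketch of that idea. The window sizes (±0.5 pI units, ±10% Mw) and the ranking formula are assumptions, not TagIdent's own defaults or scoring. The `Candidate` entries used in testing are hypothetical toy data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    accession: str
    pi: float        # theoretical isoelectric point
    mw: float        # theoretical molecular weight (Da)
    sequence: str


def tag_ident(candidates, pi, mw, pi_window=0.5, mw_window_pct=10.0, tag=None):
    """Keep candidates within pi +/- pi_window and mw +/- mw_window_pct %,
    optionally requiring a sequence tag; rank by closeness to (pi, mw)."""
    hits = []
    for c in candidates:
        if abs(c.pi - pi) > pi_window:
            continue
        if abs(c.mw - mw) > mw * mw_window_pct / 100.0:
            continue
        if tag and tag not in c.sequence:
            continue
        hits.append(c)
    # Simple combined distance (assumed): pI difference plus relative Mw error.
    return sorted(hits, key=lambda c: abs(c.pi - pi) + abs(c.mw - mw) / mw)
```

Adding even a short sequence tag usually shrinks the list dramatically, which is why the tag is such a useful extra constraint.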

Page 51: “Proteomics & Bioinformatics”

General workflow of proteomics analysis

Diagram stages: proteins/peptides; digestion and/or separation; 2D gel image acquisition and storage; MALDI, MS/MS (store peak lists and all metadata); identification (PMF, MS/MS); quantification (DIGE, LC-MS & tags).

External data sources: KEGG, PDB, DIP, OMIM, Reactome, PROSITE, Pfam, SPIN, BOND, STRING, AmiGO, DAVID, PubMed, MEDLINE

Page 52: “Proteomics & Bioinformatics”

KEGG

http://www.genome.jp/kegg/kegg2.html

KEGG: Kyoto Encyclopedia of Genes and Genomes

•Organism specific entry points:

-KEGG Organisms

•Subject specific entry points:

-DRUG, GLYCAN, REACTION, KAAS

Page 53: “Proteomics & Bioinformatics”

KEGG

Manually drawn pathway maps representing our knowledge on the molecular interaction and reaction networks for metabolism, other cellular processes, and human diseases.

Functional hierarchies and binary relations of KEGG objects, including genes and proteins, compounds and reactions, drugs and diseases, and cells and organisms.

Gene catalogs of all complete genomes and some partial genomes with ortholog annotation (KO assignment), enabling KEGG PATHWAY mapping and BRITE mapping.

A composite database of chemical substances and reactions representing our knowledge on the chemical repertoire of biological systems and environments.

KEGG is a “biological systems” database integrating both molecular building block information and higher-level systematic information.

Page 54: “Proteomics & Bioinformatics”

Search Pathway

Carbon fixation

Page 55: “Proteomics & Bioinformatics”

Search “Pathway”

Page 56: “Proteomics & Bioinformatics”

“Pathways”: motifs

Page 57: “Proteomics & Bioinformatics”

Reactome

Page 58: “Proteomics & Bioinformatics”

Reactome

Page 59: “Proteomics & Bioinformatics”

PubMed

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed

Page 60: “Proteomics & Bioinformatics”

DAVID

http://david.abcc.ncifcrf.gov/home.jsp

Page 61: “Proteomics & Bioinformatics”

Protein Data Bank

http://www.rcsb.org/pdb/home/home.do

Provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease.

Page 62: “Proteomics & Bioinformatics”

OMIM

This database is a catalog of human genes and genetic disorders.

The database contains textual information and references. It also contains links to MEDLINE and sequence records.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

Page 63: “Proteomics & Bioinformatics”

Protein family classification

PROSITE (ExPASy)

Pfam (Sanger Institute)

SMART (EMBL)

Page 64: “Proteomics & Bioinformatics”

PROSITE: database of protein domains, families and functional sites

Proteins can be grouped, on the basis of their sequences, into a limited number of families.

Some regions have been better conserved than others during evolution. These regions are generally important for the function of a protein and/or the maintenance of its three-dimensional structure.

By analyzing the constant and variable properties of such groups of similar sequences, it is possible to derive a signature for a protein family or domain, which distinguishes its members from all other unrelated proteins.

http://au.expasy.org/prosite/
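A PROSITE pattern is one concrete form of such a family signature. Its syntax is as follows. Elements are separated by `-`. `x` means any residue. `[ST]` lists allowed residues and `{P}` lists forbidden ones. `(n)` or `(n,m)` gives a repetition count. `<` and `>` anchor the pattern to the N- or C-terminus. Below is a minimal sketch that converts such a pattern into a Python regular expression and scans a sequence with it. It covers only these basic elements (not `>` inside brackets, and not PROSITE profiles), and the function names are hypothetical. The example pattern `N-{P}-[ST]-{P}` is PROSITE's N-glycosylation site signature.

```python
import re

ELEMENT = re.compile(
    r"^(<)?"                              # optional N-terminal anchor
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"    # residue, any (x), allowed or forbidden set
    r"(?:\((\d+)(?:,(\d+))?\))?"          # optional repetition (n) or (n,m)
    r"(>)?$"                              # optional C-terminal anchor
)


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.match(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_anchor, core, rep_min, rep_max, c_anchor = m.groups()
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # forbidden residues
        if rep_min:
            core += "{" + rep_min + ("," + rep_max if rep_max else "") + "}"
        parts.append(("^" if n_anchor else "") + core + ("$" if c_anchor else ""))
    return "".join(parts)


def find_sites(sequence, pattern):
    """1-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]
```

For example, `find_sites("MNKSANGTL", "N-{P}-[ST]-{P}.")` reports candidate sites at positions 2 and 6. The zero-width lookahead is used so that overlapping matches are not skipped.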


Page 65: “Proteomics & Bioinformatics”

PROSITE

Page 66: “Proteomics & Bioinformatics”

PROSITE

Page 67: “Proteomics & Bioinformatics”

PROSITE

Page 68: “Proteomics & Bioinformatics”

Pfam

Multiple sequence alignments and HMMs of protein domains and families, at the Sanger Institute.

http://www.sanger.ac.uk/Software/Pfam/help/index.shtml

Page 69: “Proteomics & Bioinformatics”

Browse interactions

Page 70: “Proteomics & Bioinformatics”

http://smart.embl-heidelberg.de/

Page 71: “Proteomics & Bioinformatics”

Structure databases/interactions

STRING (EMBL)

BOND (Unleashed Informatics)

Cytoscape

DIP (UCLA)

iHOP

SPIN-PP (protein-protein interfaces in the PDB)

MIPS (Mammalian Protein-Protein Interaction Database)

InterAct (protein interactions from literature curation)

Page 72: “Proteomics & Bioinformatics”

http://string.embl.de

STRING

Page 73: “Proteomics & Bioinformatics”

STRING search results

Page 74: “Proteomics & Bioinformatics”

STRING graphical

Page 75: “Proteomics & Bioinformatics”

STRING: new node

Page 76: “Proteomics & Bioinformatics”

BOND

http://bond.unleashedinformatics.com

The Biomolecular Object Network Databank

Page 77: “Proteomics & Bioinformatics”

Cytoscape

Cytoscape is an open source bioinformatics software platform for visualizing molecular interactions with gene expression profiles and other state data.

Page 78: “Proteomics & Bioinformatics”

Node label position can be controlled through a new GUI in VizMapper.

Page 79: “Proteomics & Bioinformatics”

Cytoscape: plugins

Plugins available for network and molecular profile analysis.

For example:

•Filter the network
•Find active subnetworks/pathway modules
•Find clusters

A tool to determine which Gene Ontology (GO) categories are statistically over-represented in a set of genes or a subgraph of a biological network.
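Tools of this kind commonly test over-representation with a hypergeometric test. The question is: if `n` genes are drawn at random from `N` annotated genes, of which `K` carry a GO category, how likely is it to see at least `k` genes with that category? Below is a minimal sketch of that p-value. The slide does not name the plugin, so the test choice is an assumption. Multiple-testing correction across categories, which real tools apply, is omitted.

```python
from math import comb


def enrichment_p_value(total_genes, genes_in_category, selected, selected_in_category):
    """Upper-tail hypergeometric p-value P(X >= selected_in_category).

    total_genes (N): annotated genes in the reference set
    genes_in_category (K): of those, genes annotated with the GO category
    selected (n): genes in the study set (or network subgraph)
    selected_in_category (k): study genes annotated with the category
    """
    N, K, n, k = total_genes, genes_in_category, selected, selected_in_category
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

For example, drawing all 5 genes of a category when picking 5 out of 10 genes has probability 1/252, a strong hint of enrichment.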

Page 80: “Proteomics & Bioinformatics”

Database of Interacting Proteins

The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions.

http://dip.doe-mbi.ucla.edu/
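Combining interaction data from several sources into one consistent set involves, at minimum, three steps. Protein identifiers must be normalized, interactions must be treated as undirected so that A–B and B–A count once, and duplicates must be merged while recording which sources reported each pair. Below is a minimal sketch of those steps. It assumes identifier normalization is only case and whitespace cleanup, whereas DIP's actual curation (identifier mapping, quality assessment) goes much further. The function name is hypothetical.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Merge protein-protein interaction lists (hypothetical helper).

    sources: dict of source name -> iterable of (protein_a, protein_b) pairs.
    Returns a dict mapping each undirected pair (sorted tuple) to the set of
    sources that reported it. Self-interactions are kept as (a, a).
    """
    merged = defaultdict(set)
    for source, pairs in sources.items():
        for a, b in pairs:
            # Assumed normalization: trim whitespace and upper-case identifiers.
            key = tuple(sorted((a.strip().upper(), b.strip().upper())))
            merged[key].add(source)
    return dict(merged)
```

Keeping the provenance set makes it easy to prefer interactions confirmed by more than one independent source.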