Introduction to Proteomics
Susan Liddell
University of Nottingham
[email protected]
PGT short course 2014
UoN Graduate School Course: Post-genomics and bio-informatics
Sutton Bonington Proteomics labs
Division of Animal Sciences – South lab
Susan Liddell, Ken Davies
• Supports proteomics studies & collaborative projects
  – gel electrophoresis (mainly 2D)
  – protein identification via tandem MS
• Wide variety of project types and organisms, including some species with unreported genome sequences
  human, cow, horse, fungi, bacteria, archaea, plants
Dr Ken Davis
Overview
• what is proteomics?
• why study the proteome?
• proteomic strategies
  – the 2D gel standard workflow
  – high throughput LC-MSMS
• draft human proteome
• challenges
Biochem. J. (2012) 444, 169–181
Proteomics: an explosive growth in interest & expectations
The increase in proteomics publications
The NCBI database PubMed was searched using the term “proteomics”
Same search on:
  11/2012: 36,547 entries
  12/2013: 44,947 entries
  11/2014: 51,150 entries
Driving forces for proteomics
‒ Technical advances in 2D gel electrophoresis
IPG strips, multigel runners, buffer components
‒ Enormous advances in MS instrumentation
speed, sensitivity, resolution, soft ionisation
‒ Computer algorithms that search databases with MS data, using correlation-based approaches to identify proteins
‒ Nucleotide databases with complete genome sequences
Proteome
The term “proteome” was coined by a PhD student, Marc Wilkins
“the entire PROTEin complement expressed by a genOME of a cell or organism”
Wasinger et al 1995 Electrophoresis 16:1090
Proteomics
“...the identification of all the proteins encoded in the human genome....”
including modification, quantification, localisation and functional analysis for every cell type
Human Proteome Organisation (www.hupo.org)
Proteomics
study of proteins and protein function usually on a genome wide scale
large scale or systematic characterization of the proteins of a cell, tissue, biofluid, or simple organism
Proteomics preceded genomics: Human Protein Index (N & L Anderson, 1982)
Aims of Proteomics
Global (unbiased) analysis of complex protein samples
Find changes in protein expression (potential biomarkers) in different biological situations (disease)
Development of diagnostic tools and therapeutic agents/drugs
Fundamental understanding of biological processes and mechanisms
proteins are the functional molecules in the cell and take part in virtually every cellular process
Why study proteins?
- catalysis
  digestion, metabolism, DNA replication and repair, transcription, etc
- structure and movement
  extracellular proteins are critical components of cartilage
  muscle fibres are made up primarily of the proteins myosin & actin
- transporters
  apolipoprotein A-1 carries lipids in the blood
  haemoglobin carries oxygen
- proteins are the targets of the majority of drugs
- cell signalling, signal transduction, cell cycle, immunity
- the list goes on and on and on.....
Why analyse the proteome? Genome considerations
the genome provides only the blueprint – an inventory of the genes that could be expressed in a cell
• genomes are (largely) static
• proteomes vary enormously
  ‒ cell types, developmental stage, environment, etc
• proteomes are highly dynamic and constantly changing
• the proteome is a lot more complex than the genome, because most proteins are chemically modified and so have multiple forms
Images courtesy of en.wikipedia.org
Why analyse the proteome? Genome considerations
Functional Annotation of the Arabidopsis Genome Using Controlled Vocabularies
Plant Physiology (2004) Vol. 135, p745
Arabidopsis genome annotation: functional characterisation
26% molecular function unknown
sequence alone does not reveal biological function
Why analyse the proteome? Genome considerations
• gene rearrangements
  ‒ immunoglobulin heavy and light chains
  ‒ T-cell receptor α and β chains
Why analyse the proteome? Transcript considerations
• RNA splicing
alternative splicing in a conserved family of ser/arg rich proteins in Arabidopsis generates up to 95 transcripts from only 15 genes
Why analyse the proteome? Transcript considerations
poor correlation between mRNA levels and protein expression levels?
– Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. De Godoy et al 2008 Nature 455:1251
– “Overall correlation of mRNA and protein changes was poor....”
– “the relationship between mRNA and protein levels depends on the proteins investigated”
– Correlation between protein and mRNA abundance in yeast. Gygi et al 1999 Mol Cell Biol 19:1720
– A comparison of selected mRNA and protein abundances in human liver. Anderson and Seilhamer 1997 Electrophoresis 18:533
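One way to put a number on the mRNA/protein relationship is a correlation coefficient. The sketch below is illustrative only: the abundance values are invented, and the cited studies used their own datasets and statistics. It computes Pearson and Spearman (rank) correlation in plain Python, assuming there are no tied values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical abundances for five genes (arbitrary units, not real data)
mrna = [10, 50, 200, 400, 1000]
protein = [300, 20, 500, 90, 700]

print("Pearson :", round(pearson(mrna, protein), 3))
print("Spearman:", round(spearman(mrna, protein), 3))
```

Rank correlation is often preferred for abundance data because it is less sensitive to a few very highly expressed genes.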
Common covalent modifications of proteins affecting activity

Modification      | Donor molecule | Example of modified protein | Protein function
Phosphorylation   | ATP            | Glycogen phosphorylase      | Glucose homeostasis; energy transduction
Acetylation       | Acetyl CoA     | Histones                    | DNA packing; transcription control
Myristoylation    | Myristoyl CoA  | Src                         | Signal transduction
ADP-ribosylation  | NAD            | RNA polymerase              | Transcription
γ-Carboxylation   | HCO3-          | Thrombin                    | Blood clotting
Ubiquitination    | Ubiquitin      | Cyclin                      | Control of cell cycle

Biochemistry. Jeremy M Berg, John L Tymoczko, Lubert Stryer, Neil D Clarke
Why analyse the proteome? Transcript considerations
each transcript can give rise to several protein isoforms via post translational processing (>300 PTMs)
PROTEOMICS
• proteins are the main biological effector molecules, providing structures, enzyme activities, transport and more
• the next step from determining the genome is to find out the function of the gene products – the proteins
• analysis of protein products complements genomics & transcriptomics
“At the end of the day, proteins, not genes, are the business end of biology”
Global Proteomics
identify every protein present in a cell, tissue, biofluid or organism
Catalogue of proteins
most practical for simple organisms eg yeast, prokaryotes
Targeted (Sub)Proteomics
• quantitative: protein abundance
• qualitative: post translational modifications (phosphorylation, glycosylation)
• subcellular compartments/organelles: nuclei, plasma membrane, mitochondria
• functional: complexes of interacting proteins
There are many Proteomic Approaches using many different technologies
Mass Spectrometry
Protein Chips: protein arrays on slides (protein spots, tissue sections)
Liquid Chromatography: peptides/proteins, 1D/2D, labels/label free
Gels: proteins, 1D/2D gels, stains/labels
The Nobel Prize in Chemistry 2002
"for their development of soft desorption ionisation methods for mass spectrometric analyses of biological macromolecules"
Electrospray ionization (ESI): John B Fenn
Matrix-assisted laser desorption/ionization (MALDI): Koichi Tanaka
Applications of mass spectrometry in protein analysis include
Protein identification
  peptide mass fingerprinting
  tandem MS
  de novo sequence
Recombinant protein evaluation
  confirm identity
  engineered mutations, sequence changes
  cleavages or other modifications
  assess homogeneity
Identification of modifications
  acetylation
  oxidation
  glycosylation
  phosphorylation
….anything that causes a change in mass….
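Because a modification shows up as a change in mass, a measured shift can be compared with known mass differences. The sketch below is not from the original slides. It uses standard monoisotopic mass shifts; glycosylation is variable, so a single HexNAc unit is shown only as an example. The function name, the tolerance and the peptide masses in the usage lines are hypothetical.

```python
# Monoisotopic mass shifts in Da for some common modifications
PTM_DELTAS = {
    "phosphorylation": 79.96633,      # +HPO3
    "acetylation": 42.01057,          # +C2H2O
    "oxidation": 15.99491,            # +O
    "HexNAc (one sugar unit)": 203.07937,
}

def explain_mass_shift(theoretical, observed, tol=0.01):
    """Return the modifications whose mass shift matches observed - theoretical
    within `tol` Da. Assumes at most one modification on the peptide."""
    shift = observed - theoretical
    return [name for name, delta in PTM_DELTAS.items()
            if abs(shift - delta) <= tol]

# Hypothetical peptide masses, for illustration only
print(explain_mass_shift(1000.0, 1079.9663))   # matches phosphorylation
print(explain_mass_shift(1000.0, 1015.9949))   # matches oxidation
```

Real search engines consider many more modifications, and combinations of them, so this is only the core idea.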
Proteomic Workflow: 2D gel/MS
Protein separation
Analysis and protein spot selection
Processing and digestion to peptides
Mass spectrometric analysis
Database interrogation
Protein identification
Protein separation: 2-dimensional gel electrophoresis
1st dimension: separation by charge (isoelectric focussing), pI, pH 3 to pH 10
2nd dimension: separation by molecular weight (SDS-PAGE), kDa
2D gel electrophoresis equipment: 1st dimension IEF
various lengths: 5 - 24 cm
wide range: pH 3-11
narrow/zoom range: pH 4-5
loading methods: in-gel rehydration, cup, paper bridge
2D gel electrophoresis equipment: 2nd dimension SDS-PAGE
various lengths
linear / gradient
reducing / non-reducing
Multi-gel runners
  increase reproducibility
  increase throughput
Protein detection and image capture
post-gel staining
  colloidal coomassie blue
  silver
  SYPRO ruby, Deep Purple, Flamingo
pre-gel sample labelling
  35S-methionine
  Cy3, Cy5, Cy2 (DiGE)
Pro-Q Diamond – phosphoproteins
Pro-Q Emerald – glycoproteins
Pro-Q Amber – transmembrane proteins (1D gels)
Example 2D gel: E. coli cell extract
100 µg, silver stained, mini-format, pH 4-7 IPG strip, 12.5% PAGE (Soo Jin Saa)
[Figure: 2D gel image; molecular weight markers at 250, 150, 100, 75, 50, 37, 25 and 20 kDa]
Resolution – how many proteins?
Depends on the separation lengths of gels, ie size of IPG strip and 2nd dimension gel
Mini-gels (7 x 7 cm) – a few hundred
Midi-gels (18 x 20 cm) – ~1,000-2,000
Large format (24 x 20 cm) – up to 10,000
each spot is a different protein
spot intensity is proportional to the amount of protein
Comparison of gel stains
Colloidal Coomassie Blue: 10-50 ng/mm²
SYPRO ruby: ~1 ng/mm²
Silver: 0.5 ng/mm²
Proteomic Workflow 2D gel/MS
Protein separation
Analysis and protein spot selection
Processing and digestion to peptides
Mass spectrometric analysis
Database interrogation
Protein identification
Analysis and spot selection
Image analysis software
PDQuest (BioRad)
DeCyder (GE Healthcare)
Same Spots (Nonlinear Dynamics)
Image capture
Spot detection
Spot matching across gel set
Statistical evaluations
Find differences in spot patterns (protein expression changes) between samples using image analysis software
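The statistical evaluation step can be pictured as follows. This is a minimal sketch, not the method used by PDQuest, DeCyder or SameSpots: it assumes normalised spot volumes from replicate gels are already available, and the numbers are invented. It reports a log2 fold change and a Welch t statistic for one matched spot.

```python
from math import log2, sqrt
from statistics import mean, variance

def log2_fold_change(control, treated):
    """log2 of the ratio of mean spot volumes (treated / control)."""
    return log2(mean(treated) / mean(control))

def welch_t(control, treated):
    """Welch's t statistic for two groups with possibly unequal variances."""
    vc, vt = variance(control), variance(treated)
    se = sqrt(vc / len(control) + vt / len(treated))
    return (mean(treated) - mean(control)) / se

# Hypothetical normalised volumes for one spot across three replicate gels each
normal = [1.00, 1.20, 0.90]
diseased = [2.10, 2.40, 1.90]

print("log2 fold change:", round(log2_fold_change(normal, diseased), 2))
print("Welch t:", round(welch_t(normal, diseased), 2))
```

With thousands of spots per gel, a multiple-testing correction would normally be applied before any spot is selected for identification.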
separate proteins and compare different samples
2D gel electrophoresis separates proteins in two different dimensions – pH and size
differences in spot intensity = protein expression changes
[Figure: 2D gels of normal cells and diseased cells]
Figure courtesy of Dr Rob Layfield, School of Life Sciences
You found some changes. What are the actual proteins?
Proteomic Workflow 2D gel/MS
Protein separation
Analysis and protein spot selection
Processing and digestion to peptides
Mass spectrometric analysis
Database interrogation
Protein identification
Gel spot excision and processing
Pick individual spots into 96-well microtitre plates
Destain
Digest (trypsin)
Peptide extraction
Identify proteins using Mass Spectrometry
MALDI-ToF; Q-ToF2 (plus capillary/nano flow HPLC)
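The trypsin digestion step can be mimicked in silico, which is how a database search predicts the peptide masses a protein should give. The sketch below is an illustration, not the algorithm of any particular search engine. It uses the commonly applied rule that trypsin cleaves after K or R but not before P, ignores missed cleavages, and uses standard monoisotopic residue masses. The example sequence is made up.

```python
import re

# Monoisotopic residue masses in Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (needs Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

# Hypothetical protein sequence, for illustration only
for pep in tryptic_digest("MKWVTFISLLRPGK"):
    print(pep, round(peptide_mass(pep), 4))
```

In peptide mass fingerprinting, a list of masses like this is calculated for every database entry and compared with the masses measured for the excised spot.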
gel versus off-gel
limitations of 2D gels
Some classes of proteins are difficult to obtain on 2D gels
  basic / acidic proteins
  large / small proteins
  membrane proteins
Low throughput / difficult to automate
advantages of 2D gels
  allow examination of modifications of intact proteins
  relatively low cost equipment
  easy to understand/interpret
Another approach: off-gel high throughput LC-MSMS
Many Proteomic Approaches using many different technologies
Mass Spectrometry
Protein Chips: protein arrays on slides (protein spots, tissue sections)
Liquid Chromatography: peptides/proteins, 1D/2D, labels/label free
Gels: proteins, 1D/2D gels, stains/labels
Proteomic Workflow: high throughput LC-MSMS (bottom up)
Digestion of complex protein sample
Peptide separation: high resolution HPLC (often multidimensional)
Mass spectrometric analysis
Database interrogation
Protein identification (large numbers)
Quantitation: tagging, non-tagging approaches
LC-MSMS
Combines the HPLC separation of peptides with the detection and analytical power of tandem MS
Sample → ESI ionisation source → mass analyser → detector
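The mass analyser measures mass-to-charge ratio (m/z), not mass itself, and ESI commonly produces multiply protonated peptide ions. The sketch below is not part of the original slides. It shows the standard conversion between neutral mass and m/z for protonated ions; the peptide mass in the usage lines is hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da

def mz(neutral_mass, charge):
    """m/z of an ion carrying `charge` extra protons: (M + z*H+) / z."""
    return (neutral_mass + charge * PROTON) / charge

def neutral_mass(mz_value, charge):
    """Recover the neutral mass from a measured m/z and a known charge."""
    return mz_value * charge - charge * PROTON

# Hypothetical peptide of neutral mass 1500.0 Da at charge states 1+ to 3+
for z in (1, 2, 3):
    print(f"{z}+ ion: m/z = {mz(1500.0, z):.4f}")
```

The same peptide therefore appears at a different m/z for each charge state, which the analysis software must take into account.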
MudPIT (Multi-dimensional protein identification technology)
Separate complex mixtures of peptides using multi-dimensional HPLC
1st dimension – strong cation exchange
2nd dimension – reversed phase
Analyse using tandem mass spectrometry
LC-MSMS
Combines the HPLC separation of peptides with the detection and analytical power of tandem MS
Figure courtesy of proteabio.com
multidimensional chromatography LC-MSMS
From Vanderbilt University Medical Centre
high throughput LC-MSMS (shotgun)
2014: first draft maps of the human proteome published
high resolution mass spectrometry
29 May 2014
Kim et al (74 authors, led by Pandey lab in Johns Hopkins University, Baltimore, USA & Institute of Bioinformatics, Bangalore, India)
• MS data on 30 normal human tissues and primary cell cultures
• identified proteins encoded by 17,294 genes (~84% of predicted ORFs)
• found >190 new proteins not predicted from genome sequence
• Human Proteome Map http://www.humanproteomemap.org/
draft maps of the human proteome
Wilhelm et al (22 authors, led by Kuster lab in Technische Universität München, Germany)
• combined already existing MS data (tissues, cell lines, body fluids, affinity purifications) with their own MS data (altogether 60 human tissues, 13 body fluids, and 147 cancer cell lines)
• evidence for over 18,000 proteins (~92% of predicted proteins)
• found >400 translated long, intergenic non-coding RNAs (lincRNAs)
• ProteomicsDB https://www.proteomicsdb.org/
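As a quick sanity check on the coverage figures, the identified counts and percentages quoted for the two draft maps imply the size of the predicted gene/protein set each study compared against. This is a back-of-envelope sketch, not a figure from either paper. The percentages are rounded and "over 18,000" is a lower bound, so the results are approximate.

```python
def implied_total(identified, fraction):
    """Size of the reference set implied by a count and its coverage fraction."""
    return round(identified / fraction)

# Figures as quoted for the two draft maps (percentages are rounded)
kim_total = implied_total(17294, 0.84)       # genes, ~84% of predicted ORFs
wilhelm_total = implied_total(18000, 0.92)   # proteins, ~92% of predicted

print("Kim et al: ~", kim_total, "predicted ORFs")
print("Wilhelm et al: ~", wilhelm_total, "predicted proteins")
```

Both estimates land at roughly 20,000, in line with the usual figure for human protein-coding genes.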
gel versus off-gel
advantages of LC-MSMS
  very high throughput
  identify large numbers of proteins
  achieve more analytical runs/replicates
disadvantages of LC-MSMS
  digestion increases the complexity of the sample
  lose the connection between peptides and intact proteins
  complex analysis procedures/software
  need more bioinformatics assistance
Often used as complementary approaches, or in combination
• gel based and high throughput off-gel experiments generate huge quantities of data in the form of long lists of proteins
• how can these data be placed in the biological context – what do the data tell you?
• this is a huge challenge
• the need is to establish what the proteins do, what other proteins they interact with and work out why they have changed - in order to obtain molecular insights into the process in question
What next?
formulate a testable hypothesis to drive the research forward
Specialised software tools
‒ network analysis tools
  Ingenuity Systems IPA : pathway & network analysis of complex 'omics data
  Cytoscape : platform for visualizing complex networks and integrating 'omics data
  STRING : database and web resource for known and predicted protein-protein interactions
‒ text mining tools to help establish the function(s) of each protein
  DAVID : functional annotation tools to help understand biological meaning behind large list of proteins/genes
Figure from Rathbone, Liddell & Campbell (2013) Cellular Reprogramming 15:269
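Functional annotation tools typically ask whether an annotation term appears in a protein list more often than expected by chance. The sketch below is a plain hypergeometric test and is only meant to show the idea; it is not the scoring used by DAVID, IPA or any other named tool, and all the counts are hypothetical.

```python
from math import comb

def enrichment_p(background, annotated, list_size, hits):
    """P(at least `hits` annotated proteins in a random list of `list_size`
    drawn from `background` proteins, of which `annotated` carry the term)."""
    upper = min(annotated, list_size)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, list_size)

# Hypothetical numbers: 20,000 background proteins, 200 annotated with a term,
# 150 proteins in the list of changed proteins, 12 of them carry the term
p = enrichment_p(20000, 200, 150, 12)
print(f"enrichment p-value: {p:.3g}")
```

A small p-value suggests the term is over-represented in the list, which can help in forming a testable hypothesis. Correction for testing many terms is still needed.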
Challenges in proteomics : Complexity
Many different molecules that are expressed at different
‒ levels
‒ times
‒ places
‒ PTMs
Challenges in proteomics : Complexity
Range of sizes
• from ten to several hundred amino acids
• less than 10 kDa to more than 1,000 kDa (1 million Da)
Different forms
• post translational modifications hugely increase the number of different molecules
Chemical composition
• very different physicochemical characteristics
• when extracted from the cell they require different conditions to maintain solubility/stability etc
• large challenges in sample handling
Challenges in proteomics : Complexity
Diversity of sequences (number of genes/ORFs)
  arabidopsis ~28,000 (one of the smallest plant genomes)
  human > 22,000
  worm ~ 19,000
  yeast ~ 6,000
  E. coli ~ 5,000 (fairly typical for bacteria)
Huge number of proteins
estimates of the size of the proteome vary widely
for human cells they range from 100,000 to 2 million!
Challenges in Proteomics
Dynamic range
Don’t see the lower abundance proteins in complex mixtures
Proteins measured clinically in plasma span > 10 orders of magnitude in abundance
10^10 Really Is Wide Dynamic Range (Here on a linear scale)
Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects. Molecular and Cellular Proteomics 2002 1:845-867
2D gels: only ~2-3 orders of magnitude, so only the most abundant proteins are detected
Mass spectrometers: detection range of ~3 to 5 orders of magnitude (recently maybe 6)
Dynamic range is a problem in proteomics since this one feature alone makes it impossible to analyse every molecular species present in a proteome
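The gap between sample and instrument dynamic range is easy to quantify. The sketch below is a simple illustration using the orders of magnitude quoted above. It works in relative abundance units (most abundant protein = 1.0), which is an assumption made for the example.

```python
from math import log10

def orders_of_magnitude(highest, lowest):
    """Dynamic range expressed as log10 of the abundance ratio."""
    return log10(highest / lowest)

def detection_floor(highest, window_orders):
    """Lowest abundance still detectable if a method spans `window_orders`
    orders of magnitude below the most abundant species."""
    return highest / 10 ** window_orders

plasma_range = orders_of_magnitude(1.0, 1e-10)   # plasma spans ~10 orders
for method, window in (("2D gel", 3), ("mass spectrometer", 5)):
    floor = detection_floor(1.0, window)
    missed = plasma_range - window
    print(f"{method}: floor = {floor:g}, ~{missed:.0f} orders out of reach")
```

Under these assumptions a 2D gel leaves about 7 orders of magnitude of the plasma proteome unseen, and a mass spectrometer about 5, unless the sample is fractionated first.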
What is detectable?
How to overcome the dynamic range and detect proteins of lower abundance?
Reduce the complexity and dynamic range
Fractionation techniques include
  removal of abundant proteins
  different cellular compartments
  differential protein solubility
  sequential chromatography (2D, 3D)
  affinity purification
Fractionation : Remove abundant proteins
12 proteins in plasma comprise ~ 96% of the protein mass
Figure courtesy of Beckman Coulter
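The benefit of depletion follows from simple arithmetic. The sketch below is an idealised estimate, not a measured result: it assumes the abundant proteins are removed completely and that the same total amount of protein is loaded afterwards.

```python
def enrichment_factor(fraction_removed):
    """Fold increase in the relative amount of every remaining protein when
    `fraction_removed` of the protein mass is depleted and the same total
    amount of protein is then loaded (idealised, complete depletion)."""
    return 1.0 / (1.0 - fraction_removed)

# ~96% of plasma protein mass is contributed by 12 proteins
print(f"~{enrichment_factor(0.96):.0f}-fold enrichment of the remaining proteins")
```

In practice depletion is incomplete, and some low abundance proteins are lost by binding to the removed proteins, so the real gain is smaller.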
Immunodepletion of 6 high abundance proteins from human serum
Gel lanes: M, 1, 2, 3
1 - crude serum
2 - flow through fraction, depleted of high abundance proteins
3 - bound fraction
Figure courtesy Agilent Technologies
Proteomics
uses unbiased global screening technologies to analyse very complex samples
a proteome is dynamic and can vary depending on physical conditions, cell cycle, environment, health/disease etc
a proteome is a “snapshot” of protein expression/modification by specific cells/tissues, under particular conditions
identifies protein targets for further investigation after validation
References and other sources of information
YouTube video: How to run great 2D Gels (BioRad)
http://www.youtube.com/watch?v=7R_R6mbqvFk
Nat Rev Mol Cell Biol 2004 5(9):699
J Proteomics 2011 Sep 6;74(10):1842
Journal of Proteomics 2011 74:1829
Matrix Science MASCOT help pages http://www.matrixscience.com
MASCOT video tutorials http://view6.workcast.net/?pak=3003276531895477&cpak=7053452047055213
WEBSITES
A Mass Spectrometry and Biotechnology Resource http://www.ionsource.com
  MS Protein Identification Tutorial - especially De Novo Peptide Sequencing Tutorial
ExPASy : Bioinformatics Resource Portal
  http://www.expasy.org/proteomics
  http://world-2dpage.expasy.org/swiss-2dpage/
Human Proteome Organisation (HUPO) http://www.hupo.org
The Fixing Proteomics Campaign http://www.fixingproteomics.org/
Guide to mass spectrometry http://masspec.scripps.edu/mshistory/whatisms_details.php#Basics
Nat Rev Mol Cell Biol 2010 11:789
Nature Biotech 2010 28:695
Annu Rev Biochem 2011 80:239
links
http://view6.workcast.net/?pak=3003276531895477&cpak=7053452047055213 MASCOT video tutorials
http://www.ionsource.com/ Mass Spectrometry and Biotechnology Resource – lots of useful info – tutorials on de novo sequencing etc
http://www.swissproteomicsociety.org/digest Swiss Proteomics Society. The “digest” provides a consolidated selection of articles published in all scientific publications that are pertinent to proteomics – finds all the interesting and relevant papers for you!
http://proteome.nih.gov proteomics special interest group at NIH, includes archived videocasts of research seminars
http://ca.expasy.org/tools/ proteome informatics tools e.g. peptidemass predicted digestion fragment tool
http://www.bspr.org/ British Society for Proteome Research
http://www.bmss.org.uk/ British Mass Spectrometry society
http://www.plasmaproteome.org/ The Plasma Proteome Institute in Washington D.C.
http://www.unimod.org/ Unimod : protein modifications for mass spectrometry
http://www.hupo.org/
http://www.spectroscopynow.com/coi/cda/home.cda?chId=0
http://www.abrf.org/index.cfm/group.show/Proteomics.34.htm