Proteomics Informatics (BMSC-GA 4437)
Course Director
David Fenyö
Contact information
http://fenyolab.org/pi2015/
Proteomics Informatics – Learning Objectives
Be able to analyze proteomics data sets and understand the limitations of the results.
Proteomics Informatics – Syllabus
Lecture 1 Overview of proteomics (February 3, 2014 TRB 717 4pm)
Lecture 2 Overview of mass spectrometry (February 10, 2014 TRB 717 4pm)
Lecture 3 Signal processing I: analysis of mass spectra (February 17, 2014 TRB 718 4pm)
Lecture 4 Protein identification I: searching protein sequence collections and significance testing (February 24, 2014 TRB 718 4pm)
Lecture 5 Protein quantitation I: overview (March 3, 2014 TRB 717 4pm)
Lecture 6 Databases, data repositories and standardization (March 10, 2014 TRB 717 4pm)
Lecture 7 Protein identification II: de novo sequencing (March 17, 2014 TRB 717 4pm)
Lecture 8 Protein quantitation II: multiple reaction monitoring (March 24, 2014 TRB 717 4pm)
Lecture 9 Proteogenomics (March 31, 2014 TRB 619 4pm)
Lecture 10 Protein characterization I: post-translational modifications (April 7, 2014 TRB 717 4pm)
Lecture 11 Signal processing II: image analysis (April 21, 2014 TRB 717 4pm)
Lecture 12 Protein characterization II: protein interactions (April 28, 2014 TRB 619 4pm)
Lecture 13 Data analysis and visualization (May 5, 2014 TRB 717 4pm)
Lecture 14 Molecular signatures (May 12, 2014 TRB 717 4pm)
Lecture 15 Presentations of projects (May 19, 2014 TRB 717 4pm)
Motivating Example: Protein Regulation
Geiger et al., “Proteomic changes resulting from gene copy number variations in cancer cells”, PLoS Genet. 2010 Sep 2;6(9). pii: e1001090.
Bioinformatics
Biological System
Samples
Measurements
Experimental Design
Raw Data
Information
Data Analysis
Mass Spectrometry Based Proteomics
Sample preparation: Lysis, Fractionation, Digestion
Mass spectrometry (MS): Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector
Fragment ions: b, y
Data analysis: Peak Finding, Charge Determination, De-isotoping, Integrating Peaks, Searching
Result: Identified and Quantified Proteins
Overview of Mass spectrometry (Week 2)
[Figure: LC-MS/MS instrument schematic (LC, Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector) with a series of mass spectra, intensity versus mass/charge, acquired over time.]
Protein identification I: searching protein sequence collections and significance testing (Week 4)
Experiment: Lysis, Fractionation, Digestion, LC-MS, MS/MS
Search:
1. Sequence DB
2. Pick Protein
3. Pick Peptide
4. All Fragment Masses
5. Compare with the MS/MS spectrum, Score, Test Significance
6. Repeat for all peptides
7. Repeat for all proteins
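The search loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the scoring used by any particular search engine: the function names, the cleavage rule (after K or R, no missed cleavages), the shared-peak count used as a score, and the 0.5 Da tolerance are all assumptions made for the example, and the significance test is left out.

```python
# Monoisotopic residue masses (Da), taken from the amino acid table in this deck.
RESIDUE = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier for singly charged ions


def digest(protein):
    """Cleave after every K or R (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def fragment_masses(peptide):
    """m/z of all singly charged b and y ions of a peptide."""
    masses = [RESIDUE[a] for a in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def score(observed, theoretical, tol=0.5):
    """Number of theoretical fragments that have an observed peak within tol."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def search(observed, precursor_mass, database, tol=0.5):
    """Return (score, protein name, peptide) of the best-scoring candidate."""
    best = (0, None, None)
    for name, sequence in database.items():      # repeat for all proteins
        for peptide in digest(sequence):          # repeat for all peptides
            mass = sum(RESIDUE[a] for a in peptide) + WATER
            if abs(mass - precursor_mass) > tol:
                continue                          # wrong precursor mass
            s = score(observed, fragment_masses(peptide), tol)
            if s > best[0]:
                best = (s, name, peptide)
    return best
```

A call such as `search(peaks, neutral_mass, {"protA": "AGKELVISR"})` returns the best candidate; a real search would go on to estimate how likely that score is to arise by chance.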
Protein quantitation I: Overview (Week 5)
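Lysis, Fractionation, Digestion, LC-MS, MS
Sample i, Protein j, Peptide k
$C_{ij}$: amount of protein $j$ in sample $i$; $I_{ik}$: intensity of peptide $k$ in sample $i$
Step factors: $p^{L}_{ij}$, $p^{Pr}_{ij}$, $p^{D}_{ijk}$, $p^{Pep}_{ik}$, $p^{LC}_{ik}$, $p^{MS}_{ik}$

$$I_{ik} = \sum_{j} C_{ij}\, p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}$$

For peptides $k$ that originate only from protein $j$:

$$C_{ij} = \frac{\sum_{k} I_{ik}}{p^{L}_{ij}\, p^{Pr}_{ij} \sum_{k} p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}}$$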
Protein quantitation I: Overview (Week 5)
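Lysis, Fractionation, Digestion, LC-MS, MS
Sample i, Protein j, Peptide k
Assumption: $p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}$ is constant for all samples

$$\frac{C_{i_m j}}{C_{i_n j}} = \frac{I_{i_m j}}{I_{i_n j}}$$

where $i_m$ and $i_n$ denote two samples and the intensities are those of the peptides of protein $j$.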
Most proteins show very reproducible peptide patterns
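Under the assumption that the step factors are constant for all samples, the ratio of a protein's amount in two samples equals the ratio of its peptide intensities. The sketch below illustrates that idea; the function name, the dictionary layout, and the use of the median to combine peptide-level ratios are choices made for this example, not something prescribed by the slides.

```python
from statistics import median


def protein_ratio(intensities_m, intensities_n):
    """Estimate the abundance ratio of one protein between samples m and n.

    Each argument maps peptide -> measured intensity for that protein in
    one sample. Only peptides seen in both samples are used, and the
    peptide-level ratios are combined with the median (an assumption
    made here to limit the influence of outlier peptides).
    """
    shared = [k for k in intensities_m
              if k in intensities_n and intensities_n[k] > 0]
    if not shared:
        raise ValueError("no peptide was quantified in both samples")
    return median(intensities_m[k] / intensities_n[k] for k in shared)


# Hypothetical intensities for three peptides of one protein.
sample_m = {"pepA": 200.0, "pepB": 50.0, "pepC": 900.0}
sample_n = {"pepA": 100.0, "pepB": 25.0, "pepC": 300.0}
ratio = protein_ratio(sample_m, sample_n)
```

Because the step factors cancel in the ratio, no absolute calibration is needed; the estimate is only as good as the assumption that those factors really are the same in both samples.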
Databases, data repositories and standardization (Week 6)
Query Spectrum
Best match in GPMDB
Second-best match in GPMDB
Protein identification II: de novo sequencing (Week 7)
[Figure: MS/MS spectrum, % Relative Abundance (0 to 100) versus m/z (250 to 1000). Annotation: [M+2H]2+. Labeled peaks at m/z 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080.]
Mass Differences
Amino acid residue masses (Da)
1-letter code  3-letter code  Chemical formula  Monoisotopic  Average
A              Ala            C3H5ON            71.0371       71.0788
R              Arg            C6H12ON4          156.101       156.188
N              Asn            C4H6O2N2          114.043       114.104
D              Asp            C4H5O3N           115.027       115.089
C              Cys            C3H5ONS           103.009       103.139
E              Glu            C5H7O3N           129.043       129.116
Q              Gln            C5H8O2N2          128.059       128.131
G              Gly            C2H3ON            57.0215       57.0519
H              His            C6H7ON3           137.059       137.141
I              Ile            C6H11ON           113.084       113.159
L              Leu            C6H11ON           113.084       113.159
K              Lys            C6H12ON2          128.095       128.174
M              Met            C5H9ONS           131.04        131.193
F              Phe            C9H9ON            147.068       147.177
P              Pro            C5H7ON            97.0528       97.1167
S              Ser            C3H5O2N           87.032        87.0782
T              Thr            C4H7O2N           101.048       101.105
W              Trp            C11H10ON2         186.079       186.213
Y              Tyr            C9H9O2N           163.063       163.176
V              Val            C5H9ON            99.0684       99.1326
Sequences consistent with spectrum
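The mass-difference step of de novo sequencing can be illustrated with a short Python sketch: take a ladder of peaks, compute the differences between neighbours, and look up which residues fit each difference. The function names and the 0.5 Da tolerance are assumptions for this example, and the peak list used below is one subset of the peaks labeled in the spectrum above, chosen here for illustration.

```python
# Monoisotopic residue masses (Da), taken from the amino acid table in this deck.
RESIDUE = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}


def residues_for_difference(delta, tol=0.5):
    """All residues whose mass lies within tol of a peak-to-peak difference."""
    return sorted(aa for aa, mass in RESIDUE.items() if abs(mass - delta) <= tol)


def read_ladder(peaks, tol=0.5):
    """For each pair of neighbouring peaks, list the residues that fit the gap."""
    peaks = sorted(peaks)
    return [(low, high, residues_for_difference(high - low, tol))
            for low, high in zip(peaks, peaks[1:])]


# Peaks 292, 405, 534, 663, 778, 907, 1020 from the spectrum above.
ladder = read_ladder([292, 405, 534, 663, 778, 907, 1020])
for low, high, candidates in ladder:
    print(low, high, high - low, candidates)
```

Isoleucine and leucine have the same mass, so a difference of 113 cannot distinguish them; this is one reason several sequences can be consistent with the same spectrum.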
Protein quantitation II: Targeted (Week 8)
Lysis, Fractionation, Digestion, LC-MS, MS
Shotgun proteomics: Data Dependent Acquisition (DDA)
1. Records m/z
2. Selects peptides based on abundance and fragments (MS/MS)
3. Protein database search for peptide identification
Targeted MS: Uses predefined set of peptides
1. Select precursor ion (MS)
2. Precursor fragmentation (MS/MS)
3. Use precursor-fragment pairs for identification
Proteogenomics (Week 9)
Non-Tumor Sample: Genome sequencing; identify germline variants
Tumor Sample: Genome sequencing, RNA-Seq; identify alternative splicing, somatic variants and novel expression
Reference Human Database (Ensembl)
Tumor Specific Protein DB
Variants: Exon 1
TCGAGAGCTG
TCGAGAGCTG
TCGAGAGCTG
TCGAGAGCTG
TCGAGAGCTG
TCGATAGCTG
Alt. Splicing: Exon 1, Exon 2, Exon 3
Novel Expression: Exon 1, Exon X, Exon 2
Fusion Genes: Gene X (Exon 1, Exon 2), Gene Y (Exon 1, Exon 2)
Kelly Ruggles
Protein characterization I: post-translational modifications (Week 10)
Peptide with two possible modification sites
MS/MS spectrum (intensity versus m/z)
Matching
Which assignment does the data support?
1, 1 or 2, or 1 and 2?
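One way to approach this question is to compute the fragment masses for each candidate site and count how many are found in the spectrum. The Python sketch below illustrates that comparison; the function names, the choice of phosphorylation as the modification, the example peptide, and the 0.02 Da tolerance are all assumptions made for the example, and a plain match count is not a substitute for a proper localization score.

```python
# Monoisotopic residue masses (Da), taken from the amino acid table in this deck.
RESIDUE = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.010565
PROTON = 1.007276
PHOSPHO = 79.96633  # HPO3, used here only as an example modification


def fragments(peptide, mod_site, mod_mass):
    """Singly charged b and y ions with the modification at index mod_site."""
    masses = [RESIDUE[a] for a in peptide]
    masses[mod_site] += mod_mass
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def count_matches(observed, theoretical, tol=0.02):
    """Number of theoretical fragments with an observed peak within tol."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def compare_sites(observed, peptide, sites, mod_mass, tol=0.02):
    """Match count for each candidate modification site (0-based index)."""
    return {site: count_matches(observed, fragments(peptide, site, mod_mass), tol)
            for site in sites}


# Hypothetical peptide with two candidate sites: S (index 1) and T (index 3).
# The "observed" peaks are simulated from the site-1 assignment.
observed = fragments("ASGTLK", 1, PHOSPHO)
result = compare_sites(observed, "ASGTLK", [1, 3], PHOSPHO)
```

Only the fragments that fall between the two candidate sites differ between the assignments, so the decision rests on whether those site-determining ions are present in the data. If none of them is observed, the data support "1 or 2" but cannot tell them apart.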
Signal processing II: image analysis (Week 11)
Agullo-Pascual E, Reid DA, Keegan S, Sidhu M, Fenyö D, Rothenberg E, Delmar M, "Super-resolution fluorescence microscopy of the cardiac connexome reveals plakophilin-2 inside the connexin43 plaque", Cardiovasc Res. 2013
Protein characterization II: protein interactions (Week 12)
[Figure: protein groups AB, A, CD and EF, followed by Digestion, Mass spectrometry and Identification.]
Presentations of projects (Week 15)
Select a published data set that has been made public and reanalyze it.
Highlighted data sets: http://www.thegpm.org/
10 min presentations