welcome to: case studies in quantitative proteomics · chromatin proteomics (jake jaffe)...
TRANSCRIPT
Welcome to:
Case Studies in Quantitative Proteomics
ASMS 2018
San Diego
1
Schedule
2
Time Day 1 Day 2
9:00-9:30Intro to Protein Quantification (Mike
MacCoss)
Case Study 4: Is my mass spectrometer
working? System Suitability. How we
assess this? (Pino)9:30-10:15
10:15-10:45 Coffee Break Coffee Break
10:45-11:15Introduction to Skyline (Brendan
MacLean)Intro to DIA (MacCoss)
11:15-12:00
Case Study 1: Label free targeted assay
development in a rodent model of heart
failure (Brendan MacLean)
Case Study 5: When PRM is not
enough -- hybrid assays (Jake Jaffe)
12:00-12:30Lunch Lunch
12:30-13:00
13:00-13:30
What happens when an experiment is
designed poorly: Case studies from failed
studies (Olga Vitek)
Case Study 6: When do you need a
PRM assay -- a case study in
Chromatin proteomics (Jake Jaffe)13:30-14:00
14:15-14:45 Coffee Break Coffee Break
14:45-15:15
Case Study 2: Group Comparsions in
Skyline (Brendan MacLean)
I have peak areas, what do I do now?
Intro to MSStats (Olga)14:15-15:45
15:45-16:15
Case Study 3: ABRF iPRG MS1
Quantification in Skyline (Brendan
MacLean)
Hands on Analysis of Proteomics
Datasets (Meena)16:15-16:45
What you should expect
• You should expect to be challenged!
– There should be material that you know but also there should
be material that we present that you don’t know.
• You should expect to learn the fundamentals of targeted
proteomics.
• You should learn that quantitative proteomics is not possible
without extensive quality control.
– We will teach you ways that have worked for us.
• You should learn how we have handled real-life problems.
• You should have a good introduction to the use of Skyline for
targeted proteomics.
• You may hear contradictory things from different instructors.
• The methods presented are not meant to be comprehensive. We
will present strategies we used to solve problems in our labs.
• You should expect to have fun!!
3
What we expect.
• We expect you to challenge us!
– You won’t hurt our feelings – we have thick skin. We
want feedback and also to hear that you want more.
• We expect different course participants to come
from different backgrounds.
– We expect to learn from you!
• We expect you to participate. The more you do,
the more we will all get out of the course.
– Don’t hesitate to ask a question. Feel free to interrupt
us.
4
Introduction to Quantitative Analysis of
Proteins
5
Selective Pressure is Provided on the Protein and not the Transcript
Schrimpf et al. PLOS Bio. 2009
Problems with clinical immunoassays
Single-plex
Poor standardization
Hook effect
Anti-reagent antibodies
Autoantibodies
Why Use Mass Spectrometry?
Hoofnagle and Wener, J Immunol Methods (2009)
Can We Multiplex Immunoassays?
Bead ID
Protein quantification
Luminex beads
Can We Multiplex Immunoassays?
vs.
Not easily.
Can We Multiplex Immunoassays?
Traditional Automated Immunoassays
Capture antibody selects sandwiches
detected by enzyme-conjugated secondary
Generally one at a time
Poor Standardization
Immunoassays start with epitope selection
Whole antigen, domain, or peptide
Different assays use different epitopes
Tumor antigens vary in post-translational modifications
Result: Variable results between assay platforms
Even with an international standard (e.g. BCR-457)
Hook Effect
Too much antigen can lead to negative result
Schiettecatte, Johan; Anckaert, Ellen; Smitz, Johan (2012-03-23). Interferences in Immunoassays. InTech.
https://en.wikipedia.org/wiki/Hook_effect
Autoantibody Interference
Autoantibodies can sequester antigen away from
detectable complexes
Falsely negative results with sandwich assays
Direct detection of analyte by mass spectrometry
Interlaboratory calibration
Improved specificity
No sandwiches formed
No hook effect
Saturation of signal – plateau of calibration curve
Digestion of all proteins
Destroy autoantibodies
Destroy anti-reagent antibodies
Mass SpectrometryA Potential Solution
Quantitative Analysis 101
Introduction Quantitative Analysis - Michael J. MacCoss 16
Amount (moles)
Measure
d I
nte
nsity (
Are
a)
LOD
LLOQ
ULOQ
Linear
Dynamic
Range
LOD: Limit of Detection
LLOQ: Lower Limit of Quantitation
ULOQ: Upper Limit of Quantitation
Quantitative Analysis 101: Using an Internal Std
Introduction Quantitative Analysis - Michael J. MacCoss 17
Amount (moles)
Inte
nsity R
atio (
Are
a R
atio)
LOD
LLOQ
ULOQ
Linear
Dynamic
Range
LOD: Limit of Detection
LLOQ: Lower Limit of Quantitation
ULOQ: Upper Limit of Quantitation
Use of calibration curve to convert signal to quantity
18
Analyte (m/z = M)
Where: ki= is the slope of the standard curve
Ab 0 Area from a blank
−=
k
AAn b
Inte
nsity M
Inte
nsity
timem/z
bAnkA +=
A
Known AmountM
easure
d S
ignal (A
rea)
𝐴 ∝ 𝑚𝑜𝑙𝑒𝑠
Use of calibration curve to convert signal to quantity
19
Sample (m/z = M) Spiked with
Internal Standard (m/z = M+i)
Where: k0/i= is the slope of the standard curve
Ri 0 Area ratio from a blank … only internal std
−=
i
i
k
RRn
/0
00
Inte
nsity
M
M+ i
Inte
nsity
timem/z
R0 =A0
Ai
ii RnkR += 0/00
n0
ni
Known AmountMeasure
d S
ignal (A
rea R
atio)
What does it mean if these are the two points?
20
Amount (moles)
Measure
d I
nte
nsity (
Are
a)
LOD
LLOQ
ULOQ
Linear
Dynamic
Range
LOD: Limit of Detection
LLOQ: Lower Limit of Quantitation
ULOQ: Upper Limit of Quantitation
Questions:
• Design an approach to confirm linearity and
LLOQ for peptides in a complex matrix?
– Assume you have < 50 peptides in your assay.
– Assume you have >1000 peptides in your assay?
• Design an approach to “calibrate” your method
– The purpose of the calibration is:
• To correct for differences between instruments and platforms
• Minimize between day batch effects
• Enable another lab to get similar results to your lab
21
Method to measure both LOQ and Linearity
• Possible Samples to Use as a Diluent Matrix
– Stable isotope labeled version of the matrix.
• 15N or SILAC labeled cells
– A diverged species
• For human plasma we use chicken plasma.
22
Diluent
Sample
Matrix
Pooled
Reference
Sample
Matrix
Reference Yeast BY4742 Diluted in 15N Yeast (S288c)
23
Fractional Dilution
0 0.2 0.4 0.6 0.8 1
Pe
ak a
rea
(1e
9)
0
0.5
1
1.5
2Protein SSA1 (2.69E+05 copies/cell)
Fractional Dilution
0 0.2 0.4 0.6 0.8P
eak
are
a (1
e7
)
0
1.0
1.2
0.6
0.8
Protein LSM2 (2.21E+03 copies/cell)
0.2
0.4
Mixing two samples of extreme conditions
• Assume that you have a healthy and a disease
cohort.
• You can take a pool of the healthy and a pool of
the disease to demonstrate that the
measurement is linear between the two
conditions
• Example
– 100% condition A
– 80% condition A, 20% condition B
– 50% condition A, 50% condition B
– 20% condition A, 80% condition B
– 100% condition B
24
Sample
Sample + Spike
Known SpikePre-Existing
Femtomoles Peptide
-6 -4 -2 0 2 4 6 8 10 12 14 16
0
20000
40000
60000
80000
100000
120000
AU
C (
Counts
)
Method of Standard Addition
0
2500
5000
7500
10000
12500
15000
Counts
Time
Sample
Sample + Spike
Can’t assess Limit of Quantification. Can only
assess whether the sample us above the LOQ
OTHER THINGS TO CONSIDER
26
We are not doing protein quantitation we are
doing peptide quantitation!
27
P
PP
P
Level 1
Level 2
Level 3
Level 4
Level 5
Peptide Intensity Varies Over
Course of Digestion
Agger, ClinChem, 2010
Do you know the enrichment of the standard?
29
Peptide: DIDIEYHQNK N = 15
1270 1275 1280 1285 1290 1295
Rela
tive A
bundance
m/z
100 ape 15N
1:1
1270 1275 1280 1285 1290 1295
Rela
tive A
bundance
m/z
98 ape 15N
1:1
1270 1275 1280 1285 1290 1295
Rela
tive A
bundance
m/z
95 ape 15N
1:1
1270 1275 1280 1285 1290 1295
Rela
tive A
bundance
m/z
90 ape 15N
1:1
Do you know the enrichment of the standard?
Introduction Quantitative Analysis - Michael J. MacCoss 30
Peptide: DIDIEYHQNK N = 15
1270 1275 1280 1285 1290 1295
Rela
tive A
bundance
m/z
100 ape 15N
1:1
1270 1275 1280 1285 1290 1295
Rela
tive A
bundance
m/z
98 ape 15N
1:1
1270 1275 1280 1285 1290 1295
Rela
tive A
bundance
m/z
95 ape 15N
1:1
1270 1275 1280 1285 1290 1295
Rela
tive A
bundance
m/z
90 ape 15N
1:1
Effect of 13C Labeling on Quantitative Accuracy
Introduction Quantitative Analysis - Michael J. MacCoss 31
628 629 630 631 632 633 634 635 6360
20
40
60
80
100
Rela
tive A
bun
dance
m/z
YAGILDCICATFK
9 x 13C at >99.9 APE
Effect of 13C Labeling on Quantitative Accuracy
Introduction Quantitative Analysis - Michael J. MacCoss 32
YAGILDCICATFK
9 x 13C at >99.9 APE
628 629 630 631 632 633 634 635 6360
20
40
60
80
100
Rela
tive A
bun
dance
m/z
Effect of 98 ape Labeling on 2H8-ICAT Accuracy
Introduction Quantitative Analysis - Michael J. MacCoss 33
814 815 816 817 818 819 820 821 8220
20
40
60
80
100
Re
lative
Ab
un
da
nce
m/z
FGTWQCICATLMR
0.984
1885 1886 1887 1888 1889 1890 1891 1892 1893 18940
20
40
60
80
100
Rela
tive A
bundance
m/z
SYIEGTAVSQADFGLGS
GHCICATQALDPWVTVFK
0.968
1:1 Mole Ratio 1:1 Mole Ratio
34
Challenges with Discovery Proteomics
and Possible Alternatives
µLC
Ab
un
da
nce
Time
Shotgun Proteomics
Protein
Mixture
proteolysis
Peptide
Mixture
Ab
un
da
nc
e
m/z
MS
MS/MS1
Ab
un
da
nce
m/z
1
2
Ab
un
da
nce
m/z
2
3
Ab
un
da
nc
e
m/z
3
These Mixtures are Complicated!
Time (Min)
m/z
MS ScanMS/MS MS/MS MS/MS
These Mixtures are Complicated!
Time (Min)
m/z
Acquisition methods in shotgun LC-MS/MS
Data Dependent Acquisition (DDA)
Liquid chromatography
MS/MS analysisIsolation &
fragmentationMS analysis
Sampling Every Precursor Using DDA
• To get every precursor using DDA, we would have to collect 121 MS/MS spectra on average per chromatographic peak
200
150
50
100
0Scan index
300
250
Number of precursors from m/z 400 to 1,000
Nu
mb
er
of
pre
curs
ors
Average: 120.58
Data Dependent Acquisition will Always
Undersample
Data Dependent Acquisition will Always
Under Sample
Red: Detected Persistent Peptide Isotope Distributions
Blue: MS/MS Spectra
Green: Peptide Identifications
DDA always misses and is least
reproducible on the low abundant signals
Imagine Visiting Kyoto and Visiting the Tallest Buildings First
43
Analogy Nate Yates
You would miss …
44
If you wouldn’t learn much about Kyoto by going to the tallest
structures why would we use this approach to understand protein
mixtures???
So what do we do?
45
Build on Other People’s Knowledge
So what do we do?
46
So what do we do?
47
So what do we do?
48
C. elegans as an In Vivo Model System for Proteomics
The only multicellular
organism with a genome
sequenced to completeness
Complete Lineage for all 959 Somatic Cells
Insulin
ReceptorDAF-2
IRS1-4 IST-1
pi 3-kinase AGE-1
PDK1 PDK-1
AKT/PKB AKT-1/AKT-2
FKHRL1 DAF-16FKHR AFX
Humans C. elegans
Soluble Proteins Insoluble Proteins
Density Centrifugation
10 Fractions
Reversed
Phase
10 Fractions
(NH4)2SO4
Cuts
8 Fractions
10 MudPITs 8 MudPITs 10 MudPITs
>6,000,000 MS/MS Spectra, 67,047 Unique Peptide
Identifications, and 6,779 Unique Protein Identifications
0 Proteins Identified that are
Known to be Involved in the Insulin
/ IGF-1 Signaling Pathway
Fractionation to Improve the Coverage of Proteins Involved in Insulin / IGF-1 Signaling
200 400 600 800 10000
20
40
60
80
100
Re
lative
Ab
un
da
nce
m/z
521.0 521.5 522.0 522.5 523.0 523.5 524.00
20
40
60
80
100
Re
lative
Ab
un
da
nce
m/z
Ion
Source
CID
521.7 757.6
Q1 Q3
32 33 34 35 36 37 38 39 40
0
2x104
4x104
6x104
8x104
1x105
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
3x103
6x103
9x103
1x104
2x104
Co
un
ts
Time (min)
32 33 34 35 36 37 38 39 40
0
3x103
6x103
9x103
1x104
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
4x103
8x103
1x104
2x104
Co
un
ts
Time (min)
32 33 34 35 36 37 38 39 40
0
3x103
5x103
8x103
1x104
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
3x104
6x104
9x104
1x105
Co
un
ts
Time (min)
32 33 34 35 36 37 38 39 40
0
2x103
4x103
6x103
8x103
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
1x104
2x104
3x104
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
1x104
2x104
3x104
Counts
Time (min)
Precursor Ion > Product Ion
TGASEAVPSEGK > GK
TGSAEAVPSEGK > EGK
TGSAEAVPSEGK > SEGK
TGSAEAVPSEGK > PSEGK
TGSAEAVPSEGK > VPSEGK
TGSAEAVPSEGK > AVPSEGK
TGSAEAVPSEGK > EAVPSEGK
TGSAEAVPSEGK > AEAVPSEGK
TGSAEAVPSEGK > SAEAVPSEGK
566.78 > 204.13
566.78 > 333.18
566.78 > 420.21
566.78 > 517.26
566.78 > 616.33
566.78 > 687.37
566.78 > 816.41
566.78 > 903.44
566.78 > 1031.5
y2
y3
y4
y5
y6
y7
y8
y9
y10
SRM Chromatograms
Acquisition methods in shotgun LC-MS/MS
Data Dependent Acquisition (DDA)
Selected/Parallel Reaction Monitoring (S/PRM)
MS/MS analysisIsolation &
fragmentationMS analysis
Liquid chromatography
20 21 22 23 24 250
20
40
60
80
100
Re
lative
Ab
un
da
nce
Time (min)
Synth
etic P
eptide S
tandard
s
20 21 22 23 24 250
20
40
60
80
100
Re
lative
Ab
un
da
nce
Time (min)
Worm
Hom
ogenate
25 µ
g
20 21 22 23 24 250
20
40
60
80
100
Re
lative
Ab
un
da
nce
Time (min)
20 21 22 23 24 250
20
40
60
80
100
Re
lative
Ab
un
da
nce
Time (min)
75 attomoleIDATTHIGGVQIK
7.5 femtomoleQEDVVIEGWLHK
IDATTHIGGVQIK
DAF-16QEDVVIEGWLHK
AKT-1
0 20 40 60 80 100
0.0
5.0x106
1.0x107
1.5x107
2.0x107
2.5x107
3.0x107
3.5x107
4.0x107
Peak A
rea
Moles Injected (fmol)
Dilution of the Yeast Protein PGI1 in Undepleted Human Plasma
Peptide: ILFDYSK
R = 0.99992
Dilution of the Yeast Protein PGI1 in Undepleted Human Plasma
Peptide: ILFDYSK
R = 0.99992
0 5 10 15
0
1x106
2x106
3x106
4x106
5x106
Peak A
rea
Moles Injected (fmol)
Dilution of the Yeast Protein PGI1 in Undepleted Human Plasma
Peptide: ILFDYSK
R = 0.99992
0.0 0.2 0.4 0.6 0.8 1.0
0
1x105
2x105
3x105
4x105P
eak A
rea
Moles Injected (fmol)
12 Attomoles of ILFDYSK on Column in 10 µg of Unfractionated Human Plasma
27 28 29 30 31 32 330.0
2.0x103
4.0x103
6.0x103
8.0x103
1.0x104
1.2x104
1.4x104
Inte
nsity
Retention Time (min)
*
SRM of Ceruplasmin – ETFTYEWTVPK
30 31 32 33 34 35 36 37 38 39 40
0.0
5.0x105
1.0x106
1.5x106
2.0x106
2.5x106
3.0x106
3.5x106
Counts
Time (min)
SRM Transitions
700.84 > 244.17
700.84 > 759.4
700.84 > 922.47
SRM Triggered
Product Ion Spectrum
200 400 600 800 1000 12000
20
40
60
80
100
Rela
tive A
bu
nd
an
ce
Time (min)
Ceruplasmin – ETFTYEWTVPK
1170.9
1023.7
922.5759.31630.3443.1
119.8
243.9
461.1343.0 y9y8
y7y6y5y4
y3
y2
32 33 34 35 36 37 38 39 40
0
2x105
4x105
6x105
8x105
1x106
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
2x104
4x104
6x104
8x104
1x105
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
3x103
6x103
9x103
1x104
2x104
Co
un
ts
Time (min)
32 33 34 35 36 37 38 39 40
0
3x103
6x103
9x103
1x104
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
4x103
8x103
1x104
2x104
Co
un
ts
Time (min)
32 33 34 35 36 37 38 39 40
0
3x103
5x103
8x103
1x104
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
3x104
6x104
9x104
1x105
Co
un
ts
Time (min)
32 33 34 35 36 37 38 39 40
0
2x103
4x103
6x103
8x103
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
1x104
2x104
3x104
Counts
Time (min)
32 33 34 35 36 37 38 39 40
0
1x104
2x104
3x104
Counts
Time (min)
Chromogranin A
TGASEAVPSEGK
566.78 > 147.11
566.78 > 204.13
566.78 > 333.18
566.78 > 420.21
566.78 > 517.26
566.78 > 616.33
566.78 > 687.37
566.78 > 816.41
566.78 > 903.44
566.78 > 1031.5
32 33 34 35 36 37 38 39 40
0.0
2.0x104
4.0x104
6.0x104
8.0x104
1.0x105
1.2x105
Counts
Time (min)
y1
y2
y3
y4
y5
y6
y7
y8
y9
y10
SRM Triggered
Product Ion Spectrum
Chromogranin A is ~10 ng/mL in plasma
Anderson et al. J. Physiology 2005
200 400 600 800 10000
20
40
60
80
100
Re
lative
Ab
un
da
nce
Time (min)
Chromogranin A
TGASEAVPSEGK
566.78 > 147.11
566.78 > 204.13
566.78 > 333.18
566.78 > 420.21
566.78 > 517.26
566.78 > 616.33
566.78 > 687.37
566.78 > 816.41
566.78 > 903.44
566.78 > 1031.5
y1
y2
y3
y4
y5
y6
y7
y8
y9
y10
85.9
146.9
218.0
401.8
494.9
574.9639.4
741.2854.3
967.42
696.1
331.1
y1
y2 y3 y4 y5 y6
y7 y8 y9
y10
Should we be putting in all this effort toward measuring reliable peptide peak areas?
Spike-in Validation Experiment
Human4µg 0.1µg
E. coli
Human4µg
0.8µgE. coli
4 replicates
con
tro
ls
8x change
#Proteins
Detected
#E. coli
Proteins
#Human
Proteins
#E. coli
peptides
#Human
peptides
Spectra /
PSMs at q-
value < 0.01
2073 507 1566 1633 2346 172241; 29163 (16.9%)
Human E. coli
Log2 Protein Fold Change
Protein ID Summary
Relative Quantitation
4 replicates
MS1 Filtering compared with Spectral Counting
4 orders of magnitudeof dynamic range
2.5 orders of magnitudeof dynamic range
MS1 Filtering and Spectral Counting Fold-Change Distributions
Crawdad – AUC quantitation
Raw Ratio: n1 / n2
NSAF Ratio: ratio of SPC normalized by protein length
f = 0.5
𝑅𝑆𝐶 = log2𝑛2 + 𝑓
𝑛1 + 𝑓+ log2
𝑡1 − 𝑛1 + 𝑓
𝑡2 − 𝑛2 + 𝑓
E. Coli Proteins(8-fold spike-in
difference)
Limited to proteins where SPC for both1x, 8x sets are > 0
Measured Protein Fold-Change Ratios: MS1 Filtering vs. Spectral Counting
E. coli protein
Human protein
E. coli protein
Human protein
8x change
1x change
8x change
1x change
Measured Protein Fold-Change Ratios: MS1 Filtering vs. Spectral Counting
E. coli protein
Human protein
E. coli protein
Human protein
8x change
1x change
8x change
1x change
Limited to Proteins with >= 10 MS/MS Spectra
Summary
• Immunoassays are an imperfect solution to the protein quantitation
problem.
• Add the internal standard as early as possible to account for sample
preparation errors
• Systematic errors do occur and need to be corrected.
• High sequence coverage improves the comprehensiveness of shotgun
proteomics data
• Know the isotope enrichment of the internal standard.
– Can be measured by a number of different techniques
• 13C labeled internal standards aren’t perfect and require special
considerations to get accurate isotopomer ratios
• Think about how you are going to calculate the significance before you
do the study. Get help early on if you are not sure.
• Is your sample size and experimental design going to answer your
questions?
• Don’t tolerate experiments with an N of 1!
70
Schedule
71
Time Day 1 Day 29:00-9:30
Intro to Protein Quantification (Mike
MacCoss)
Intro to Data Independent
Acquisition (Mike MacCoss)9:30-10:00
10:00-10:30 Principles of Study Design:
Possibly examples of failed
studies? (Olga Vitek)10:30-11:00
Introduction to Skyline (Brendan
MacLean)
11:00-11:30
Case Study 1a: Targeted Method
Refinement (Brendan MacLean)
Case Study 3b: Using Sentinel
Profiles for Drug Characterization:
Clustering and connectivity (Jake
Jaffe)11:30-12:00
12:00-12:30Lunch Lunch
12:30-13:00
13:00-13:30 Case Study 1b: Group Comparsions
in Skyline (Brendan MacLean)Intro to MSstats (Olga Vitek)
13:30-14:00
14:00-14:30 Case Study 2: MS1 Quantification in
Skyline (Brendan MacLean)
Statistical Analysis of Proteomics
Datasets (Olga Vitek)14:30-15:00
15:00-15:30Case Study 3a: Intro to Sentinel
Assay Concept, Development of
P100 Discovery Data Set (Jake Jaffe)
System Suitability and Quality
Control. How we assess this,
AutoQC/Panorama/MSStatsQC
(Lindsay/Eralp)15:30-16:00