DNA Genomics
RNA Genomics/Transcriptomics
Protein Proteomics
Metabolites Metabolomics
The Central Dogma-omics
Protein Machines
Key Concept: Biochemical functions are carried out by multi-protein machines
The Polyadenylation Machinery
The Proteasome
Key Concept: A protein's function can be inferred from its binding partners.
Key Concept: Knowledge of a machine's components is required to understand how it works and how it is regulated.
Key Concept: Highly Clustered areas typically serve the same biological function.
Protein Machines Interact with Each Other in Higher-Order Networks
Key Concept: Complex phenotypes can be understood in a network context
Understanding the Network May Give Insights into Emergent Behaviors
-Homeostasis
-Robustness
-Periodicity
-Morphogenesis
-Tumorigenesis
Proteins Are Organized in a “Small World” Network
Key Concept: The proteome is HIGHLY Networked
The Small World Hypothesis: Six Degrees of Separation
Stanley Milgram's study in 1967
-placed ads in newspapers in Nebraska and Kansas asking for volunteers for an experiment. The volunteers were asked to contact a divinity student in Boston by going through people they knew on a first-name basis, who would then contact their friends, and so on.
-the number of people (degrees) between the volunteers and the target ranged from 2 to 10, with a mean of 6.
Properties of Small World Networks:
-highly clustered: “my friends are also friends”
-most nodes are not connected: “most people are strangers”
-presence of hubs (nodes with a lot of connections): “Facebook Whales”
-a short path can be found between any two nodes: “two strangers meet and realize they know some of the same people”; the length of this path is often referred to as the degree of separation
-the network should be resistant to perturbation: “life goes on”
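The small-world properties listed above can be measured directly. The Python sketch below (a minimal illustration, not the method behind these slides) builds a ring lattice, adds random shortcuts in the style of the Watts-Strogatz model, and reports the average clustering coefficient (“my friends are also friends”) and the average shortest-path length (the degree of separation). The graph size, the rewiring probability of 0.1 and the random seed are arbitrary assumptions.

```python
import random
from collections import deque


def ring_lattice(n: int, k: int) -> dict[int, set[int]]:
    """Ring of n nodes, each linked to its k nearest neighbours on either side."""
    graph = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            u = (v + step) % n
            graph[v].add(u)
            graph[u].add(v)
    return graph


def rewire(graph: dict[int, set[int]], p: float, rng: random.Random) -> dict[int, set[int]]:
    """Watts-Strogatz-style shortcuts: with probability p, move one end of each edge to a random node."""
    n = len(graph)
    edges = [(v, u) for v in graph for u in graph[v] if v < u]
    for v, u in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            if w != v and w not in graph[v]:  # skip self-loops and duplicate edges
                graph[v].discard(u)
                graph[u].discard(v)
                graph[v].add(w)
                graph[w].add(v)
    return graph


def average_clustering(graph: dict[int, set[int]]) -> float:
    """Mean fraction of each node's neighbour pairs that are themselves linked."""
    total = 0.0
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in graph[a])
        total += links / (k * (k - 1) / 2)
    return total / len(graph)


def average_path_length(graph: dict[int, set[int]]) -> float:
    """Mean breadth-first-search distance over all reachable ordered pairs of nodes."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in graph[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(42)
lattice = ring_lattice(200, 3)
small_world = rewire(ring_lattice(200, 3), 0.1, rng)
print(f"lattice:     C={average_clustering(lattice):.2f}  L={average_path_length(lattice):.1f}")
print(f"small world: C={average_clustering(small_world):.2f}  L={average_path_length(small_world):.1f}")
```

The rewired network typically keeps most of the lattice's clustering while its average path length drops sharply; that combination of high clustering and short paths is what makes it a small world.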
Clustered vs Non-Clustered
Distribution of Connections
[Figure: number of nodes with k links plotted against the number of links (k)]
80/20 Law
Ten Best Centers
1. CLU1 1.843
2. CDC33 1.867
3. TIF2 1.875
4. MDH1 1.898
5. SRP1 1.912
6. YBL004W 1.914
7. RPT3 1.914
8. HAS1 1.914
9. YGR090W 1.917
10. PFK1 1.918
Ten Worst Centers
CAC2 3.803
PSR1 3.838
RAM2 3.840
RAM1 3.840
ORC2 3.863
UBA3 3.902
MAK10 3.975
YNL056W 4.003
YNR046W 4.089
VPS4 4.433
Median Degree of Separation: 2.38
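One plausible way to produce a “best centers” ranking like the one above is to score each protein by its mean shortest-path distance to every other protein and sort in ascending order. The slides do not say exactly how their scores were computed, so treat this Python sketch as an assumption; the toy network and its node names are hypothetical.

```python
import statistics
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def mean_separation(graph, source):
    """Average BFS distance from source to every other reachable node (assumes source has a partner)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others)


def rank_centers(graph):
    """Best centers first: lowest mean separation, ties broken by name."""
    scores = {node: mean_separation(graph, node) for node in graph}
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


# Hypothetical toy network: H is a hub, F sits at the end of a chain.
toy = build_graph([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("D", "E"), ("E", "F")])
ranking = rank_centers(toy)
print("best center:", ranking[0])
print("worst center:", ranking[-1])
print("median separation:", statistics.median([score for _, score in ranking]))
```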
Shortest and longest Pathways
Is S. cerevisiae Robust?
-Environmentally robust:
-robust to temperature (4-40 °C)
-robust to nutrient sources
-robust to starvation
-robust to osmolarity (0-1 M NaCl)
-Is it robust to genetic perturbation (mutation)?
-the S. cerevisiae Genome Deletion Project has deleted 95% of all S. cerevisiae genes
-18.7% of genes are essential
-in a typical small-world network you can lose ~20% of randomly chosen nodes before the network crashes
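The robustness claim can be explored with a simulation. This Python sketch grows a hub-rich network by preferential attachment (an assumed stand-in for a real interactome), deletes 20% of the nodes either at random or starting from the best-connected hubs, and reports how much of the network stays in one connected piece. The network size, the number of links per new node and the random seed are arbitrary choices.

```python
import random


def preferential_attachment(n: int, m: int, rng: random.Random) -> dict[int, set[int]]:
    """Grow a hub-rich network: each new node links to m existing nodes chosen in proportion to degree."""
    graph = {v: set() for v in range(m + 1)}
    for a in range(m + 1):  # seed with a small complete graph
        for b in range(a + 1, m + 1):
            graph[a].add(b)
            graph[b].add(a)
    # Each node appears once per link, so uniform sampling from this list is degree-proportional.
    pool = [v for v in graph for _ in graph[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        graph[new] = set()
        for t in targets:
            graph[new].add(t)
            graph[t].add(new)
            pool.extend([new, t])
    return graph


def largest_component_fraction(graph: dict[int, set[int]], removed: set[int]) -> float:
    """Largest connected cluster left after deleting nodes, as a fraction of the original network."""
    alive = set(graph) - removed
    seen: set[int] = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for u in graph[v]:
                if u in alive and u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best / len(graph)


rng = random.Random(7)
network = preferential_attachment(1000, 2, rng)
k = len(network) // 5  # remove 20% of the nodes
random_loss = set(rng.sample(sorted(network), k))
hub_loss = set(sorted(network, key=lambda v: len(network[v]), reverse=True)[:k])
print("random failures:", largest_component_fraction(network, random_loss))
print("hub attack:     ", largest_component_fraction(network, hub_loss))
```

In networks built this way, random deletions usually leave a large connected core, whereas removing the same number of hubs fragments the network much more; this is one way to see why hubs matter.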
Is there any biology behind the network hypothesis?
Essential ORF deletions are only available as heterozygous diploids, while non-essential ORF deletions are available as haploids, homozygous diploids and heterozygous diploids.
Key Concept: Connectivity and essentiality are correlated.
Evolutionary Effects of Connectedness
-Connected genes are non-randomly distributed in the genome
-Connected genes are less likely to undergo duplication
-Connected genes are less likely to have close homologs
-Connected genes are less likely to have introns
Is Cancer a Robust Network?
-Environmentally robust
-It lives under a constant state of genomic stress
Summary
-Proteins are organized into functional units (machines)
-these machines do virtually all the work in the cell
-understanding the components of a machine is critical for functionally annotating the genome
-understanding the components of a machine is critical for determining how the machine is regulated
-mutations have large effects at this level
-Protein machines are organized into higher-order networks
-the network architecture has left its imprint on evolution
-the network is likely to be rewired under pathological conditions, especially in the case of cancer
-understanding the network is important for understanding the complex behavior of the system
Key Concept: High Throughput mapping of protein:protein interactions will provide important insights into human biology
Understanding the Network Requires a lot of Information
-Direction of Information
-Sign
-Magnitude
-Timing
Approaches for Mapping Protein:Protein Interactions
-Mapping by inference:
-if two proteins interact in one organism, then they are inferred to interact in other organisms
-this can be extended to domains/motifs as well
-if two proteins are coregulated on microarrays, they are likely to interact
-Direct mapping:
-in vitro binding experiments
-genetic screens/traps
-yeast 2-hybrid assay
-affinity co-purification
-IP:Western blot
-IP:mass spectrometry
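Mapping by inference from microarray co-regulation amounts to correlating expression profiles. The Python sketch below computes the Pearson correlation between every pair of genes and keeps pairs above a cutoff as candidate interactions. The gene names, expression values and the 0.9 cutoff are hypothetical, and co-regulation alone does not prove a physical interaction.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles: dict[str, list[float]], threshold: float = 0.9):
    """Gene pairs whose profiles correlate at or above threshold: candidates, not proof."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        r = pearson(p1, p2)
        if r >= threshold:
            pairs.append((g1, g2, round(r, 3)))
    return pairs


# Hypothetical expression values across five microarray conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 7.9, 10.1],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneD": [1.0, 1.0, 2.0, 1.0, 1.0],
}
print(coexpressed_pairs(profiles))
```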
Interactomics by Genetic Screens
Key Concept: Genetic Complementation allows the identification of direct (binary) interactions.
Uetz et al 2001
Interactomics by Genetic Screens
Key Concept: No matter how good something is…there are always problems.
Advantages of genetic complementation:
-can do genome-scale screening
-quick
-cheap
-adaptable
-works best when the screen is based on selection
Problems of genetic complementation:
-sensitive to dynamic range
-the protein interaction may be incompatible with the complementation scheme
-cannot perturb the system
-more false positives than true positives
Affinity Governs the formation of Protein Complexes
Affinity is determined by the shapes of the proteins and how well they fit together:
-hydrophobic interactions
-ionic interactions
-hydrogen bonding
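Affinity is usually expressed as the dissociation constant Kd: for A + B ⇌ AB, Kd is the concentration of free A at which free B and the AB complex are present at equal concentrations. In practice there is a mixture of free and complexed components, and this ratio is concentration dependent.
$$K_d = \frac{[A]\,[B]}{[AB]}$$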
Affinity Governs the formation of Protein Complexes
A weak interaction may only form if the concentration is high enough.
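The Kd definition also predicts how much complex forms at a given concentration. Solving Kd = [A][B]/[AB] together with mass balance ([A]total = [A] + [AB], [B]total = [B] + [AB]) gives a quadratic in [AB], which the Python sketch below solves. The concentrations and the 100 µM Kd used in the loop are illustrative assumptions.

```python
import math


def fraction_bound(total_a: float, total_b: float, kd: float) -> float:
    """Fraction of B in the AB complex at equilibrium for A + B <-> AB, Kd = [A][B]/[AB].

    All concentrations share the units of kd (e.g. micromolar). The mass-balance quadratic
    [AB]^2 - s*[AB] + At*Bt = 0, with s = At + Bt + Kd, is solved in the numerically
    stable form [AB] = 2*At*Bt / (s + sqrt(s^2 - 4*At*Bt)).
    """
    s = total_a + total_b + kd
    complex_ab = 2 * total_a * total_b / (s + math.sqrt(s * s - 4 * total_a * total_b))
    return complex_ab / total_b


# A weak binder (Kd = 100 uM): little complex at 1 uM of partner A, mostly complex at 1 mM.
for total_a in (1.0, 10.0, 100.0, 1000.0):
    print(f"[A]total = {total_a:7.1f} uM -> {fraction_bound(total_a, 1.0, 100.0):.1%} of B bound")
```

The same Kd gives very different occupancy at 1 µM and at 1 mM of partner, which is the sense in which a weak interaction only forms when the concentration is high enough.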
Interactomics by Co-purification
Key Concept: Interacting proteins will co-purify
TAP Tagging: Rigaut et al., Nat. Biotech. 1999.
Interactomics by Co-purification
Advantages of co-purification:
-proteins are isolated from their native source
-the system can be perturbed
Problems of co-purification:
-sensitive to dynamic range
-real interactions may be lost during purification
-it can be difficult to purify the target protein
-no “amplification”
-need a way to identify the co-purifying proteins
Key Concept: Cutting out steps is one of the hallmarks of high-throughput approaches. This increases the throughput and usually also increases the sensitivity.
Affinity Purification - coIP
[Figure: immunoprecipitation (IP) on antibody beads, followed by trypsin digestion directly from the beads]
Native Antibody is Resistant to Trypsin
[Chart: number of matched peptides from the antibody heavy chain and light chain]
Reduced/Denatured Antibody is Sensitive to Trypsin
[Chart: number of matched peptides from the antibody heavy chain and light chain for each additive]
Key Concept: Complex mixtures cannot be interpreted manually. The average protein generates ~50 proteolytic fragments, so you will have thousands upon thousands of peptides to interpret.
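Where the ~50 fragments come from: trypsin cuts after lysine (K) or arginine (R), which together make up roughly one residue in ten, so a protein of a few hundred residues yields dozens of peptides. This Python sketch performs an in silico tryptic digest using the common rule “cleave after K or R, but not before P”, with no missed cleavages; the example sequence is made up.

```python
import re

# Zero-width split point: preceded by K or R, and not followed by P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def trypsin_digest(sequence: str, min_length: int = 1) -> list[str]:
    """In silico tryptic digest with no missed cleavages; drops peptides shorter than min_length."""
    return [p for p in TRYPSIN_SITE.split(sequence) if len(p) >= min_length]


protein = "MAKPLESRGHIKDEFRPQWKVLSTR"  # hypothetical sequence
peptides = trypsin_digest(protein)
print(len(peptides), "peptides:", peptides)
```

Splitting on zero-width matches needs Python 3.7 or later. Real digests also produce missed cleavages, which multiply the number of candidate peptides further.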
Sources of “Non-Specific Binding”
-Not enough washing
-Biofluids have a high dynamic range, so you must wash away the super-abundant proteins to see the less concentrated ones
-Proteins that stick to the beads
-Proteins that stick to the antibodies on the beads
-Proteins that stick to the wall of the tube
-Proteins that stick to your complex of interest
-Proteins that are real binders but are biologically irrelevant
“Nonspecific Binding” is Reproducible
Are All Protein Complexes Biologically Relevant?
An interaction will be selected for if it is beneficial.
An interaction will be selected against if it is detrimental.
What happens if the interaction is neither beneficial nor detrimental?
What would be the cost of allowing only beneficial interactions?
[Figure: base-peak chromatogram of run 082405OFRep78 (relative abundance vs. time, 0 to 90 min); full MS scan at RT 36.02 min (m/z 400 to 1700); MS/MS spectrum at RT 36.03 min (m/z 245 to 2000)]
Protein ID by Mass Spectrometry
Multidimensional Chromatography (MudPIT)
[Figure: base-peak chromatograms (relative abundance vs. time, 0 to 90 min) for six LC-MS/MS runs]
Comparison of Three Analysis Techniques on Lysates
-10-100 proteins (6 hours)
-100-300 proteins (2 hours)
-1000-6000 proteins (10 hours)
Comparison of Three Analysis Techniques on IPs
-53 proteins (6 hours)
-76 proteins (2 hours)
-82 proteins (10 hours)
Protein ID by Mass Spectrometry
~10,000 MS/MS per hour
Key Concept: LC-MS/MS workflows cannot be interpreted manually
[Figure: acquired spectrum compared with the theoretical spectrum (y/b ions); the spectra are matched on the matched peaks (y/b ions)]
Compute a Correlation Score
$$\text{Score} = \sum_{i=0}^{n} I_i \, P_i$$
where $I_i$ is the intensity of peak $i$ in the acquired spectrum and $P_i$ is 1 if peak $i$ is a predicted y or b ion and 0 otherwise.
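The correlation score (a sum of spectrum intensities I_i, each multiplied by a 0/1 flag P_i saying whether the peak was predicted) can be sketched in Python. This version predicts singly charged b and y ion masses from monoisotopic residue masses, sets P_i to 1 when peak i lies within a mass tolerance of any predicted ion, and sums I_i × P_i. It ignores higher charge states, modifications and neutral losses; the tolerance and the test spectrum are assumptions, and production search engines refine this basic score.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1): N-terminal fragments."""
    ions, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly charged y1..y(n-1): C-terminal fragments, which keep the terminal water."""
    ions, total = [], WATER
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        ions.append(total + PROTON)
    return ions


def correlation_score(peptide: str, peaks: list[tuple[float, float]], tolerance: float = 0.5) -> float:
    """Score = sum_i I_i * P_i, with P_i = 1 if peak i matches a predicted b/y ion, else 0."""
    predicted = b_ions(peptide) + y_ions(peptide)
    score = 0.0
    for mz, intensity in peaks:
        p_i = 1 if any(abs(mz - ion) <= tolerance for ion in predicted) else 0
        score += intensity * p_i
    return score


# Hypothetical two-peak spectrum: one peak near b2 of SAMPLER, one unexplained peak.
spectrum = [(159.0764, 50.0), (10.0, 99.0)]
print(correlation_score("SAMPLER", spectrum, tolerance=0.02))
```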
The Truth about Spectral Matching
-Spectral matching produces an “answer” for every spectrum, even those that are artifacts.
-Experimental spectra always deviate from theoretical spectra.
-A high correlation score is not a guarantee that the match is correct.
-A peptide must be in the database in order to be found.
Peptide ID by Mass Spectrometry
Peptide IDs can be clustered into Protein IDs
Mapping Peptides to Proteins is NOT easy!
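One common way to turn peptide IDs into a protein list is parsimony: report the smallest set of proteins that explains every identified peptide. The greedy Python sketch below illustrates the idea; it is one of several approaches, and the peptide and protein names are hypothetical.

```python
def parsimonious_proteins(peptide_to_proteins: dict[str, set[str]]) -> list[str]:
    """Greedy minimal protein list that explains every identified peptide.

    Ties are broken alphabetically, which is arbitrary: proteins supported only by
    shared peptides remain genuinely ambiguous.
    """
    protein_to_peptides: dict[str, set[str]] = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    unexplained = set(peptide_to_proteins)
    chosen = []
    while unexplained and protein_to_peptides:
        # Pick the protein that explains the most still-unexplained peptides.
        best = max(sorted(protein_to_peptides),
                   key=lambda prot: len(protein_to_peptides[prot] & unexplained))
        gained = protein_to_peptides[best] & unexplained
        if not gained:  # leftover peptides map to no protein
            break
        chosen.append(best)
        unexplained -= gained
    return chosen


# Hypothetical mapping: pep2 and pep3 are shared between proteins.
evidence = {
    "pep1": {"P1"},
    "pep2": {"P1", "P2"},
    "pep3": {"P2", "P3"},
    "pep4": {"P3"},
}
print(parsimonious_proteins(evidence))
```

Here P2 is dropped because every peptide it could explain is already explained by P1 or P3, even though P2 may well be present in the sample.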
Single Proteins to Protein Lists
How do you know which matches to trust???
-An E-value is easily calculated using the ~11,000 “incorrect” peptides.
-The false discovery rate is easily calculated using a “decoy” database.
Key Concept: Statistics are required for the proper interpretation of MS/MS data.
Histogram of Correlation Scores
[Figure: histogram of the number of results vs. hyperscore; the bulk of the distribution is made up of “incorrect” IDs]
Highest scoring “match” is assumed to be “correct”
Estimating E-values
[Figure: histogram of the number of results vs. hyperscore, and log(number of results) vs. hyperscore with the significant scores marked; E-value = 10^-8.2]
Interpreting E-values
The E-value is the number of matches you “expect” to find at random, given the search parameters; equivalently, the chance of getting a match this good from this spectrum by random chance. Here that is a chance of 10^-8.2, or roughly 1 in 160,000,000.
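An E-value like this comes from extrapolating the tail of the score histogram. The Python sketch below assumes that the tail of the random (“incorrect”) score distribution is roughly exponential, i.e. a straight line on the log plot: it fits a line to log10 of the number of random scores at or above each value and reads off the expected count at the best score. The simulated scores and the choice to fit the upper half of the distribution are assumptions, not the exact procedure of any particular search engine.

```python
import bisect
import math
import random


def estimate_evalue(random_scores: list[float], best_score: float, tail_fraction: float = 0.5) -> float:
    """Expected number of random matches scoring >= best_score.

    Fits a straight line to log10(number of random scores >= s) over the upper tail of
    the distribution (needs at least two distinct tail scores) and extrapolates it.
    """
    scores = sorted(random_scores)
    n = len(scores)
    cutoff = scores[min(n - 1, int(n * (1 - tail_fraction)))]
    xs, ys = [], []
    for s in sorted(set(x for x in scores if x >= cutoff)):
        count = n - bisect.bisect_left(scores, s)  # number of scores >= s
        xs.append(s)
        ys.append(math.log10(count))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return 10 ** (intercept + slope * best_score)


# Hypothetical "incorrect" scores for one spectrum: exponentially distributed, mean 5.
rng = random.Random(1)
incorrect = [rng.expovariate(0.2) for _ in range(10_000)]
for best in (10.0, 40.0, 60.0):
    print(f"score {best:5.1f}: E-value ~ {estimate_evalue(incorrect, best):.3g}")
```

An E-value of 10^-8.2 corresponds to 1/10**8.2, about 1 in 1.6 × 10^8, which is where the “1 in 160,000,000” above comes from.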
[Figure: log(number of results) vs. hyperscore; E-value = 10^-8.2 compared with E-value = 10^-3.9]
Changing the search parameters will change the statistics: allowing many post-translational modifications and/or amino acid substitutions, using a very large database, or allowing large mass errors all enlarge the search space, so the same match typically earns a worse E-value.
A “decoy” or false database can be used to verify the statistical models being used.
Use a Decoy Database to Determine False Discovery Rate
A good decoy database:
-does not contain any “correct” hits
-is the same size as the query database
-has the same distribution of amino acids
-has the same size distribution of proteolytic fragments (peptides)
-can be reproduced by other labs
Reversed databases satisfy most of these constraints; for example, the “protein” RSAMPLER digested with trypsin gives:
-forward (RSAMPLER): SAMPLER
-reverse (RELPMASR): ELPMASR
-using the reversed sequences typically causes problems only with palindromic sequences, which thankfully are rare*.
*except in viruses!
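Building the reversed decoy and checking the RSAMPLER example takes only a few lines of Python. The sketch reverses each protein sequence, digests forward and reverse with the same trypsin rule, and confirms that the decoy peptide has the same residues (and therefore the same mass and length) as its forward counterpart. The REV_ prefix is a hypothetical naming choice.

```python
import re

TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")  # cleave after K/R, not before P


def tryptic_peptides(sequence: str) -> list[str]:
    return [p for p in TRYPSIN_SITE.split(sequence) if p]


def reversed_decoys(proteins: dict[str, str]) -> dict[str, str]:
    """Decoy database: every target sequence reversed, renamed with a REV_ prefix."""
    return {f"REV_{name}": sequence[::-1] for name, sequence in proteins.items()}


targets = {"toy": "RSAMPLER"}
decoys = reversed_decoys(targets)
forward = tryptic_peptides(targets["toy"])
reverse = tryptic_peptides(decoys["REV_toy"])
print(forward, reverse)
# Same residues in a different order, so the decoy peptide has the same mass and length.
print(sorted(forward[1]) == sorted(reverse[1]))
```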
By definition: Everything from the Reversed database is incorrect!
Use a Decoy Database to Determine Global False Discovery Rates
[Figure: number of spectra in each score bin (x axis from -3.9 to 7.3) for hits to the “Forward” and “Reverse” databases, with cumulative FDR cutoffs marked at < 0.5%, < 1%, < 1.5%, < 2.0%, < 2.3% and < 2.6%]
Calculating a “local” False Discovery Rate
[Figure: score axis (0, -1, -1.5, -3.0, -4.0) running from low confidence to high confidence]
Predicting phosphorylation sites
-incorrect: n = 51
-S118: n = 6
-S153: n = 54
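Global and local FDRs both come from counting decoy hits. The Python sketch below takes scores (higher is better) from searches against the forward and the reversed database and computes the global FDR above a threshold, the local FDR inside one score bin, and the lowest threshold that meets a target FDR. It assumes equal-size forward and reverse databases searched separately; the scores are hypothetical, and other conventions exist for concatenated target-decoy searches.

```python
def count_in(scores, low, high=float("inf")):
    """Number of scores with low <= score < high."""
    return sum(1 for s in scores if low <= s < high)


def global_fdr(target_scores, decoy_scores, threshold):
    """Decoy hits / target hits at or above a threshold (0.0 when there are no target hits)."""
    targets = count_in(target_scores, threshold)
    return count_in(decoy_scores, threshold) / targets if targets else 0.0


def local_fdr(target_scores, decoy_scores, low, high):
    """The same ratio, restricted to one score bin [low, high)."""
    targets = count_in(target_scores, low, high)
    return count_in(decoy_scores, low, high) / targets if targets else 0.0


def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest observed target score whose global FDR is at or below max_fdr (None if none)."""
    for threshold in sorted(set(target_scores)):
        if global_fdr(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical scores (higher = better) from forward and reversed database searches.
forward_hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
reverse_hits = [1, 2, 3]
print(global_fdr(forward_hits, reverse_hits, 2))
print(local_fdr(forward_hits, reverse_hits, 1, 3))
print(threshold_for_fdr(forward_hits, reverse_hits))
```

The global FDR describes everything accepted above a cutoff, while the local FDR describes how trustworthy hits are near one particular score, which is usually more relevant when deciding about a single borderline identification.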
Using FDR to choose the “best” Database
Amino Acid Substitutions can be modeled using Statistics
Allowing for Amino Acid Substitutions
117 vs 178 proteins
So now what do I do?
Interpreting Protein Lists
Comparison based on Gene Ontology (GO)
Control Experiment
Functional Clustering of Protein Lists
Domain Enrichment
Enrichment of Components from known Pathways
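Domain, GO-term or pathway enrichment is commonly scored with a one-sided hypergeometric test: how surprising is it to see this many annotated proteins in a list of this size? The Python sketch below computes that tail probability exactly with `math.comb`. All counts in the example are hypothetical, and when many terms are tested the p-values still need a multiple-testing correction.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, list_size: int, annotated_in_list: int) -> float:
    """One-sided hypergeometric P(X >= annotated_in_list).

    population: proteins that could have been detected (the background)
    annotated: background proteins carrying the term (GO term, domain, pathway)
    list_size: proteins in your list
    annotated_in_list: proteins in your list carrying the term
    """
    upper = min(annotated, list_size)
    hits = sum(comb(annotated, i) * comb(population - annotated, list_size - i)
               for i in range(annotated_in_list, upper + 1))
    return hits / comb(population, list_size)


# Hypothetical numbers: 6,000 background proteins, 120 annotated with a term,
# an 80-protein list of which 10 carry the term (about 1.6 would be expected by chance).
print(enrichment_p_value(6000, 120, 80, 10))
```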
Build Networks
http://string-db.org/
Other Resources
DAVID: http://david.abcc.ncifcrf.gov/
GO-based enrichment, clustering of redundant GO terms, mapping onto KEGG pathways, mapping onto disease pathways, etc.
BioGRID: thebiogrid.org
An online interaction repository with data compiled through comprehensive curation efforts.
MIPS: http://mips.helmholtz-muenchen.de/proj/ppi/
Manually curated protein-protein interaction database
Summary
Proteomics:
-large-scale analysis of proteins (really peptides)
-statistical analysis is required for interpretation
-can be used to address a wide range of biological problems
-best used to answer discrete questions
-things that cannot be answered by genomic techniques: protein complexes, protein modification, other post-translational events, changes in subcellular localization
-the question will help determine which hits are chosen for validation
Feel free to email me questions: [email protected]
Ubiquitin:
• Short protein that is covalently attached to other proteins
• 7 lysine residues
• all can form poly-Ub chains
• K48 chains: involved in proteasomal degradation
• K63 chains: involved in signaling
• K11 chains: ????
• K6 chains: DNA damage
• K29 chains: ????
K6-Ub Pulldown
[Figure: HA-tagged K6-only ubiquitin (K6-Ub) chains with unknown associated proteins (?)]
• Tagged ubiquitin with only one available lysine.
• Pull down K6-linked poly-Ub chain.
• Identify proteins.
K6 chains are assembled by BRCA1
K6-Ubiquitin Pulldown
[Figure: workflow from K6 Ub-IP to on-bead digest, LC-MS/MS and data analysis; top candidate: WHIP1, with domain labels ZF / Rad18 and AAA+ ATPase / RFC]
Hit Criteria
• Not in the control sample
• Not a commonly known contaminant
• Good score (more than one peptide), expressed as a false discovery rate
• Seen in repeat experiments
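The hit criteria can be written as a simple filter. In this Python sketch, “more than one peptide” follows the slide, while the 1% FDR cutoff, the two-replicate requirement, and all protein names and numbers are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    name: str
    peptides: int  # distinct peptides matched to this protein
    fdr: float     # protein-level false discovery rate from the decoy search


def select_hits(replicates, control, contaminants, max_fdr=0.01, min_peptides=2, min_replicates=2):
    """Apply the hit criteria; replicates is a list of {name: ProteinHit} from independent IPs."""
    seen_in = {}
    for run in replicates:
        for name, hit in run.items():
            if name in control or name in contaminants:
                continue  # not in the control sample, not a known contaminant
            if hit.peptides < min_peptides or hit.fdr > max_fdr:
                continue  # good score: more than one peptide and a low FDR
            seen_in[name] = seen_in.get(name, 0) + 1
    # seen in repeat experiments
    return sorted(name for name, count in seen_in.items() if count >= min_replicates)


# Hypothetical IP results; names and numbers are illustrative only.
ip1 = {"PROT_A": ProteinHit("PROT_A", 6, 0.001), "PROT_B": ProteinHit("PROT_B", 1, 0.001),
       "KRT_X": ProteinHit("KRT_X", 9, 0.0), "BEAD_Y": ProteinHit("BEAD_Y", 4, 0.002)}
ip2 = {"PROT_A": ProteinHit("PROT_A", 4, 0.004), "PROT_B": ProteinHit("PROT_B", 3, 0.002),
       "PROT_C": ProteinHit("PROT_C", 5, 0.003)}
print(select_hits([ip1, ip2], control={"BEAD_Y"}, contaminants={"KRT_X"}))
```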
Results from a representative non-denaturing K6-ubiquitin IP
[Figure: identified proteins split into potential hits (good scores, high confidence; poor scores, low confidence), ubiquitin and ubiquitin-binding proteins, proteins also found in the control IP, and excluded proteins (heat shock, hnRNP, ribosomal, keratin, histones)]
Overlap Between K6 and K63 Pulldowns
[Venn diagram: UB-K6 vs. UB-K63]
Werner’s helicase interacting protein 1 (WHIP1)
WHIP domain architecture: Rad18-like zinc finger; AAA+ ATPase
Does not contain any recognizable ubiquitin-binding domain
Why does WHIP co-IP with ubiquitin?
• WHIP is ubiquitinated
• WHIP is a ubiquitin-binding protein
[Figure: either ubiquitin chains are attached to WHIP by a covalent bond, or WHIP binds ubiquitin chains through a non-covalent interaction]
[Western blot: ubiquitin species Ub1 to Ub6; lanes: mono-Ub, K48 chains and K63 chains for input, FLAG-BAP and FLAG-WHIP]
IP from bacterial lysate with anti-FLAG beads (Sigma); W.B. anti-ubiquitin (6C1) 1:1000, secondary = anti-mouse TrueBlot, exposure = 10 sec.
Co-IP of WHIP with various poly-Ub chains in vivo
IP = α-HA (ubiquitin), WB = α-FLAG (WHIP); IPs from doubly transfected 293 cells
[Western blot, panel label α-FLAG IP; lanes: FLAG-WHIP + - + + + + + +; HA-Ub - + K6 K11 K29 K48 K63 -; markers 250, 150, 100, 75, 50 and 37 kD]
WHIP is Ubiquitinated
[Western blot; lanes: WHIP-FLAG-MAT - / +; markers 250, 150, 100 and 75 kD]
Ni-NTA pulldown in 8 M urea from 293T cells; W.B. = anti-FLAG (M2) 1:5000
WHIP Ubiquitinylation: mass spectrometry
Peptide (aa) | Sequence | Modification | E-value
254-274 | SLLETNEIPSLILWGPPGCGK | 274K(114.1) | 6.8e-007
292-310 | FVTLSATNAKTNDVRDVIK | 301K(114.1) | 3.7e-004
292-306 | FVTLSATNAKTNDVR | 301K(114.1) | 1.4e-004
302-316 | TNDVRDVIKQAQNEK | 310K(114.1) | 3.7e-005
311-321 | QAQNEKSFFKR | 316K(114.1) | 2.6e-003
322-332 | KTILFIDEIHR | 322K(114.1) | 7.0e-007
333-346 | FNKSQQVNAALLSR | 335K(114.1) | 5.5e-012
449-462 | VLITENDVKEGLQR | 457K(114.1) | 2.7e-010
[Legend: ubiquitinylated residue; SUMOylated residue]
WHIP’s Zinc finger domain is necessary for ubiquitin binding
[In vivo (mono-Ub): lanes -, WT, D37A, T294A; IP = anti-FLAG, blot = anti-Ub; lysate blots = anti-FLAG and anti-actin; markers 200 to 15 kD]
[In vitro: binding of Ub2 to Ub7 chains to the WHIP UBZ and the RAD18 UBA; lanes: input, beads; markers 10 to 50 kD]
Rad18_ZF = UBZ ubiquitin-binding domain
UBZ Domain-Containing Proteins
Summary of UBZ Domain Binding
Domain | mono-Ub | Ub K48 | Ub K63 | SUMO
WHIP | - | + | + | -
Rad18 | - | + | + | -
PolK | - | + | + | -
Pol H | ? | ? | ? | ?
UBZ1 | - | + | + | -
MTMR15 | - | - | - | -
Overlap Between Pulldowns using Different UBZ Domains
[Venn diagram: UBZ-WHIP vs. UBZ-UBZ1]
UBZ Domain Regulates Subcellular Localization
[Images: EGFP, WHIP-EGFP, WHIP D37A-EGFP, UBZ1-EGFP, UBZ1 D473A-EGFP]
UBZ Domain Regulates Coupled Ubiquitination
[Blot lanes: HA-UBZ1 -, WT, D473A]
UBZ-1 and WHIP are Differentially Regulated by UV Damage
[Blot: WHIP; lanes -, 1, 12]
UBZ-1 but not WHIP Interacts with PCNA
Interaction between UBZ1 and PCNA is increased following UV treatment
Overlap Between UBZ1 and WHIP
[Venn diagram: UBZ1 vs. WHIP interactors]
[Network diagram with nodes: WHIP1, UBZ1, Ub, ERCC1, ERCC4, DDB1, DDB2, Large T, dnaJ, RuvBL1, RuvBL2, BLAP75, WRN, BLM, Ku70, Ku80, DNA-PK, PCNA, Topo3A, RPA1]
Summary
-Proteins are part of a highly integrated network
-The UBZ domain mediates ubiquitin binding, coupled ubiquitination, subcellular localization and protein:protein interactions
-Functional UBZ domains are found only in proteins involved in DNA replication and/or repair
-UBZ domains are frequently found in concert with PIP boxes
-The UBZ domain acts in concert with other domains to regulate the formation of ubiquitin-dependent complexes
-New “UBZ domains” are being found every day
Protein Networks
Michael P Myers
Acknowledgements: Fabio Rossi, Martina Colombin, Rebecca Bish, Antonella Piccini, Giulia DeSabbata
Sandor Pongor
Providing Reagents: Bruce Stillman, Masashi Narita/Scott Lowe, Tomohiko Ohta, Toshiki Tsurimoto
Major Types of Proteomics
Survey Proteomics:
Qualitative or quantitative analysis of the protein component
-whole organism, tissue, cell type, or subcellular compartment
-2D gel electrophoresis -> MS: typically a few 100 proteins
-multidimensional LC -> MS/MS: typically 1000-5000 proteins
Identification of Biomarkers
Interactomics: Mapping Protein:Protein Interactions
-yeast 2-hybrid techniques
-high-throughput protein identification by mass spectrometry
Mapping Post-Translational Modifications
-high-content mass spectrometry
Key Concept: Proteomics is the large-scale identification of proteins or peptides