RESEARCH ARTICLE
The subcellular proteome of undifferentiated human
embryonic stem cells
Prasenjit Sarkar1, Timothy S. Collier2, Shan M. Randall2, David C. Muddiman2
and Balaji M. Rao1
1 Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
2 W.M. Keck FT-ICR Mass Spectrometry Laboratory, Department of Chemistry, North Carolina State University, Raleigh, NC, USA
Received: September 23, 2011
Revised: October 31, 2011
Accepted: November 14, 2011
We have characterized the subcellular proteome of human embryonic stem cells (hESCs)
through MS analysis of the membrane, cytosolic, and nuclear fractions, isolated from the
same sample of undifferentiated hESCs. Strikingly, 74% of all proteins identified were
detected in a single subcellular fraction; we also carried out immunofluorescence studies to
validate the subcellular localization suggested by proteomic analysis, for a subset of proteins.
Our approach resulted in deeper proteome coverage – peptides mapping to 893, 2475, and
1185 proteins were identified in the nuclear, cytosolic, and membrane fractions, respectively.
Additionally, we used spectral counting to estimate the relative abundance of all cytosolic
proteins. A large number of proteins relevant to hESC biology, including growth factor
receptors, cell junction proteins, transcription factors, chromatin remodeling proteins, and
histone modifying enzymes were identified. Our analysis shows that components of a large
number of interacting signaling pathways are expressed in hESCs. Finally, we show that
proteomic analysis of the endoplasmic reticulum (ER) and Golgi compartments is a powerful
alternative approach to identify secreted proteins since these are synthesized in the ER and
transit through the Golgi. Taken together, our results show that systematic subcellular
proteomic analysis is a valuable tool for studying hESC biology.
Keywords:
Cell biology / Extracellular proteins / Human embryonic stem cells / Subcellular
fractionation / Subcellular proteomics
1 Introduction
Human embryonic stem cells (hESCs) are pluripotent cells
originally derived from the inner cell mass of the preim-
plantation blastocyst stage embryo [1]. hESCs can be
propagated indefinitely in cell culture and can also differ-
entiate into all somatic cell types. Therefore, hESCs have
great potential to revolutionize regenerative medicine,
provide a renewable source for the generation of functional
cells for drug evaluation, and serve as model systems
for studying human embryogenesis. While the precise
molecular mechanisms controlling pluripotency of
hESCs remain unclear, it is increasingly becoming
evident that hESCs are maintained in the undifferentiated
state by a large network of interacting pathways. The
contribution of a specific pathway toward the maintenance
of hESC pluripotency depends on several factors
such as the subset of component proteins in that pathway
that are actually expressed in hESCs, their subcellular
localization, relative levels of protein expression, and
the stoichiometric ratios of various interacting proteins.
Colour Online: See the article online to view Figs. 1–4 and Table 1 in
colour.
Abbreviations: CM, conditioned medium; EC, embryonal carcinoma;
EP300, Histone acetyltransferase p300; GO, Gene Ontology;
hESC, human embryonic stem cell; MO4L1, Mortality factor
4-like protein 1; NSAF, Normalized Spectral Abundance Factor;
SILAC-CM, conditioned medium containing stable isotopes 13C6,15N2 L-lysine and 13C6 L-arginine
Correspondence: Professor Balaji M. Rao, Department of
Chemical and Biomolecular Engineering, North Carolina State
University, Campus Box 7905, EB1, Raleigh 27695, NC, USA
E-mail: [email protected]
Fax: +1-919-513-3465
© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com
Proteomics 2012, 12, 421–430 DOI 10.1002/pmic.201100507
Large-scale analysis of protein expression using MS can
provide a systems-level perspective on pathways that are
potentially active in hESCs. Indeed, over the last few years,
several proteomic studies on hESCs have been reported
(reviewed in [2, 3]).
Subcellular fractionation prior to proteomic analysis is a
powerful approach to reduce the overall sample complexity
and obtain deeper sequence coverage, as well as to study
protein localization [4]. In the context of hESCs, subcellular
fractionation has been largely used as a tool for the reduc-
tion of sample complexity. In particular, several studies have
focused on comparative analysis of membrane proteins in
hESCs and differentiated cells, or multiple different hESC
or embryonal carcinoma (EC) lines. For instance, Harkness
et al. identified membrane proteins that are expressed in
hESCs under two different culture conditions [5]. Dormeyer
et al. studied the differences in the membrane proteomes of
hESCs and human EC cells [6]. More recently, Gerwe et al.
characterized the differences in membrane protein expres-
sion between different hESC lines, including an hESC line
with karyotypic abnormalities, and their differentiated
derivatives [7]. Prokhorova et al. used a SILAC-based
approach to assess quantitative differences in
membrane protein expression between hESCs and cells
undergoing differentiation [8]. Van Hoof et al. also used
SILAC to identify cell-surface proteins that were
differentially expressed in cardiomyocytes derived from
hESCs [9]. In comparison, relatively fewer studies have
focused on the analysis of the nuclear hESC proteome.
Barthelery et al. evaluated a procedure to isolate
nuclear proteins and subsequently deplete the sample of
histones to improve nuclear proteome coverage, in the
context of hESCs [10]. Pewsey et al. studied the dynamic
changes in the nuclear proteome of a human EC cell line
(NTERA-2) upon differentiation induced by treatment with
retinoic acid [11].
In this study, we present the proteomic analysis of
membrane, cytosolic, and nuclear fractions, obtained from a
single sample of undifferentiated hESCs; we also used the
method of spectral counting to quantify the relative abun-
dances of all proteins identified in the cytosolic fraction. To
the best of our knowledge, our study represents
the first reported instance of a comprehensive character-
ization of the hESC proteome at subcellular resolution,
coupled with quantitative estimates of relative protein
expression. The simultaneous isolation of multiple
compartments allows us to assess the effectiveness of
subcellular fractionation, and consequently interpret
proteomic data to gain insight into protein localization.
Further, we also validated the localization of a subset of
proteins identified through MS, using fluorescence
microscopy. Interestingly, we identified several secreted
proteins in subcellular fractions containing endoplasmic
reticulum (ER) and Golgi; secreted proteins are synthesized
in the ER and transit through the Golgi. Our results suggest
that proteomic analysis of the ER and Golgi compartments
is a powerful alternate approach to interrogate the secretome
of hESCs.
2 Materials and methods
2.1 Cell culture
Undifferentiated H9 hESCs were cultured on tissue culture
plates coated with Matrigel™ (BD Biosciences, Bedford,
MA, USA), in mouse embryonic fibroblast (MEF) condi-
tioned medium (CM), or in CM without L-lysine and
L-arginine, but containing the stable isotopes 13C6, 15N2
L-lysine, and 13C6 L-arginine (Pierce, Rockford, IL, USA)
(SILAC-CM), as described previously [12]. Stable isotope-
labeled arginine and lysine incorporation of 98.5 and 98.0%,
respectively, was achieved. Arginine-to-proline conversion
was determined to be approximately 5% in our system [13].
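The heavy labels named above shift the mass of every lysine- or arginine-containing peptide by a fixed amount, which is what lets peptides from SILAC-CM cells be told apart from unlabeled ones. The following is a minimal sketch of that arithmetic, assuming complete labeling and ignoring arginine-to-proline conversion; the function names and the example peptide are illustrative and are not taken from the study.

```python
# Mass differences between heavy and light isotopes in Da (standard values).
DELTA_13C = 1.0033548  # 13C minus 12C
DELTA_15N = 0.9970349  # 15N minus 14N

# Labels described in the text: 13C6,15N2 L-lysine and 13C6 L-arginine.
HEAVY_SHIFT = {
    "K": 6 * DELTA_13C + 2 * DELTA_15N,  # about +8.0142 Da per lysine
    "R": 6 * DELTA_13C,                  # about +6.0201 Da per arginine
}


def silac_mass_shift(peptide: str) -> float:
    """Mass shift (Da) of a fully labeled peptide relative to its light form."""
    return sum(HEAVY_SHIFT.get(residue, 0.0) for residue in peptide)


def silac_mz_shift(peptide: str, charge: int) -> float:
    """Shift in m/z for a given charge state."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return silac_mass_shift(peptide) / charge


# Hypothetical tryptic peptide ending in lysine, used only for illustration.
example = "LVNELTEFAK"
shift_da = silac_mass_shift(example)
shift_mz_2plus = silac_mz_shift(example, 2)
```

For a doubly charged peptide with a single lysine, the heavy and light forms are therefore separated by roughly 4 m/z units.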
2.2 Immunofluorescence
Cells were passaged onto glass-bottom culture dishes
(Greiner Bio-one, Monroe, NC, USA) coated with Matrigel™
(BD Biosciences) and grown in CM. Cells were fixed
with 4% paraformaldehyde (Fisher Scientific, Houston, TX,
USA) and permeabilized with 0.5% Triton X-100 (Acros
Organics, Geel, Belgium). Subsequently, cells were blocked
in 1× PBS with 5% BSA and 0.3% Triton X-100 and stained
overnight with the primary antibody in the same buffer.
Rabbit-anti-human antibodies for β-CATENIN (Cell Signaling,
Danvers, MA, USA), CALPAIN1 (CALPAIN μ-type)
(Cell Signaling), E-CADHERIN (Cell Signaling), P300
(Thermo Scientific, Rockford, IL, USA), Mortality factor
4-like protein 1 (MO4L1) (Sigma-Aldrich, St. Louis, MO,
USA), ENY2 (Cell Signaling), and THYMOSIN β4 (Millipore,
Billerica, MA, USA) were used. Isotype rabbit IgG was
purchased from Cell Signaling. Cells were then stained with
Alexa 633-conjugated goat-anti-rabbit IgG (Invitrogen,
Carlsbad, CA, USA) and DAPI (Invitrogen) and imaged
using a Zeiss LSM 710 confocal microscope.
2.3 Subcellular fractionation
Membrane, cytoplasmic, and nuclear fractions were isolated
from hESCs as described previously [12]. Detailed protocols
for subcellular fractionation are provided in Supporting
Information File 1. The ER-enriched fraction was isolated as
follows. Cells were scraped and homogenized in Sucrose
Buffer 1 (SB1; 250 mM sucrose, 25 mM potassium chloride,
5 mM magnesium chloride, 10 mM triethanolamine, 10 mM
acetic acid, 2.5 mM sodium pyrophosphate, 1 mM β-glycerophosphate
disodium salt, 1 mM sodium orthovanadate,
cOmplete Mini Protease Inhibitor cocktail tablets (Roche,
Indianapolis, IN, USA), phosphatase inhibitor cocktails I
and II (Sigma-Aldrich), with pH adjusted to 7.6 using
triethanolamine and/or acetic acid), and centrifuged at
1000 × g for 10 min. The supernatant was collected and
centrifuged at 3000 × g for 10 min. Again, the supernatant
was collected and centrifuged at 15 000 × g for 30 min and
this pellet was retrieved as the ER–Golgi-enriched fraction.
The pellet was homogenized in 8 M urea and 50 mM
ammonium bicarbonate, in the presence of protease and
phosphatase inhibitors and used for MS analysis.
2.4 MS and data analysis
This study was conducted in parallel with a technical
analysis comparing relative quantification of protein
expression using SILAC and spectral counting [12]. The
overall experimental design is shown in Supporting Information
Fig. 1. Cells grown in CM were differentiated by
adding 25 μM SB431542 (Sigma-Aldrich); the medium was
replaced every day with fresh CM containing 25 μM
SB431542. Subsequently, protein samples obtained through
subcellular fractionation from undifferentiated cells grown
in SILAC-CM were combined with those from differentiat-
ing cells at different time points, and analyzed using MS.
Data from heavy peptide identifications were used for the
analysis of undifferentiated hESCs, as presented in this
article. Simultaneously, hESCs cultured in SILAC-CM were
also used for the analysis of the cytoplasmic fraction by
spectral counting. Cells cultured in CM were used for the
analysis of subcellular fractions enriched in ER and Golgi.
A total of 25 μg of protein from unlabeled cells differentiated
in CM was mixed with 25 μg of protein from
undifferentiated hESCs cultured in SILAC-CM for the
analysis of SILAC samples. In all, 50 μg of protein sample
each was used for the quantification of cytoplasmic proteins
by spectral counting and analysis of the ER-enriched fraction.
MS and subsequent data analysis were carried out as
described previously [12]. Briefly, samples were run on a
10–20% Tris-HCl Gel (Bio-Rad, Hercules, CA, USA),
reduced with DTT, alkylated with iodoacetamide, and
digested with proteomic-grade trypsin. In-gel-digested
peptide samples were separated in an Eksigent 1-D+ nano-LC
system (Eksigent, Dublin, CA, USA) with a vented
column configuration and detected using an LTQ-Orbitrap
XL. Magic C18AQ (particle size, 5 μm; pore size, 200 Å;
Microm BioResources, Auburn, CA, USA) was used as
packing material for both the trapping and the analytical
columns. An IntegraFrit capillary (New Objective, Woburn,
MA, USA) measuring 5 cm in length and having a 75 μm id
was employed as a trap for desalting peptides. A PicoFrit
capillary (New Objective) with a 75 μm id and measuring
15 cm in length was utilized as the analytical column.
Burdick and Jackson (Muskegon, MI, USA) supplied all LC
solvents. The composition of mobile phase A was 98%
water, 2% acetonitrile, and 0.2% formic acid. The composition
of mobile phase B was 98% acetonitrile, 2% water,
and 0.2% formic acid. In total, 8 μL of sample was injected at
a flow rate of 1.5 μL/min and switched to 350 nL/min via a
10-port valve before eluting peptides onto the analytical
column. The gradient increased from 2% B to 50% B over
127 min before ramping up to 95% B. After holding for
5 min at 95% B, re-equilibration was established by flowing
at 2% B for 10 min.
Precursor scans in the Orbitrap analyzer were acquired
with 60 000 resolving power at m/z 400, and these broad-
band scans were followed by up to eight data-dependent
MS/MS scan events in the ion trap. The minimum MS
signal threshold for MS/MS activation was set to 2500.
Collision-induced dissociation was employed for fragmen-
tation with an isolation width of m/z 2 and normalized
collision energy of 35%. Unassigned and +1 charge states
were rejected for MS/MS, and dynamic exclusion was set to
180 s with one repeat count and a repeat duration of 0 s.
Automatic gain control settings were 8 × 10^3 ions for the ion
trap and 1 × 10^6 ions for the Orbitrap. Ionization times were
restricted to 80 and 500 ms for the ion trap and Orbitrap,
respectively.
RAW LC-MS/MS files were processed using MASCOT
Distiller (Matrix Science, Boston, MA, USA) to generate
peak lists in .mgf format for database searching using the
MASCOT server; all searches were performed against the
UniProt human database containing both target and reverse
protein sequences (last modified: 10/2010). Search tolerances
were set to ±5 ppm for the precursor ion and to
±0.6 Da for the fragment ions. Cysteine carbamidomethylation
was set as a fixed modification, and variable modifications
included oxidation of methionine as well as
deamidation of glutamine and asparagine. ProteoIQ
(BioInquire, Athens, GA, USA) was used to create protein
lists at 1% false discovery rate (FDR). Normalized Spectral
Abundance Factor (NSAF) values were manually calculated
using unnormalized spectral count values obtained from
ProteoIQ as follows:
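$$(\mathrm{NSAF})_x = \frac{(\mathrm{SpC}/L)_x}{\sum_{i=1}^{N} (\mathrm{SpC}/L)_i}$$
where L is the number of amino acids in protein x, SpC is the total number of MS/MS spectra that identify
protein x, and N is the number of proteins identified in the sample.
Subsequent Gene Ontology (GO) annotation analysis was
carried out using Blast2Go [14] and DAVID [15].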
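The NSAF calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' spreadsheet or the ProteoIQ implementation: the function name `nsaf` and the three toy proteins with their spectral counts and lengths are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized Spectral Abundance Factor for each protein.

    spectral_counts: dict mapping protein -> unnormalized spectral count (SpC)
    lengths:         dict mapping protein -> number of amino acids (L)
    """
    # Spectral abundance factor: spectral count divided by protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("at least one protein must have a nonzero spectral count")
    # Normalize so that the values sum to 1 across the sample.
    return {p: value / total for p, value in saf.items()}


# Hypothetical toy data (not from the study).
toy_counts = {"protA": 120, "protB": 30, "protC": 10}
toy_lengths = {"protA": 400, "protB": 300, "protC": 1000}
toy_nsaf = nsaf(toy_counts, toy_lengths)
```

Because the values are normalized within one sample, they sum to 1 and are meaningful only for comparing proteins measured in that same sample.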
3 Results and discussion
3.1 Elucidation of the subcellular proteome of
undifferentiated hESCs
We carried out proteomic analysis of the nuclear, cytosolic,
and membrane components derived from subcellular frac-
tionation of a single sample of undifferentiated hESCs. Flow
cytometric analysis of the expression of the pluripotency
markers OCT4 and SSEA4 in the cells used for proteomic
analysis is shown in Supporting Information Fig. 2. Our
SILAC analysis identified peptides corresponding to 893
proteins that map to 713 protein groups in the nuclear
sample, 1397 proteins mapping to 1155 protein groups in
the cytosolic fraction and 1185 proteins that map to 929
protein groups in the membrane sample. Performing
spectral counting analysis on the undifferentiated cyto-
plasmic sample, we obtained 2439 protein identifications
from 2070 protein groups. Taking the SILAC and spectral
counting analysis together, we identified 2475 proteins in
the cytosolic fraction. Detailed lists of proteins identified in
the three subcellular fractions are provided in Supporting
Information Tables 1–9. The spectral count analysis on the
cytosolic fraction also gave us the relative abundance values
of cytosolic proteins. In this method, the total number of
peptide identifications in the data that map to each protein
is normalized by the protein length to give the relative
protein abundance in the sample; this metric is called the
NSAF. NSAF values can be compared across different
proteins in the same biological sample, giving the relative
abundances of these proteins in the same sample [16–18]. A
comprehensive list of the NSAF values for all cytosolic
proteins is provided in Supporting Information Table 9.
An overview of proteins identified in our proteomic
analysis is shown in Fig. 1. Of the 1185 proteins identified
in the plasma membrane fraction, 540 were found in this
fraction only, i.e. they were not identified in the nuclear or
cytosolic fractions. Similarly, out of the 2475 proteins
identified in the cytosolic fraction, 1675 were found only in
the cytosolic fraction and out of the 893 proteins identified
in the nuclear fraction, 376 were unique to this fraction.
Thus, out of a total of 3359 proteins identified, 2491
(74%) were unique to a single subcellular fraction and only
26% were shared between fractions, indicating that the
fractionation procedure had merit; only 9.7% of the proteins
were identified in all three fractions. Nevertheless, the
fractionation was not perfect; we identified proteins from
intracellular organelles such as mitochondria, ER, and Golgi
in all three fractions. Also, the list of proteins found in all
three fractions includes highly abundant proteins such as
actins and histones that can contaminate other fractions.
However, despite these limitations, the fact that 74% of the
identified proteins were unique to one of the three subcel-
lular fractions demonstrates distinct subcellular localization
of proteins in hESCs and underlines the importance of
subcellular fractionation procedures while studying the
hESC proteome.
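The overlap analysis in this section amounts to counting, for each protein, the number of fractions in which it was identified. The sketch below shows one way to do this with set operations; the fraction contents are hypothetical single-letter identifiers, and only the totals passed to `percent` (2491 of 3359) come from the text.

```python
from collections import Counter


def overlap_summary(fractions):
    """Count proteins by the number of fractions in which they were identified.

    fractions: dict mapping fraction name -> set of protein identifiers
    Returns a Counter mapping number_of_fractions -> number_of_proteins.
    """
    all_proteins = set().union(*fractions.values())
    return Counter(
        sum(protein in members for members in fractions.values())
        for protein in all_proteins
    )


def percent(part, total):
    """Express part as a percentage of total."""
    return 100.0 * part / total


# Hypothetical toy data (not from the study).
toy_fractions = {
    "membrane": {"A", "B", "C", "D"},
    "cytosol": {"C", "D", "E", "F", "G"},
    "nucleus": {"D", "G", "H"},
}
summary = overlap_summary(toy_fractions)

# Totals reported in the text: 2491 of 3359 proteins in a single fraction.
single_fraction_percent = percent(2491, 3359)  # roughly 74%
```

In the toy data, five of eight proteins appear in one fraction, two in two fractions, and one in all three.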
3.2 Identification of proteins relevant to hESC
biology
An important advantage of subcellular fractionation in
proteomic analysis is the significantly deeper coverage
obtained due to the reduction in sample complexity. Several
growth factor receptors, G-proteins, integrins, as well as
cell–cell junction proteins such as those of the
adherens junctions, tight junctions, gap junctions, and
desmosomes were identified from the membrane fraction
(Supporting Information Table 10). Similarly, several chro-
matin-remodeling enzymes, histone acetyltransferases,
histone deacetylases, histone methyltransferases, DNA
methyltransferases as well as transcription factors were
identified in the nuclear fraction, despite their presence in
lower abundances than structural proteins or histones
(Supporting Information Table 11). To our knowledge, this
is the first attempt at a comprehensive characterization of
the epigenetic factors present in the nucleus of undiffer-
entiated hESCs. A few epigenetic factors and transcription
factors were identified in the cytoplasmic fraction; these are
summarized in Supporting Information Table 12. We also
identified several serine/threonine/tyrosine kinases and
phosphatases, as well as cell-cycle regulators in all three
subcellular fractions. A list of these proteins along with their
experimentally observed subcellular localization is provided
in Supporting Information Table 13. Taken together, our
Figure 1. Overview of protein identifications obtained in membrane, cytosolic, and nuclear fractions of undifferentiated hESCs.
(A) Overview of protein identifications obtained across all three fractions combined. The number of proteins identified in a single
subcellular fraction (membrane, cytosol, or nucleus only), two subcellular fractions or all three fractions is indicated. (B) Proteins iden-
tified and (C) annotated in a single subcellular fraction versus those identified/annotated in multiple subcellular fractions are indicated.
results highlight the effectiveness of subcellular proteomics
in elucidating the protein expression profile of hESCs.
3.3 Confirmation of localization using fluorescence
microscopy
As discussed earlier, 74% of proteins were identified in a
single subcellular fraction; this underlines the effectiveness
of our subcellular fractionation protocol. We further vali-
dated subcellular localization suggested by proteomic
analysis for seven target proteins using fluorescence
microscopy: Enhancer of yellow 2 transcription factor
homolog (ENY2), Calpain-1 catalytic subunit (CAN1),
Histone acetyltransferase p300 (EP300), β-catenin (CTNB1),
Thymosin β-4 (TBY4), MO4L1, and Cadherin-1 (E-cadherin
or CADH1). Table 1 summarizes the comparison between
immunofluorescence data and results obtained using
subcellular proteomic analysis and the corresponding
microscopy images are shown in Fig. 2.
Our results show that five out of seven proteins identified
in the cytoplasm through subcellular proteomic analysis
were also detected in the cytoplasm using microscopy. Note
that despite being annotated as nuclear-only, ENY2 and
MO4L1, components of the SAGA and NuA4 histone acetyltransferase
(HAT) complexes, respectively, are present in
the cytoplasm of undifferentiated hESCs as suggested by
proteomic analysis. Interestingly, cytoplasmic localization of
ENY2 has been reported in Drosophila S2 cells [19]. EP300
and β-catenin were identified in the cytoplasm using spectral
counting but not detected using immunofluorescence.
Pertinently, cytoplasmic localization of EP300 has been
reported previously [20]. Also, cytoplasmic β-catenin is a key
component of the Wnt signaling pathway; activation of Wnt
signaling has been reported to maintain hESC pluripotency
[21]. EP300, β-catenin, MO4L1, and E-cadherin were not
detected in the analysis of the cytoplasmic fraction from the
SILAC-CM sample but were identified using spectral
counting. This is not surprising: spectral counting
consistently identifies more proteins because it analyzes
replicate injections of a relatively less complex sample [12].
Also, five out of seven proteins were identified in the
nucleus using immunofluorescence but were not detected
in the proteomic analysis of the nuclear fraction. This can be
attributed to the fact that MS approaches utilized in this
study were global platforms optimized to identify and
quantify as much of the proteome as possible. A large
number of proteins may go undetected using global
Table 1. Comparison between protein identifications in hESCs cultured in SILAC-CM (ID), spectral counting (SpC), and immunofluorescence (IF)
GO annotations (GO) of proteins are also listed for reference. ND, not determined (measurement was not conducted); 1, protein detected; 0, protein not detected.
approaches due to the limitations of the instrument related
to duty cycle as well as detection limits. This is primarily due
to the complexity of the sample that contains a large number
of proteins with a wide range of expression levels. On the
other hand, the immunofluorescence assay is a targeted
approach which seeks to detect a specific protein of interest.
Further, the use of a secondary antibody for the detection
allows for a potentially large increase in signal intensity;
multiple fluorescent labels can bind to a single primary
antibody. Indeed, this is a major advantage of using
immunofluorescent detection schemes for low-abundance
proteins. Thus for these proteins, immunofluorescent
detection is more sensitive than MS analysis.
The proteins selected for immunofluorescence analysis
have a range of cytoplasmic NSAF values (8.67 × 10^−6 to
7.53 × 10^−3), suggesting a range of abundances in the cytoplasm.
Strictly speaking, fluorescence intensities between
different images cannot be compared due to the use of
different antibodies and unknown binding affinities of
antibody–target interaction. Nevertheless, assuming antibody
excess during immunofluorescent staining leading to
complete saturation of target sites, there is broad qualitative
agreement between the NSAF values and the corresponding
cytoplasmic fluorescent intensity; proteins with higher
abundance, as suggested by their NSAF value, show greater
fluorescent labeling in these cases.
3.4 Identification of key signaling pathways in
hESCs
We combined the lists of proteins obtained through MS
analysis of all three subcellular fractions and identified
signaling pathways that these proteins have been associated
with, using the Protein Interaction Database (PID) at the
National Cancer Institute (NCI). Our analysis shows that
undifferentiated hESCs express proteins that map to
numerous signaling pathways, including the Activin/Nodal
pathway, BMP pathway, canonical and noncanonical Wnt
pathways, FGF pathway, IGF pathway, Akt pathway, and the
HDAC Classes I, II, and III pathways. A comprehensive list
of pathways identified and their experimentally detected
component proteins is provided in Supporting Information
Table 14. Our analysis shows that the components of a large
number of signaling pathways are expressed in undiffer-
entiated hESCs. These pathways exhibit considerable
crosstalk, as evidenced by several proteins mapping to
multiple pathways. In concert, these pathways likely dictate
all aspects of hESC biology such as pluripotency, self-
renewal, suppression of epithelial-to-mesenchymal trans-
formation (EMT), regulation of cell cycle, and suppression
of apoptosis.
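The crosstalk argument above rests on finding proteins that map to more than one pathway. One way to extract such proteins from a pathway-to-protein table is to invert the mapping, as in the sketch below. The pathway and protein names are placeholders chosen for illustration; they are not entries from the PID database or from Supporting Information Table 14.

```python
from collections import defaultdict


def shared_components(pathways):
    """Return proteins that map to more than one pathway.

    pathways: dict mapping pathway name -> iterable of protein identifiers
    Returns a dict mapping protein -> sorted list of pathways containing it.
    """
    membership = defaultdict(set)
    for pathway, proteins in pathways.items():
        for protein in proteins:
            membership[protein].add(pathway)
    return {
        protein: sorted(names)
        for protein, names in membership.items()
        if len(names) > 1
    }


# Hypothetical placeholder data (not from the study).
toy_pathways = {
    "pathway_A": ["P1", "P2", "P3"],
    "pathway_B": ["P3", "P4"],
    "pathway_C": ["P2", "P3", "P5"],
}
crosstalk = shared_components(toy_pathways)
```

Here `P2` and `P3` would be flagged as candidate crosstalk nodes, since each belongs to at least two of the toy pathways.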
The expression level of pathway components is an
important determinant of the overall influence of a specific
pathway on hESC fate. Since several components of multi-
ple signaling pathways are present in the cytoplasm, we
used spectral counting analysis to assess the relative abun-
dance of cytoplasmic proteins; higher NSAF values corre-
spond to higher protein abundance. A schematic diagram
Figure 2. Immunofluorescence analysis of intracellular localization
of target proteins in undifferentiated hESCs. Cells were
stained with a nuclear dye (DAPI) and an antibody specific to the
target protein. (A, D, G, J, M, P, and S) DAPI signal; signal due to
immunofluorescent staining with antibody: (B) CALPAIN1
(μ-type) antibody. (E) MO4L1 antibody. (H) THYMOSIN β4 antibody.
(K) β-CATENIN antibody. (N) E-CADHERIN antibody.
(Q) ENY2 antibody. (T) P300 antibody; (C, F, I, L, O, R, and U)
Composite images showing fluorescence due to nuclear dye as
well as antibody staining.
Figure 3. Network diagram for the subset of signaling pathways identified
in hESCs. Relative expression and localization data have been depicted
simultaneously. Proteins identified only in membrane are shown in
yellow. Proteins identified only in cytosol are shown in shades of red,
corresponding to relative expression, as shown in the legend. Proteins
identified in the nucleus only are shown in blue. Proteins identified
in multiple fractions are shown with thick borders. Proteins that were
not identified in our analysis are shown in grey. (Note: A high-resolution
version of this image is available online.)
that simultaneously combines known interactions between
signaling pathways, relative abundance of certain compo-
nent proteins as well as their experimentally determined
subcellular localization is shown in Fig. 3. Our figure is
restricted to a subset of signaling pathways, viz. the Activin/
Nodal pathway, BMP pathway, FGF pathway, IGF pathway,
Akt pathway, mTOR pathway, and the HDAC
Classes I, II, and III pathways. To our knowledge, this is the
first effort to characterize the signaling pathways active in
hESCs, wherein data on both the relative protein expression
and the subcellular localization are combined. Interestingly,
several proteins that may regulate the amount of active
SMAD2 and SMAD3, such as PPM1A, NEDD4-L, CDK2/4,
CALM, and ERK1/2 [22–27], are present in the cytoplasm. A
list of these SMAD2/3-regulating proteins with their NSAF
values in the cytosol is provided in Supporting Information
Table 15.
3.5 ER as the door to the hESC secretome
The hESC microenvironment plays a significant role in the
regulation of the hESC signaling network [28]. The complex
microenvironment of hESCs is determined in part by
endogenous factors that are secreted by hESCs. However,
experimental characterization of the hESC secretome is
challenging [29]. In our proteomic analysis, we identified a
small fraction of proteins annotated as being present in the
ER and Golgi in all the three subcellular fractions (Fig. 4A),
suggesting contamination by ER and Golgi. Indeed, the ER
membrane is contiguous with the outer nuclear envelope
and this causes the ER to be pulled down with the nucleus
during fractionation [30]. Since secreted proteins are
synthesized in the ER and transit through the Golgi, we
hypothesized that secreted proteins could be identified in
subcellular fractions. To test this hypothesis, we pooled the
list of proteins identified by LC-MS/MS and analyzed
the subset of proteins that are annotated as being present in
the extracellular region (GO: 0005576) and hence are puta-
tively secreted. As anticipated, this subset of proteins
contained several secreted factors including cytokines and
growth factors and extracellular matrix proteins. To further
confirm that these secreted proteins are obtained, at least in
part, from the ER and Golgi, we isolated a subcellular
fraction that was enriched in these compartments. As
shown in Fig. 4B, this enriched fraction contains few
proteins that are not annotated as being present in the
ER, Golgi, or mitochondria; this fraction has significant
Figure 4. GO annotation analysis of proteins identified in
subcellular fractions. (A) GO annotation analysis of proteins
identified in the membrane, nuclear, and cytoplasmic fractions.
Proteins were classified based on their GO annotations as being
present in the ER and Golgi only, mitochondria but not ER and
Golgi, extracellular, and others. (B) GO annotation analysis of
the fraction enriched in ER and Golgi.
Table 2. List of growth factors and cytokines and extracellular matrix proteins identified
UniProt accession   ID      Description
Growth factors
P09038              FGF2    Heparin-binding growth factor 2
P60983              GMFB    Glia maturation factor β
O60234              GMFG    Glia maturation factor γ
P51858              HDGF    Hepatoma-derived growth factor
O75610              LFTY1   Left–right determination factor 1
O00292              LFTY2   Left–right determination factor 2
P55145              MANF    Mesencephalic astrocyte-derived neurotrophic factor
P14174              MIF     Macrophage migration inhibitory factor
P21741              MK      Midkine
Extracellular matrix proteins
Q8NCW5              AIBP    Apolipoprotein A-I-binding protein
P07355              ANXA2   Annexin A2
P02649              APOE    Apolipoprotein E
Q9BUR5              APOO    Apolipoprotein O
A6NMY6              AXA2L   Putative annexin A2-like protein
P27797              CALR    Calreticulin
P10909              CLUS    Clusterin
P12109              CO6A1   Collagen α-1(VI) chain
P39060              COIA1   Collagen α-1(XVIII) chain
P78310              CXAR    Coxsackievirus and adenovirus receptor
Q14118              DAG1    Dystroglycan
P23142              FBLN1   Fibulin-1
P02751              FINC    Fibronectin
P06396              GELS    Gelsolin
O75487              GPC4    Glypican-4
Q9Y625              GPC6    Glypican-6
Q5UCC4              INM02   UPF0510 protein INM02
O00515              LAD1    Ladinin-1
P11047              LAMC1   Laminin subunit γ-1
P09382              LEG1    Galectin-1
P55001              MFAP2   Microfibrillar-associated protein 2
Q08431              MFGM    Lactadherin
Q32P28              P3H1    Prolyl 3-hydroxylase 1
P40967              PME17   Melanocyte protein Pmel 17
Q13162              PRDX4   Peroxiredoxin-4
Q92626              PXDN    Peroxidasin homolog
P02786              TFR1    Transferrin receptor protein 1
mitochondrial contamination. Yet, GO analysis confirmed
that the enriched fraction contained several putative secreted
factors. A complete list of putatively secreted proteins, as
suggested by GO analysis, is provided in Supporting Infor-
mation Table 16; this list combines data from the nuclear,
membrane, and cytoplasmic fractions as well as the ER-
enriched fraction. A subset of these proteins – extracellular
matrix proteins and growth factors – is summarized in
Table 2. Interestingly, there was significant overlap between
the lists of secreted proteins identified using our approach
and a previous study on the hESC secretome [29]. In that
study, hESCs grown in feeder-free conditions were incubated
with serum-free medium for 24 h, and proteomic
analysis of the hESC-CM was carried out. In total, 79 of the
160 proteins identified using our approach were also identified
in the previously reported study. Differences between the
two sets of protein identifications may arise for
two reasons. First, in our study, proteins are flagged as
putatively secreted if they are annotated as extracellular
(GO:0005576). The resulting list therefore also contains
some proteins that are annotated as being present in
intracellular compartments in addition to the extracellular
region. One such example is α-tubulin-1 (UniProt
ID: P68366). The second reason relates to an inherent
limitation of an approach involving proteomic analysis of
hESC-CM. While incubation of hESCs in serum-free
medium for 24 h does not affect the expression of plur-
ipotency markers such as Oct-4 and SSEA3 [29], it is
possible that the withdrawal of serum or serum supple-
ments results in changes in the secreted protein profile of
hESCs. In contrast, our approach does not require the use of
serum-free medium. Nevertheless, our results demonstrate
that the analysis of a subcellular fraction enriched in the ER
and Golgi is a viable strategy to interrogate the cellular
secretome. While the list of putatively secreted proteins
presented here is not comprehensive, we hypothesize
that optimization of our protocols for targeted enrichment
and purification of the ER and Golgi followed by MS
analysis can be used for complete characterization of the
secretome.
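The GO-based flagging and list comparison described above can be illustrated with a minimal sketch. It assumes each protein's GO cellular-component terms are available as a set keyed by accession; the function names (`putatively_secreted`, `overlap_summary`) and all accessions and annotations shown are hypothetical placeholders, not data or code from this study.

```python
# Minimal sketch: flag putatively secreted proteins by GO term and compare
# the resulting list with an independently reported secretome.
# All accessions and annotations below are placeholders, not study data.

EXTRACELLULAR = "GO:0005576"  # extracellular region, the term cited in the text


def putatively_secreted(go_annotations):
    """Return accessions whose GO terms include the extracellular term."""
    return {acc for acc, terms in go_annotations.items() if EXTRACELLULAR in terms}


def overlap_summary(ours, reference):
    """Return (shared count, size of our list, shared fraction of our list)."""
    shared = ours & reference
    fraction = len(shared) / len(ours) if ours else 0.0
    return len(shared), len(ours), fraction


toy_annotations = {
    "PROT_A": {"GO:0005576"},                # extracellular only
    "PROT_B": {"GO:0005576", "GO:0005737"},  # extracellular and cytoplasm
    "PROT_C": {"GO:0005634"},                # nucleus only
}
toy_reference = {"PROT_A", "PROT_D"}  # placeholder for a conditioned-medium list

flagged = putatively_secreted(toy_annotations)
print(sorted(flagged))
print(overlap_summary(flagged, toy_reference))
```

Note that `PROT_B` is flagged even though it also carries an intracellular annotation, which mirrors the first source of discrepancy discussed above. Applied to the counts reported here, 79 shared proteins out of 160 corresponds to an overlap of about 49%.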
4 Concluding remarks
In this study, we have used subcellular fractionation prior to
proteomic analysis to comprehensively characterize the
membrane, cytoplasmic, and nuclear proteomes of undif-
ferentiated hESCs. Notably, 74% of the proteins identified in
our analysis were found in a single subcellular fraction,
simultaneously underscoring the effectiveness of our frac-
tionation procedure as well as the distinct subcellular loca-
lization of proteins in hESCs. Such an approach not only
enables deeper proteome coverage but, in conjunction with
GO analysis and orthogonal validation using techniques
such as fluorescence microscopy if necessary, can also
provide insight into protein localization. Our data lay the
foundation for quantitative analysis of changes in the
subcellular proteome of hESCs upon initiation of
differentiation. Interestingly, our analysis shows that two proteins
that are annotated as nuclear-only are in fact present in the
cytoplasm of undifferentiated hESCs. Our results also show
that proteomic analysis of ER and Golgi can provide a
powerful alternative to conventional approaches for the
characterization of the cellular secretome. Comprehensive
identification of the exogenous signaling factors present in
mouse embryonic fibroblast-CM coupled with the know-
ledge of endogenous signaling factors secreted by hESCs
themselves will enable us to fully characterize the micro-
environment of hESCs.
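As an illustration of how a single-fraction statistic such as the 74% figure quoted above can be derived from per-fraction identification lists, the following is a minimal sketch. The function name `single_fraction_share` and the toy accession sets are hypothetical and do not reproduce the study's data or its exact counting rules.

```python
# Minimal sketch: share of identified proteins detected in exactly one
# subcellular fraction. The accession sets are placeholders, not study data.
from collections import Counter


def single_fraction_share(fractions):
    """fractions maps a fraction name to the set of accessions found in it."""
    # Count, for every accession, the number of fractions that contain it.
    counts = Counter(acc for members in fractions.values() for acc in members)
    if not counts:
        return 0.0
    single = sum(1 for n in counts.values() if n == 1)
    return single / len(counts)


toy_fractions = {
    "nuclear": {"PROT_A", "PROT_B"},
    "cytosolic": {"PROT_B", "PROT_C", "PROT_D"},
    "membrane": {"PROT_D", "PROT_E"},
}
print(single_fraction_share(toy_fractions))
```

In this toy input, three of the five distinct proteins occur in only one fraction. The denominator is the number of distinct proteins across all fractions, which is an assumption about how such a percentage would be defined.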
The authors gratefully acknowledge the funding from the National Science Foundation grant CBET-0966859.
The authors have declared no conflict of interest.
5 References
[1] Thomson, J. A., Itskovitz-Eldor, J., Shapiro, S. S., Waknitz,
M. A. et al., Embryonic stem cell lines derived from human
blastocysts. Science 1998, 282, 1145–1147.
[2] Hughes, C. S., Nuhn, A. A., Postovit, L. M., Lajoie, G. A.,
Proteomics of human embryonic stem cells. Proteomics
2011, 11, 675–690.
[3] Van Hoof, D., Heck, A. J., Krijgsveld, J., Mummery, C. L.,
Proteomics and human embryonic stem cells. Stem Cell
Res. 2008, 1, 169–182.
[4] Lee, Y. H., Tan, H. T., Chung, M. C., Subcellular fractionation
methods and strategies for proteomics. Proteomics 2010,
10, 3935–3956.
[5] Harkness, L., Christiansen, H., Nehlin, J., Barington, T. et al.,
Identification of a membrane proteomic signature for
human embryonic stem cells independent of culture
conditions. Stem Cell Res. 2008, 1, 219–227.
[6] Dormeyer, W., Van Hoof, D., Braam, S. R., Heck, A. J. et al.,
Plasma membrane proteomics of human embryonic stem
cells and human embryonal carcinoma cells. J. Proteome
Res. 2008, 7, 2936–2951.
[7] Gerwe, B. A., Angel, P. M., West, F. D., Hasneen, K. et al.,
Membrane proteomic signatures of karyotypically normal
and abnormal human embryonic stem cell lines and deri-
vatives. Proteomics 2011, 11, 2515–2527.
[8] Prokhorova, T. A., Rigbolt, K. T., Johansen, P. T., Henning-
sen, J. et al., Stable isotope labeling by amino acids in cell
culture (SILAC) and quantitative comparison of the
membrane proteomes of self-renewing and differentiating
human embryonic stem cells. Mol. Cell. Proteomics 2009, 8,
959–970.
[9] Van Hoof, D., Dormeyer, W., Braam, S. R., Passier, R. et al.,
Identification of cell surface proteins for antibody-based
selection of human embryonic stem cell-derived cardio-
myocytes. J. Proteome Res. 2010, 9, 1610–1618.
[10] Barthelery, M., Salli, U., Vrana, K. E., Enhanced nuclear
proteomics. Proteomics 2008, 8, 1832–1838.
[11] Pewsey, E., Bruce, C., Tonge, P., Evans, C. et al., Nuclear
proteome dynamics in differentiating embryonic carcinoma
(NTERA-2) cells. J. Proteome Res. 2010, 9, 3412–3426.
[12] Collier, T. S., Randall, S. M., Sarkar, P., Rao, B. M. et al.,
Comparison of stable-isotope labeling with amino acids in
cell culture and spectral counting for relative quantification
of protein expression. Rapid Commun. Mass Spectrom.
2011, 25, 2524–2532.
[13] Collier, T. S., Sarkar, P., Rao, B., Muddiman, D. C., Quanti-
tative top-down proteomics of SILAC labeled human
embryonic stem cells. J. Am. Soc. Mass Spectrom. 2010, 21,
879–889.
[14] Conesa, A., Gotz, S., Garcia-Gomez, J. M., Terol, J. et al.,
Blast2GO: A universal tool for annotation, visualization and
analysis in functional genomics research. Bioinformatics
(Oxford, England) 2005, 21, 3674–3676.
[15] Huang da, W., Sherman, B. T., Lempicki, R. A., Systematic
and integrative analysis of large gene lists using DAVID
bioinformatics resources. Nat. Protoc. 2009, 4, 44–57.
[16] Liu, H., Sadygov, R. G., Yates, J. R., 3rd, A model for
random sampling and estimation of relative protein abun-
dance in shotgun proteomics. Anal. Chem. 2004, 76,
4193–4201.
[17] Zybailov, B., Mosley, A. L., Sardiu, M. E., Coleman, M. K.
et al., Statistical analysis of membrane proteome expres-
sion changes in Saccharomyces cerevisiae. J. Proteome
Res. 2006, 5, 2339–2347.
[18] Sardiu, M. E., Cai, Y., Jin, J., Swanson, S. K. et al., Prob-
abilistic assembly of human protein interaction networks
from label-free quantitative proteomics. Proc. Natl. Acad.
Sci. USA 2008, 105, 1454–1459.
[19] Kopytova, D. V., Orlova, A. V., Krasnov, A. N., Gurskiy, D. Y.
et al., Multifunctional factor ENY2 is associated with the
THO complex and promotes its recruitment onto nascent
mRNA. Genes Dev. 2010, 24, 86–96.
[20] Shi, D., Pop, M. S., Kulikov, R., Love, I. M. et al., CBP and
p300 are cytoplasmic E4 polyubiquitin ligases for p53. Proc.
Natl. Acad. Sci. USA 2009, 106, 16275–16280.
[21] Sato, N., Meijer, L., Skaltsounis, L., Greengard, P., Brivan-
lou, A. H., Maintenance of pluripotency in human and
mouse embryonic stem cells through activation of Wnt
signaling by a pharmacological GSK-3-specific inhibitor.
Nat. Med. 2004, 10, 55–63.
[22] Funaba, M., Zimmerman, C. M., Mathews, L. S., Modulation
of Smad2-mediated signaling by extracellular signal-regu-
lated kinase. J. Biol. Chem. 2002, 277, 41361–41368.
[23] de Caestecker, M. P., Parks, W. T., Frank, C. J., Castagnino,
P. et al., Smad2 transduces common signals from receptor
serine-threonine and tyrosine kinases. Genes Dev. 1998, 12,
1587–1592.
[24] Kretzschmar, M., Doody, J., Timokhina, I., Massague, J., A
mechanism of repression of TGFbeta/Smad signaling by
oncogenic Ras. Genes Dev. 1999, 13, 804–816.
[25] Lin, X., Duan, X., Liang, Y. Y., Su, Y. et al., PPM1A functions
as a Smad phosphatase to terminate TGFbeta signaling.
Cell 2006, 125, 915–928.
[26] Matsuura, I., Denissova, N. G., Wang, G., He, D. et al.,
Cyclin-dependent kinases regulate the antiproliferative
function of Smads. Nature 2004, 430, 226–231.
[27] Kuratomi, G., Komuro, A., Goto, K., Shinozaki, M. et al.,
NEDD4-2 (neural precursor cell expressed, developmentally
down-regulated 4-2) negatively regulates TGF-beta (trans-
forming growth factor-beta) signalling by inducing ubiqui-
tin-mediated degradation of Smad2 and TGF-beta type I
receptor. Biochem. J. 2005, 386, 461–470.
[28] Peerani, R., Rao, B. M., Bauwens, C., Yin, T. et al., Niche-
mediated control of human embryonic stem cell self-
renewal and differentiation. EMBO J. 2007, 26, 4744–4755.
[29] Bendall, S. C., Hughes, C., Campbell, J. L., Stewart, M. H.
et al., An enhanced mass spectrometry approach reveals
human embryonic stem cell growth factors in culture. Mol.
Cell. Proteomics 2009, 8, 421–432.
[30] Graham, J. M., Rickwood, D., Subcellular Fractionation: A
Practical Approach, IRL Press at Oxford University Press,
Oxford, New York 1997.