Update on Plastid Proteomics in higher plants; current state and future goals
Klaas J. van Wijk (a,#) and Sacha Baginsky (b,#)
a Department of Plant Biology, Cornell University, Ithaca, NY 14853, USA.
b Martin-Luther-Universität Halle-Wittenberg, Institut für Biochemie, 06120 Halle, Germany
#for correspondence: Klaas J. van Wijk, Dept. of Plant Biology, Emerson Hall 332, Cornell University,
Ithaca, NY 14853. Tel: 1-607-255-3664; Fax: 1-607-255-3664; [email protected]; Sacha Baginsky,
Martin-Luther-University Halle-Wittenberg, Institut für Biochemie, Abteilung Pflanzenbiochemie,
Weinbergweg 22 (Biozentrum), 06120 Halle (Saale), Tel: +49 345 55 25470; Fax: +49 345 55 27012;
Plant Physiology Preview. Published on February 24, 2011, as DOI:10.1104/pp.111.172932
Copyright 2011 by the American Society of Plant Biologists
www.plantphysiol.orgon April 13, 2019 - Published by Downloaded from Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1. Significance of Plastids in Plant Biology
Plastids are plant cell organelles with many essential functions in plant metabolism. Among these are
photosynthesis, amino acid and fatty acid biosynthesis, as well as the synthesis of several secondary
metabolites. All plastids originate from undifferentiated proplastids which are restricted to meristematic
tissues and undifferentiated cells. Depending on the tissue, proplastids can develop into different plastid
types, e.g. amyloplasts in storage tissue, chloroplasts in photosynthetic tissues and chromoplasts in fruits
and flowers. Other specialized plastid types include gerontoplasts, the plastids of senescent leaves that are
important for resource allocation, oleoplasts which are oil storage plastids in e.g. olive, and etioplasts, the
final stage of proplastid development in photosynthetic tissues in the dark (Wise, 2006). Finally, plastid
types can possibly specialize to different degrees depending on cell-type, developmental state and
(a)biotic conditions. An extreme case is the highly specialized C4 chloroplasts of bundle-sheath and
mesophyll cells in the maize leaf, which differ strongly in proteome composition (Friso et al., 2010).
With the ability to develop and differentiate, plastids add versatile biosynthetic capacity to the plant cell
and are responsible for unique biosynthetic pathways that make plants unrivaled biochemical factories
that are essential for life on earth. Thus, significant research efforts are underway that aim at
understanding plastid biology in depth. A decade ago, the first plastid proteomics study was published
and the potential of plastid proteomics was outlined (van Wijk, 2000). Since then, proteomics of plastids
and plant (sub)proteomes has delivered on its promise. Here, we will provide an update on the current
status of plastid proteome research, a decade after the first reports.
2. Advances in plastid proteomics and proteomics technology
Plant and plastid proteomics are now well established scientific disciplines with many laboratories
contributing to their progress. Not surprisingly, a significant fraction of the plastid proteome is
characterized today, and available information includes protein quantities, protein interactions and
posttranslational modifications (PTMs), as will be briefly highlighted in this update. However, several of
the challenges for plastid proteomics outlined 10 years ago still exist today, including the detection of
low abundance proteins (e.g. >10,000 fold lower than Rubisco) and capturing the dynamics of plastid
proteomes.
Much of the progress has been driven by technology development and improved genomics
resources. The main difference between proteomic technologies today and 10 years ago is the much
improved sensitivity (routinely at 1-50 fmol), the accelerated duty cycle (now MS/MS scans within a few
hundred msec), the improved mass accuracy (down to a few ppm for peptides) and the increased
resolution (up to 100,000) of the latest generation mass spectrometers. Furthermore, coupling of nano-LC
with MS/MS is now routine, and split-free nano-LC systems deliver low flow rates for nanospray
ionization with excellent reproducibility. Also important are improved software tools for the reliable
identification of peptides based on MS/MS spectra along with statistically sound estimates of false
discovery rates in large datasets. With the maturation of proteomics workflows, quantitative information
for plastid proteins became available. These new technologies, in combination with the availability of
multiple sequenced plant genomes, now allow for answering more comprehensive and sophisticated
questions as compared to a decade ago (Baginsky, 2009; Gstaiger and Aebersold, 2009; Schulze and
Usadel, 2010; Walther and Mann, 2010).
In this update we will review progress on plastid proteomics and lay out a series of challenges
that can be addressed within the next few years. Table 1 provides an overview of web-based plant
proteomics resources that are relevant for plastid proteomics. We will center this update around the
concept of a plastid protein atlas. Box I provides examples of the wide range of queries that a high quality
plastid protein atlas should ultimately be able to answer. For more exhaustive overviews on proteomics of
plastids, we refer to two recent review articles and references therein (Baginsky, 2009; Agrawal et al.,
2010); space restrictions do not allow us to cite recent literature more extensively.
3. Development of a plastid protein atlas
The concept of a protein atlas was established several years ago in particular for the human proteome
(http://www.proteinatlas.org/). This concept involves the generation of protein inventories for each organ
and subcellular localization tagged with additional protein information, such as splice variants, PTMs,
protein-protein interactions, etc. Similar efforts are underway to collect all available information for
plants to generate a plant proteome atlas that includes proteome information for the different plant organs
and their organelles. Here we concentrate on the plastid proteome atlas. In order to disseminate
biologically useful information, such a plastid atlas should include: i) protein accessions for each plastid
type, including cellular specialization and subplastid localization, ii) information on peptide coverage of
each identified protein and possibly different gene models, iii) steady state protein abundance under a set
of well-defined (a)biotic conditions as well as developmental states, iv) protein-protein interactions,
protein-nucleotide assemblies and oligomeric state, v) reversible and irreversible PTMs, and vi)
bioinformatics information such as subcellular localization predictions and network information.
Plastids are among the best characterized cell organelles at the proteome level and a quality
chloroplast protein atlas is now emerging. However, the plastid protein atlas is far from complete and
strategies to improve proteome coverage and in-depth characterization must be developed and
implemented. In the following paragraphs we will review the status of each of the six components of the
plastid protein atlas and outline strategies for improvement.
3.1 Improved protein inventories for each plastid type, including cellular specialization and
subplastid localization
The predicted size of the combined proteome of all plastid types ranges from 2000-3500 proteins in
Arabidopsis thaliana, representing about 7-12% of all predicted protein-encoding genes. However, only
about 1200 proteins are currently recognized as being plastid-localized (see the Plant Proteome Database
PPDB at http://ppdb.tc.cornell.edu). Comparing this experimental plastid proteome data set with the
predicted plastid proteome showed that in particular plastid proteins involved in signaling and plastid
gene expression and RNA metabolism are strongly underrepresented. There are several reasons why a
significant percentage of plastid proteins has not yet been recognized: 1) low abundance in chloroplasts,
i.e. their detection is obscured by highly abundant photosynthetic proteins, 2) specific expression in a
certain plastid type other than chloroplast, 3) only expressed under very specific conditions
(developmental state, abiotic condition or biotic challenge), or 4) too few ionisable tryptic peptides (e.g.
transmembrane proteins with very short loops and tails or very small or basic proteins). Plastid proteome
coverage can be improved by using better mass spectrometry instrumentation with higher sensitivity,
accuracy and faster duty cycle, use of alternative enzymes for protein digestion, more specific (e.g.
affinity-based) fractionation of plastid proteomes or increased efforts to analyze a more diverse set of
plastid types, including heterotrophic plastids. However, as analytical sensitivity increases with these
additional efforts, the challenge to distinguish between true positive and false positive plastid proteins
increases as well.
Based on the last decade of plastid proteome research, it is clear that objective filtering strategies
for false positive identification and/or assignment to plastids are essential. The most practical solution
involves repeated analysis of independent plastid preparations and the use of quantitative protein
information for improved filtering of the identified proteins, based on two steps: i) repeat observations in
independent plastid preparations – proteins that are observed at high frequency across these preparations
are more likely to be bona fide plastid proteins, ii) combined proteome information from unfractionated
tissue and different purified organelles to recognize false positives that more highly accumulate in other
subcellular locations – this requires quantitative information about relative protein abundance. Such
relative quantification for these different sample types should ideally be done with the same
experimental workflow; a good example is the LOPIT technique, which uses relative
protein quantification along density gradients to assign proteins to organelles by association (Dunkley et
al., 2006). The ‘frequency’ filter (i.) is based on the assumption that non-plastid contaminants or false
positive identifications are random events; therefore this first filter does not remove systematic false
positives, such as highly abundant cytosolic proteins, which can contaminate isolated chloroplasts. A small
percentage of plastid proteins are also located elsewhere in the cell, and ~50 dual-targeted proteins have
been discovered for Arabidopsis so far (Carrie et al., 2009). Most of these have shared locations with
mitochondria and are involved in plastid or mitochondrial gene expression (e.g. tRNA synthetases);
however, shared localizations with the nucleus, peroxisome or the cytosol have also been described.
Detection of such dual locations requires independent information, typically from image analysis using
fluorescent fusion proteins and ideally also from phenotypical analysis of mutants.
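The two-step filtering described earlier in this paragraph can be illustrated with a minimal Python sketch. It is not the procedure of any particular laboratory: the function names, the accessions (P1, P2, C1, X9), the thresholds and the abundance values are all hypothetical and chosen only to show how a frequently observed but systematically contaminating protein passes the frequency filter and is then removed by the enrichment filter.

```python
from collections import Counter


def frequency_filter(preparations, min_observations=2):
    """Step i: keep proteins seen in at least `min_observations`
    independent plastid preparations (each preparation is a set of accessions)."""
    counts = Counter(acc for prep in preparations for acc in set(prep))
    return {acc for acc, n in counts.items() if n >= min_observations}


def enrichment_filter(candidates, plastid_abundance, total_abundance, min_ratio=1.0):
    """Step ii: keep candidates whose relative abundance in purified plastids
    is at least `min_ratio` times their relative abundance in unfractionated tissue."""
    kept = set()
    for acc in candidates:
        in_plastid = plastid_abundance.get(acc, 0.0)
        in_total = total_abundance.get(acc, 0.0)
        # Detected in plastids but not in the total extract: treated as enriched.
        if in_plastid > 0 and (in_total == 0 or in_plastid / in_total >= min_ratio):
            kept.add(acc)
    return kept


# Hypothetical data: three independent plastid preparations.
preps = [
    {"P1", "P2", "C1"},
    {"P1", "P2", "X9"},
    {"P1", "C1"},
]
frequent = frequency_filter(preps, min_observations=2)

# Hypothetical relative abundances (fraction of total signal in each sample type).
plastid = {"P1": 0.05, "P2": 0.01, "C1": 0.002}
total_leaf = {"P1": 0.02, "P2": 0.004, "C1": 0.01}
plastid_like = enrichment_filter(frequent, plastid, total_leaf)

print(sorted(frequent))      # C1 survives the frequency filter
print(sorted(plastid_like))  # C1 is removed because it is depleted in plastids
```

In this toy example X9 is removed as a random, single observation, whereas C1 mimics an abundant cytosolic contaminant that is only recognized once relative abundance in the total extract is taken into account.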
Collection of all available protein localization data from individual functional studies, as well as
proteomics studies, is an important tool for conclusive assignment of proteins to the plastid. For instance, it
helps to recognize abundant proteins that are identified in dozens of proteomics papers as potential
contaminants. The SUBA database (http://suba.plantenergy.uwa.edu.au/) collects the available information
on the localization of Arabidopsis proteins, e.g. MS/MS data, GFP
localization and predictions, and allows assembling lists of organellar proteins with self-defined
reliability criteria. The PPDB accumulates similar information for Arabidopsis, as well as maize, and
combines it with in-house MS/MS-based quantitative information on total leaf extracts and isolated
plastids and plastid fractions (stored in PPDB); this information is evaluated manually to assign a
subcellular localization. This manual curation step, using a conservative threshold (i.e. no call is made
unless there is deemed sufficient evidence), has proven to result in high confidence localization calls as
judged by comparisons with subsequent independent experimental localization studies by GFP fusions
and image analysis.
Another way to help complete the plastid proteome inventory is to analyze plastid types
specialized for specific tasks in their resident tissue (organ or cell type) because they differ considerably
in their protein composition. However, this is challenging for Arabidopsis since its seeds and flowers are
small and it does not develop storage organs. Thus, organelle isolation is often impracticable and
proteome analyses are better performed at the level of the entire organ as illustrated by the analysis of
plastids in seeds (Chen et al., 2009). Several groups tried to circumvent this problem by using different
plant species, e.g. tobacco, bell pepper, spinach, pea, wheat, potato, tomato or Brassica rapa for the
analysis of amyloplasts, chromoplasts, proplastids and leucoplasts (Agrawal et al., 2010). However, so far
this has not significantly increased the number of identified plastid proteins, in part due to the lack of
complete genome sequence information. Exceptions are rice and maize because good quality genome
annotation is available for these two organisms and the coverage of the plastid proteome of maize is now
quite comparable to Arabidopsis in part because cell-type specific chloroplasts, specialized for specific
functions, were included (Friso et al., 2010). Importantly, this allowed identification of C4-specific
metabolic chloroplast envelope transporters and also helped identify many new subunits of the elusive
thylakoid NADPH dehydrogenase complex involved in cyclic electron flow (Brautigam et al., 2008;
Majeran et al., 2008).
The spatial distribution of proteins within chloroplasts has been the target of several proteome
analyses, originally starting with the thylakoid lumen and peripheral soluble thylakoid proteins (Peltier et
al., 2000), followed by systematic analyses of the thylakoid and envelope membrane proteomes, the
soluble stroma proteome, specialized thylakoid-associated lipoprotein particles termed plastoglobules,
and proteins associated with the plastid chromosome (Baginsky et al., 2009; Agrawal et al., 2010). A
recent study separated the Arabidopsis chloroplast proteome into soluble proteins and thylakoid and
envelope membrane proteins (Ferro et al., 2010). Protein localization to each subcompartment was based
on the abundance distribution of identified proteins in different purified fractions. Information about the
protein composition of the chloroplast sub-compartments is available in PPDB and AT_CHLORO
(http://www.grenoble.prabi.fr/at_chloro/). Because of space constraints in this update, we refer the reader
to the most recent and comprehensive review with extensive literature citations (Agrawal et al., 2010)
instead of discussing the original literature in this report.
3.2. Discovery and significance of gene models
Many genes have more than one annotated gene model. In some cases the different models only affect the
untranslated 5’ and 3’ ends, whereas in others they affect the actual translated region. Such models arise from
different transcription start sites or from alternative splicing (AS). AS has received considerable attention at
the transcript level, in particular since new generation sequencing techniques now allow for large scale
detection of alternative splicing. At least 20% of plant genes have one or more alternative transcript
isoform. The majority of these AS events have not been functionally characterized, but evidence suggests
that AS participates in important plant functions, including stress response, and may impact domestication
and trait selection. Alternative transcription start sites or AS can result in proteins with different N- or C-
termini or internal protein regions, potentially affecting subcellular localization and function. Indeed, one
of the mechanisms for dual targeting is that two different proteins that differ in their N-terminus are
generated from a single gene (Peeters and Small, 2001). Matching mass spectrometry data to these
different gene models can help to identify the most relevant predicted protein forms. In the PPDB (for
Arabidopsis and maize) and AtProteome (http://www.pep2pro.ethz.ch), peptide identification data are
projected on each gene model, allowing evaluation of the most relevant models. However, a systematic
analysis of the consequences of AS at the plant proteome level has not been carried out; this is not
surprising given the challenges associated with obtaining nearly complete sequence coverage (i.e. the %
of primary amino acid sequence for which peptides are detected) that is required to distinguish different
gene models.
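To make the notion of sequence coverage concrete, the following minimal Python sketch computes the percentage of residues of each gene model that is covered by identified peptides. The two model sequences and the peptide list are invented for illustration and do not correspond to real proteins; exact substring matching is an assumption that ignores modified residues and isobaric amino acids.

```python
def sequence_coverage(protein_seq, peptides):
    """Return the percentage of residues in `protein_seq` covered by at least
    one of the identified `peptides` (exact substring matches only)."""
    if not protein_seq:
        return 0.0
    covered = [False] * len(protein_seq)
    for pep in peptides:
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            # Continue searching in case the peptide occurs more than once.
            start = protein_seq.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein_seq)


# Two hypothetical gene models that differ only at the N-terminus.
model_1 = "MASLTRKAAGGEELKVVDK"
model_2 = "MKAAGGEELKVVDK"
identified = ["MASLTR", "AAGGEELK", "VVDK"]

cov_1 = sequence_coverage(model_1, identified)
cov_2 = sequence_coverage(model_2, identified)
print(f"model 1: {cov_1:.1f}%  model 2: {cov_2:.1f}%")

# Peptides matching only one model are the ones that discriminate between models.
unique_to_1 = [p for p in identified if p in model_1 and p not in model_2]
print(unique_to_1)
```

Only the N-terminal peptide distinguishes the two hypothetical models here, which illustrates why high coverage of the variable regions, rather than high coverage overall, is what is needed to tell gene models apart.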
In case of mass spectrometry-based quantitative proteomics, decisions have to be made on how to
handle protein models. For instance, one model may have more matched peptides than another model.
One solution is to select only the information for the highest scoring model; an alternative is to collect and
sum all matched peptides for all protein models of a gene. In practice this may not affect most
quantifications, but it is important to systematically implement a chosen procedure. The van Wijk lab
consistently selected the higher scoring protein model (calculated across all samples for the specific
analysis) and, if there was no difference in protein score between models, the model with the lowest model
number was selected (see e.g. Friso et al., 2010). Other labs sum up all spectral counts for a gene and remove the
model information (Baerenfaller et al., 2008). Either method has its merits and it is important that the
applied procedure is transparent.
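The two conventions can be contrasted in a short Python sketch. It is an illustration under stated assumptions, not a reproduction of either laboratory's pipeline: model identifiers are assumed to have the form GENE.N (the identifiers, scores and spectrum labels below are hypothetical), and when summing per gene each spectrum is counted once even if it matches several models, which is one possible way to avoid double counting.

```python
def select_best_model(model_scores):
    """Per gene, pick the model with the highest score (summed across samples);
    ties are broken by the lowest model number."""
    best = {}
    for model_id, score in model_scores.items():
        gene, number = model_id.rsplit(".", 1)
        key = (-score, int(number))  # higher score first, then lower model number
        if gene not in best or key < best[gene][0]:
            best[gene] = (key, model_id)
    return {gene: model_id for gene, (_, model_id) in best.items()}


def spectral_counts_per_gene(model_spectra):
    """Per gene, count all spectra matched to any of its models,
    counting a spectrum shared between models only once (assumption)."""
    spectra_by_gene = {}
    for model_id, spectra in model_spectra.items():
        gene = model_id.rsplit(".", 1)[0]
        spectra_by_gene.setdefault(gene, set()).update(spectra)
    return {gene: len(spectra) for gene, spectra in spectra_by_gene.items()}


# Hypothetical scores and matched spectra.
scores = {"GENE_A.1": 120.0, "GENE_A.2": 150.0, "GENE_B.1": 80.0, "GENE_B.2": 80.0}
spectra = {"GENE_A.1": {"s1", "s2"}, "GENE_A.2": {"s2", "s3", "s4"}, "GENE_B.1": {"s5"}}

chosen = select_best_model(scores)
per_gene = spectral_counts_per_gene(spectra)
print(chosen)
print(per_gene)
```

Whichever convention is used, applying the same function to every dataset in a study is what keeps the resulting quantifications comparable.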
3.3. Protein abundance within the plastid
The range of protein accumulation levels in plant organs and within the plastid likely spans up to ~10
orders of magnitude. Using 1-DE gel separation, followed by in-gel digestion and the latest generation of
tandem mass spectrometers for un-targeted (‘shotgun’) analysis with data-dependent acquisition (DDA),
proteins are typically identified within an abundance range of 5 to maximally 6 orders of magnitude.
Mapping plastid protein abundance is important to understand the composition of protein complexes,
functionalities of plastid membranes and plastid particles such as plastoglobules or nucleoids, as well as
understanding plastid metabolism and consideration of metabolic flux. In addition, as discussed in the
previous section, relative protein abundance measurements are also an important tool to evaluate if
proteins are indeed plastid localized. When discussing protein quantification, we must distinguish
between i) measuring protein mass or protein concentration within a sample, and ii) comparing relative
protein concentrations (or mass) of the same protein between different samples. The latter case is often
referred to as measuring differential protein expression or ‘functional proteomics’, e.g. when studying the
effect of (a)biotic stress, developmental processes or mutants. Most (plant) protein quantification studies
relate to differential expression (functional proteomics). In the current section, we will discuss the first
case, whereas the second case is briefly discussed in section 4 (Employing the plastid proteome atlas for
functional analysis).
The two strategies that have so far been employed to map protein abundance within the plastid
are: i) image analysis of stained two-dimensional gels and ii) mass spectrometry-based quantification
using spectral counting. Quantification using 2D gel electrophoresis with IEF as the first dimension was
used in most gel based studies, e.g. for the thylakoid lumen (Schubert et al., 2002) or soluble proteins in
rice etioplasts (Kleffmann et al., 2007); however in most other studies this was applied to ‘functional
proteomics’. 2DE gels with native gel electrophoresis as the first dimension were used to determine a
quantitative map of soluble chloroplast proteins and their oligomeric states in the stroma of Arabidopsis
thaliana (Peltier et al., 2006). In a subsequent study, Arabidopsis stromal proteins were quantified using
mass spectrometry based spectral counting (Zybailov et al., 2008). Both complementary procedures were
also carried out for chloroplast membranes and stromal fractions of isolated bundle sheath and mesophyll
cells of maize leaves (Majeran et al., 2008). The advantage of IEF based 2D gels lies mostly in the higher
resolution of IEF compared to native gels; however IEF gels systematically lead to (often strong)
underestimation of higher molecular mass proteins and hydrophobic proteins, whereas proteins with
extreme pI (<4 or >10) are harder to resolve. For the mapping of absolute protein abundances including
membrane proteins, colorless native or blue native gels are thus the better alternatives.
Directly comparing image and MS-based methodologies showed that image-based quantification
is very limited in the number of proteins that can be accurately quantified because protein spots need to be
fully separated from other spots to avoid quantifying protein mixtures. Furthermore, the quantification is
significantly affected by the amino acid composition, because current dyes bind in particular to basic
residues, leading to over- or underestimation of proteins, depending on the amino acid composition. MS-
based quantification allows for quantification of a much larger number of proteins, typically resulting in a
higher dynamic range. However highly abundant proteins (e.g. the ~10-20 most abundant proteins in a
sample) are often underestimated because of the necessary use of data-dependent acquisition (DDA) (see
Zybailov et al., 2008, for numbers), whereas proteins quantified with low numbers of MS/MS spectra can
show quite large sample to sample variation. In general, proteins are most accurately quantified if
multiple unique peptides are detected, each in high numbers. Conversion of protein mass quantification
(either by the image analysis or MS-based quantification) to protein concentration requires normalization
by either the number of predicted tryptic peptides within the relevant mass window (in case of MS-based
quantification) or by protein length or mass (for both image- and MS-based quantification). Despite the
advantages of MS-based quantification described above, 2D-PAGE still has a place in quantitative proteomics, in particular for
analysis of protein complexes and because it provides an immediate visible overview of the proteome.
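The normalization step mentioned above can be sketched in Python as follows. This is a simplified illustration rather than a published workflow: the trypsin rule (cleavage after K or R, but not before P) is the conventional one, whereas the peptide length window used as a stand-in for the instrument's mass window, the protein names, the counts and the lengths are assumptions made for the example.

```python
def tryptic_peptides(sequence):
    """In-silico trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        at_end = i + 1 == len(sequence)
        if residue in "KR" and (at_end or sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def count_observable(sequence, min_len=6, max_len=30):
    """Number of predicted tryptic peptides within a length window; the window
    is a crude proxy for the relevant mass window (assumption)."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))


def relative_concentration(spectral_counts, normalizers):
    """Divide each spectral count by its normalizer (protein length, mass or
    number of observable peptides) and rescale so that the values sum to 1."""
    adjusted = {acc: spectral_counts[acc] / normalizers[acc] for acc in spectral_counts}
    total = sum(adjusted.values())
    return {acc: value / total for acc, value in adjusted.items()}


# Two hypothetical proteins with equal spectral counts but different lengths.
counts = {"protein_A": 100, "protein_B": 100}
lengths = {"protein_A": 500, "protein_B": 250}
by_length = relative_concentration(counts, lengths)
print(by_length)  # the shorter protein has the higher relative concentration
```

Equal spectral counts therefore translate into a twofold higher relative concentration for the protein that is half as long, which is the correction the conversion from protein mass to protein concentration is meant to achieve.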
The ‘gold-standard’ for protein abundance measurements is to spike the sample with isotope
labeled proteins or proteotypic peptides, referred to as ‘isotope dilution’ (Brun et al., 2009). These peptides
can be generated by in vitro synthesis or by expression as a concatemer of proteotypic peptides after
construction of a synthetic gene (QconCAT). Both methods require significant investments and typically
are applied to smaller numbers of proteins – these techniques are therefore currently not practical for
quantification of hundreds of proteins and have so far been applied only to targeted analysis of selected
plastid pathways (Wienkoop et al., 2010). However, efforts are underway to establish QconCAT to
determine the stoichiometry of the Clp protease complex and for the quantification of specific plastid
(plant) metabolic pathways or plastid processes.
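The arithmetic behind isotope dilution is simple and can be sketched in Python: the amount of the native ('light') peptide equals the light-to-heavy signal ratio multiplied by the known amount of spiked heavy standard, and subunit stoichiometry follows by dividing the amounts by that of a reference subunit. The subunit names, intensities and spiked amounts below are hypothetical, and the sketch assumes that light and heavy forms have identical ionization and recovery.

```python
def absolute_amount(light_intensity, heavy_intensity, spiked_fmol):
    """Amount of native peptide (fmol) from the light/heavy intensity ratio
    and the known amount of spiked isotope-labeled standard."""
    if heavy_intensity <= 0:
        raise ValueError("heavy standard was not detected")
    return light_intensity / heavy_intensity * spiked_fmol


def stoichiometry(amounts, reference):
    """Express each subunit amount relative to the chosen reference subunit."""
    ref_amount = amounts[reference]
    return {name: amount / ref_amount for name, amount in amounts.items()}


# Hypothetical measurements for two subunits of one complex.
amounts = {
    "subunit_A": absolute_amount(2.0e6, 1.0e6, spiked_fmol=50.0),
    "subunit_B": absolute_amount(5.0e5, 1.0e6, spiked_fmol=50.0),
}
ratios = stoichiometry(amounts, reference="subunit_B")
print(amounts)
print(ratios)
```

In practice several proteotypic peptides per protein would be measured and averaged, which this sketch omits for brevity.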
3.4. Protein-protein interactions, protein-nucleotide assemblies and oligomeric state
To carry out their metabolic, structural or signaling functions, many plastid proteins form transient or
stable interactions with other proteins. Few undirected systematic protein interaction studies have been
carried out for soluble stromal complexes, either by native gel electrophoresis (below 800 kDa) or by
chromatography (>800 kDa) (Olinares et al., 2010); these two complementary studies provide an
overview of the oligomeric state of >1000 proteins. In particular protein assemblies larger than 800 kDa
are dominated by functions in plastid gene expression including nucleoids, mRNA metabolism and
ribosomes. The interaction of plastid proteins with DNA or RNA constitutes a regulatory network of gene
expression. The largest structures, of several megadaltons, are nucleoids, also known as transcriptionally
active chromosomes (TAC), which contain several copies of plastid DNA and dozens of DNA and RNA
binding proteins, including proteins likely regulating nucleoid activities through reduction/oxidation or
phosphorylation (Pfalz et al., 2006). Envelope-membrane protein complexes are dominated by the
translocon complexes at the inner and outer envelope membrane (TIC and TOC). These import
complexes are functionally relatively well characterized by a variety of techniques, including blue-native
gels (Kikuchi et al., 2009, and references therein). The abundant photosynthetic protein complexes in the
thylakoid membrane have been a target for biochemical research for several decades and are now well
characterized through a number of methodologies. Most proteins in these complexes have been identified
and characterized by mass spectrometry and for some of them PTMs have been determined by intact
protein mass spectrometry (Whitelegge, 2004).
More detailed protein-protein interaction studies, using either co-immunoprecipitation or affinity
purification using transgenic plants that express tagged transgenes, are needed to better characterize the
plastid proteome interactome. This will help to better understand in particular regulation of metabolism
and plastid gene expression and to build reliable protein interaction networks to complement the plastid
proteome atlas.
3.5. Reversible and irreversible PTMs
Most proteins undergo reversible and sometimes irreversible modifications. Large scale analysis of
PTMs, using a high resolution, high accuracy LTQ-Orbitrap mass spectrometer, was carried out for
chloroplast membranes and stroma, as well as total leaf extracts and the frequencies of many PTMs were
calculated (Zybailov et al., 2009). This analysis provides a framework for search parameters and the use
of retention times for improved assignment of PTMs in large-scale proteomics, and helps to distinguish
artificial modifications from those with a biological relevance. For nuclear-encoded plastid proteins, the
most typical irreversible in vivo modification is proteolytic cleavage of the N-terminal transit peptide, the
cTP. In case of most plastid-encoded proteins, the N-terminal methionine is typically removed by
methionine aminopeptidases, which have been identified in plastids. Another frequent N-terminal
modification that occurs after removal of N-terminal targeting information is N-terminal acetylation.
Because N-terminal acetylation requires in situ enzyme activity, it provides a reliable determination of the
N-terminus and thus valuable information about the processing site for transit peptides of imported
chloroplast proteins. Thus N-terminal acetylation allows mapping the in vivo N-termini of plastid and
cytosolic proteins. Kleffmann and colleagues established for a small set of proteins from rice etioplasts
the in vivo N-terminus and found that there is a good agreement between the detected N-terminal peptide
and the predicted processing peptidase cleavage site (Kleffmann et al., 2007). Similarly, Zybailov and
colleagues identified a larger set of N-terminal acetylated proteins in Arabidopsis chloroplasts and
provided additional context information for the processing protease cleavage site, also indicating that the
predicted cleavage site is one residue off from the actual cleavage site (Zybailov et al., 2008).
Improvements for cleavage site prediction should be possible based on the now available larger training
set.
PTMs often determine enzymatic activities and rapidly adjust enzyme activity to the requirements
of the cellular metabolism; protein abundance likely corresponds to maximal (theoretical) activity but
is not always a good indicator for in vivo enzyme activity and its net contribution to cell metabolism. It is
well established that reversible phosphorylation and reduction/oxidation, e.g. through the action of
different types of plastid thioredoxins (TRX), are key regulators of plastid metabolism, as well as plastid
gene expression (Dietz and Pfannschmidt, 2010). Several proteomics studies identified thioredoxin targets
by affinity chromatography, whereas other redox proteomics approaches used diagonal electrophoresis
under reducing and oxidizing conditions to identify proteins under redox control in vivo (Dietz and
Pfannschmidt, 2010). These analyses demonstrated that many chloroplast functions are regulated by
TRX-mediated disulphide/dithiol exchange, or by currently unknown redox modulators. Among these
functions are isoprenoid and tetrapyrrole biosynthesis, starch biosynthesis and degradation, gene
expression, protein folding and degradation, and vitamin biosynthesis. Redox targets in the thylakoid lumen
were identified, and inhibition of the activity of the xanthophyll cycle enzyme violaxanthin de-epoxidase
by reduction, i.e. dithiol generation, was established (Dietz and Pfannschmidt, 2010).
Over the last few years, two thylakoid associated kinases (STN7, STN8), as well as a thylakoid
associated phosphatase (TAP38/PPH1), have been identified and their functions were investigated by
functional analysis of Arabidopsis mutants (Lemeille and Rochaix, 2010). The reversible phosphorylation
system at the thylakoid membrane regulates photosynthetic state transitions to optimize light absorption,
as well as long term light adaptation. 175 phosphorylated chloroplast proteins were identified, with 80%
serine and 20% threonine phosphorylation, but no tyrosine phosphorylation. One of the thylakoid kinases,
STN7, was found to be an abundant phosphoprotein in vivo suggesting the existence of kinase cascades in
the chloroplast. Information about the exact site of phosphorylation was used to extract kinase motifs
which are useful footprints for kinase activity in vivo (Reiland et al., 2009). Cumulative evidence for plant
proteome phosphorylation is collected in various databases, such as the PhosPhAt database for
Arabidopsis (http://phosphat.mpimp-golm.mpg.de/).
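How exact phosphorylation sites feed into motif extraction can be illustrated with a minimal Python sketch: a fixed window of residues is cut around each site, and residues are then counted per window position. This is only the first step of a motif analysis and is not the method used in the cited study; the sequence, the site positions and the flank size are hypothetical.

```python
from collections import Counter


def site_window(sequence, site_index, flank=3, pad="-"):
    """Return the residues from `flank` positions before to `flank` positions
    after the phosphosite (0-based index), padded at the protein termini."""
    padded = pad * flank + sequence + pad * flank
    center = site_index + flank
    return padded[center - flank:center + flank + 1]


def position_counts(windows):
    """Count residues at each position across equally long windows."""
    if not windows:
        return []
    return [Counter(w[i] for w in windows) for i in range(len(windows[0]))]


# Hypothetical protein with phosphoserines at indices 2 and 6.
protein = "MASTPRSLK"
windows = [site_window(protein, i) for i in (2, 6)]
profile = position_counts(windows)

print(windows)
print(profile[3])  # central position: the phosphorylated residue itself
```

Overrepresented residues at fixed offsets from the central position, judged against a suitable background, are what would subsequently be reported as a candidate kinase motif.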
3.6. Subcellular localization predictions and network information
The distribution of cellular functions to distinct cell organelles is an important organization principle that
needs to be understood to model metabolic and protein interaction networks and to make predictions at the
systems scale. Thus, analyses of the protein composition of cell organelles were reported for virtually all
plant cell organelles or membranes (Baginsky, 2009; Agrawal et al., 2010). At present, plant modeling
and systems analysis approaches with subcellular organelles suffer from incomplete proteome
identification and annotation. More complete organelle inventories will strengthen modeling efforts and
should yield higher network consistency. In order to make a contribution to model quality,
however, protein localization data should have low false positive rates, e.g. below 1%. Therefore,
conservative assignment of protein subcellular localization in papers and public databases is better than
over-assignment of proteins, in particular since it is not really possible to attach a p-value to a
subcellular localization assignment based on experimental data. Thus, the community’s goal should be a
plastid proteome atlas with high sensitivity and a very low false positive rate.
In addition to the experimental organelle proteome analysis, subcellular localization prediction is
a possible source of information for ‘missing’ plastid proteins, even if suboptimal. The generation of
software routines to predict subcellular protein localization for plants, other eukaryotes, as well as
prokaryotes, has been in progress for well over a decade, in particular inspired by the increasing number
of protein inventories for different subcellular localizations. These inventories provide essential training
and test sets. Whereas the prediction of N-terminal signal peptides (SPs) for SRP-dependent targeting to
the endoplasmic reticulum is rather accurate and sensitive, prediction of plastid localization is much less
satisfactory and still attracts considerable attention. A consensus prediction combining several predictors
using a naïve Bayes method was suggested to improve both sensitivity and specificity for plastid and
mitochondrial proteins (Schwacke et al., 2007). In the last 2-3 years, several new localization predictors
(e.g. AtSubP, Subchlo, RSLpred, MultiP, Plant-mPLoc) were published for plants, mostly focusing on
Arabidopsis. While each predictor may have advantages over the others, it is not clear that their
predictions have a better true positive discovery rate for plastid proteins (i.e. a higher sensitivity) at a lower
false positive discovery rate (i.e. a better specificity) than the most popular predictor, TargetP
(http://www.cbs.dtu.dk/services/TargetP/).
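To illustrate the idea behind a naïve Bayes consensus, the sketch below combines the yes/no calls of several predictors into one posterior probability of plastid localization, treating the predictors as conditionally independent. The predictor names, their sensitivities and false positive rates, and the prior are hypothetical placeholders; this is not the implementation of Schwacke et al. (2007).

```python
import math

def naive_bayes_posterior(calls, rates, prior):
    """Posterior probability of plastid localization.

    calls: predictor name -> True if it predicts 'plastid'.
    rates: predictor name -> (sensitivity, false positive rate), both
           assumed to be estimated beforehand on a training set.
    prior: assumed fraction of plastid proteins in the proteome.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for name, called in calls.items():
        sensitivity, fpr = rates[name]
        if called:
            log_odds += math.log(sensitivity / fpr)
        else:
            log_odds += math.log((1.0 - sensitivity) / (1.0 - fpr))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical performance figures for two unnamed predictors.
rates = {"predictor_A": (0.8, 0.2), "predictor_B": (0.8, 0.2)}
both_agree = naive_bayes_posterior(
    {"predictor_A": True, "predictor_B": True}, rates, prior=0.5)
disagree = naive_bayes_posterior(
    {"predictor_A": True, "predictor_B": False}, rates, prior=0.5)
```

Because real predictors are often trained on overlapping data, the independence assumption is optimistic and the resulting posterior should be read as a ranking score rather than a calibrated probability.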
TargetP is still the most commonly used predictor for plastid, as well as plant mitochondrial,
localization; it predicts not only localization but also the cTP and mTP cleavage sites. There is still
some controversy on the true positive prediction rate of TargetP, which was found to differ between
experimental datasets. While plastid proteome studies from the van Wijk lab and others reported true
positive prediction rates in the range of 85%, consistent with the benchmark tests obtained during TargetP
training, other groups found much lower prediction rates on their plastid protein set (Armbruster et al.,
2011). Higher TargetP true positive rates (sensitivity) are usually observed when proteins that were
not repeatedly detected in plastid preparations are eliminated, while also applying conservative thresholds for protein
identification (see also discussion under 3.1). Importantly, sets of detected low-abundance Arabidopsis
proteins (several orders of magnitude lower than e.g. RBCL), e.g. those involved in RNA metabolism,
have similar true positive prediction rates as highly abundant proteins (Olinares et al., 2010). However,
proteins located in the outer plastid envelope membrane or those reversibly associated with the outer
envelope should be excluded from such prediction analysis because they do not possess a cleavable N-terminal
plastid targeting sequence. The main shortcoming of TargetP is the high false positive rate (low
accuracy), likely around 35%, leading to an overprediction of plastid proteins.
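The two figures discussed above can be made concrete with a small sketch. It assumes a reference set of experimentally established plastid proteins and a reference set of proteins established to reside elsewhere, and reports (i) the fraction of established plastid proteins that the predictor recovers and (ii) the fraction of positive predictions that fall in the non-plastid reference set. The identifiers are invented and the function name is an assumption for illustration.

```python
def prediction_rates(predicted, known_plastid, known_non_plastid):
    """Return (sensitivity, false positive fraction among predictions).

    predicted:         accessions the predictor calls 'plastid'.
    known_plastid:     accessions with experimental plastid evidence.
    known_non_plastid: accessions with experimental evidence for
                       another location.
    Proteins absent from both reference sets are ignored.
    """
    true_pos = len(predicted & known_plastid)
    false_pos = len(predicted & known_non_plastid)
    sensitivity = true_pos / len(known_plastid)
    called = true_pos + false_pos
    false_fraction = false_pos / called if called else 0.0
    return sensitivity, false_fraction

# Invented toy accessions.
predicted = {"P1", "P2", "P3", "N1"}
known_plastid = {"P1", "P2", "P3", "P4"}
known_non_plastid = {"N1", "N2"}
sensitivity, false_fraction = prediction_rates(
    predicted, known_plastid, known_non_plastid)
```

Both numbers depend strongly on how the reference sets are curated, which is one reason why reported rates differ between studies.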
The current sensitivity and accuracy of TargetP are clearly not perfect, and the much larger sets of
established subcellular proteomes for Arabidopsis (and to a lesser degree also maize and rice) should be
useful to improve the performance of plastid localization predictors. In addition, it is quite likely that a
subset of nuclear-encoded plastid proteins has atypical targeting information. For instance, it has been
shown for a few plastid proteins that they are targeted to the plastid via the ER; the N-terminus of these
precursor proteins contains a secretory signal peptide (SP), followed by a cTP (Villarejo et al., 2005).
However, scanning for SPs in ~1000 established plastid proteins in Arabidopsis suggested that probably
very few proteins take this route (Zybailov et al., 2008). It is also possible that there is yet another
pathway (or recognition system) for protein translocation across the envelope that accounts for the
imperfect true positive rate; the recent finding of an envelope-localized SEC system may be relevant here
(Skalitzky et al., 2011). Finally, it may be optimal to develop and test localization software for specific
species, plant families or even clades. For instance, monocotyledons such as rice, sorghum and maize may
have systematically different protein targeting information as compared to dicotyledons such as
Arabidopsis, tobacco, pea and spinach. Indeed, systematic analyses of established rice plastid proteins as
well as rice orthologs of Arabidopsis chloroplast proteins showed that alanine instead of serine or
threonine is overrepresented in the cTP (Kleffmann et al., 2007; Zybailov et al., 2008).
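A species-specific bias of this kind can be examined with a very simple composition count. The sketch below computes the fractions of alanine, serine and threonine in the N-terminal region of a precursor sequence; the fixed region length is an assumption made for illustration (real cTP lengths vary and would ideally come from cleavage-site data), and the sequence is invented.

```python
def n_terminal_fractions(sequence, length=50, residues="AST"):
    """Fraction of each listed residue within the first `length` residues."""
    region = sequence[:length].upper()
    if not region:
        return {r: 0.0 for r in residues}
    return {r: region.count(r) / len(region) for r in residues}

# Invented N-terminal stretch, not a real precursor protein.
toy_precursor = "MAASSTAAAA"
fractions = n_terminal_fractions(toy_precursor, length=10)
```

Averaging such fractions over established plastid proteins of two species would give a first impression of whether their transit peptides differ in composition.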
With detailed information about the enzymatic inventory of organelles, their specific contribution
to metabolism and signaling is also accessible to large-scale modeling approaches. Genome-scale
metabolic networks for the C3 plant Arabidopsis thaliana and the C4 plant maize, as well as the
green alga Chlamydomonas reinhardtii, were constructed that take into account compartmentalization
and allow assessing the specific contribution of cell organelles to metabolism (Dal'Molin et al., 2010).
Large-scale protein/protein interaction networks also benefit significantly from knowledge about the
co-localization of proteins in the same organelle. This information decreases false discovery rates in
large-scale interaction datasets for Arabidopsis, thereby increasing the reliability of predicted interaction
networks. Progress has been made in the assembly of plant organellar phosphorylation networks and, for
chloroplasts in particular, in understanding the (de)phosphorylation-driven movement of light harvesting complexes in the
thylakoid membrane (termed state transitions) (Lemeille and Rochaix, 2010). Studies in non-plant
species have shown that, using phosphoproteomics information, it is possible to infer in vivo kinase
activities from phosphorylation motifs to provide information about kinase/substrate relationships and,
together with localization information, to construct in vivo phosphorylation networks. Thus, protein
inventories of cell organelles are important constraints in constructing signal transduction networks. Last
but not least, publicly available and reliable protein subcellular localization data will be helpful and
cost-effective in the functional analysis of genes and proteins, because the need to determine the localization of
each protein is already met.
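The use of co-localization as a constraint on interaction data can be sketched as a simple filter: a candidate interaction is kept only if both partners share at least one assigned compartment. The data structures and accessions below are hypothetical, and the rule that proteins without any localization assignment are dropped is a deliberately conservative assumption.

```python
def colocalized_pairs(pairs, localization):
    """Keep candidate interactions whose partners share a compartment.

    pairs:        iterable of (protein_a, protein_b) tuples.
    localization: protein -> set of assigned compartments.
    Pairs involving a protein without any assignment are discarded.
    """
    kept = []
    for a, b in pairs:
        shared = localization.get(a, set()) & localization.get(b, set())
        if shared:
            kept.append((a, b))
    return kept

# Hypothetical accessions and assignments.
localization = {
    "protA": {"plastid"},
    "protB": {"plastid", "mitochondrion"},
    "protC": {"cytosol"},
}
candidates = [("protA", "protB"), ("protA", "protC"), ("protB", "protD")]
supported = colocalized_pairs(candidates, localization)
```

A less strict variant could retain pairs with unknown localization and flag them instead, at the cost of a higher false discovery rate.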
4. Employing the plastid proteome atlas for functional analysis and systems biology
Even if the plastid protein atlas is not complete, it does provide a rich source of information and a great
tool for detailed functional studies. Table 1 lists the available proteomics resources with relevance to
plastid biology and the Box provides a number of example questions that can be addressed with the
available tools. Now that the subcellular localization of many proteins is known, it is possible to analyze the
qualitative and quantitative effects of mutations on specific organelles without actually purifying these
organelles. For instance, quantitative comparative proteome analysis of chloroplasts from wild-type and
different chloroplast Clp protease mutants was done using mass spectrometry-based quantification of total
Arabidopsis leaf extracts without actually isolating chloroplasts (Kim et al., 2009). The advantages of
characterizing quantitative effects on the chloroplast proteome through analysis of total leaf extracts,
rather than through analysis of isolated chloroplasts, are that: (i) mutants with strong growth defects can
be analyzed; isolation of chloroplasts from such mutants can be very hard or even practically impossible;
(ii) more accurate results are obtained for chloroplast mutants with heterogeneity in their leaf phenotype
(often with strongest phenotypes in the youngest leaves); isolation of chloroplasts from such leaves could
result in selection of a subset of chloroplast phenotypes, not representing the overall chloroplast
population. Furthermore, such subcellular proteome information for maize helped to resolve the
kinetics of organelle biogenesis, formation of cellular structures and metabolism during maize leaf
development and C4 cellular differentiation (Majeran et al., 2010). The current generation of mass
spectrometers has sufficient sensitivity and throughput to detect and quantify a high number of
chloroplast proteins even in complex mixtures. Furthermore, such a ‘total leaf’ approach can be helpful for
analyses of dynamic PTMs that preclude lengthy organelle isolation procedures (Reiland et al., 2009), in
particular if no inhibitors can be applied to prevent changes in such PTMs. With a plastid protein atlas for
Arabidopsis and maize at hand, it can be expected that large-scale comparisons of chloroplast proteomes,
their PTMs and interaction networks under different conditions and in different genetic backgrounds or
developmental states will provide novel insight into plastid biology.
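One way to picture the ‘total leaf’ approach is as an in silico enrichment: spectral counts from a total leaf extract are restricted to accessions listed in the plastid atlas, normalized within that subset, and compared between genotypes. The sketch below is a simplified illustration under these assumptions; the accessions and counts are invented, and it is not the quantification workflow of Kim et al. (2009).

```python
import math

def plastid_fractions(spectral_counts, plastid_ids):
    """Share of plastid-assigned spectra contributed by each plastid protein."""
    subset = {p: c for p, c in spectral_counts.items() if p in plastid_ids}
    total = sum(subset.values())
    if total == 0:
        return {}
    return {p: c / total for p, c in subset.items()}

def log2_changes(wild_type, mutant):
    """log2(mutant / wild type) for proteins with a non-zero share in both."""
    return {
        p: math.log2(mutant[p] / wild_type[p])
        for p in wild_type
        if p in mutant and wild_type[p] > 0 and mutant[p] > 0
    }

# Invented atlas and total-leaf spectral counts.
atlas = {"cp1", "cp2"}
wt = plastid_fractions({"cp1": 30, "cp2": 10, "cyt1": 60}, atlas)
mut = plastid_fractions({"cp1": 10, "cp2": 10, "cyt1": 80}, atlas)
changes = log2_changes(wt, mut)
```

Normalizing within the plastid subset highlights changes in chloroplast proteome composition but hides a global change in chloroplast protein content per leaf, which would have to be assessed separately.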
5. Deposition of proteomics and mass spectrometry information in public repositories
Most published plastid proteomics studies on Arabidopsis provide tables containing lists of the identified
proteins using standardized, non-redundant accession numbers provided through TAIR. For other plant
species this is more varied, either because no sequenced genome or EST collection of significant size is available
or because databases such as NCBI are searched that contain redundant sets of accessions (e.g. older and
newer versions of genes); this can complicate incorporation of such data sets by other laboratories.
However, submission of the underlying mass spectra with associated metadata to public repositories such
as the Proteomics Identifications Database, PRIDE (http://www.ebi.ac.uk/pride), will allow other
laboratories to make use of these studies. Even for Arabidopsis and other new model (crop) species
such as maize and rice, it is important that the mass spectral data are deposited, for instance to help
improve search engines, improve genome annotation or allow for comparative analysis by other
laboratories. Indeed, several journals (e.g. Molecular and Cellular Proteomics, Nature Biotechnology)
now require submission of mass spectral data to such public repositories, as is customary for
microarray or RNAseq data sets. More detailed descriptions of experimental conditions and
acquisition parameters are outlined in the MIAPE (Minimum Information About a Proteomics
Experiment) guidelines, which are enforced by several journals. We strongly support following these standards
and deposition of mass spectral data (e.g. converted MGF files) into PRIDE or other repositories.
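For readers unfamiliar with the MGF peak-list format mentioned above, the following sketch reads spectra from MGF-style text. It assumes the common layout of `BEGIN IONS` / `END IONS` blocks containing `KEY=value` header lines followed by whitespace-separated m/z and intensity pairs; it ignores less common features of the format and is not a substitute for a validated parser. The example spectrum is invented.

```python
def parse_mgf(text):
    """Parse MGF-style text into a list of spectra.

    Each spectrum is a dict with 'params' (header key -> string value)
    and 'peaks' (list of (m/z, intensity) float tuples).
    """
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                fields = line.split()
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra

# Invented example spectrum.
example = """BEGIN IONS
TITLE=toy spectrum
PEPMASS=500.25
CHARGE=2+
100.0 10.0
200.5 20.0
END IONS
"""
spectra = parse_mgf(example)
```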
6. CONCLUSIONS
Proteomics of chloroplasts and other plastid types has provided extensive protein inventories, as well as
information about PTMs, protein abundances and protein interactions. Proteomics and mass spectrometry
technologies feeding into plastid proteome information now allow system-level analysis of chloroplast
biology, including chloroplast development, signaling and interaction networks. For reasons detailed
above, we consider a high quality plastid proteome atlas a milestone in the quest for biologically
meaningful systems biology approaches. Together with parallel efforts for other organelles (e.g.
mitochondria and peroxisomes) this will help to drive a better understanding of plant growth and
development and help realize the potential of plant systems biology.
ACKNOWLEDGEMENTS
We would like to thank the members of our labs for discussions and feedback on this manuscript.
Furthermore, we sincerely apologize to all colleagues whose work could not be cited because of space
constraints. Plant and plastid proteome-related research in the van Wijk lab is currently supported by
National Science Foundation grants MCB-1021963, IOS-0701736 and IOS-0922560. S. Baginsky's lab is
currently supported by SNF grant 31003A_127202 and the Martin-Luther-University Halle-Wittenberg.
Columns: Database; Species; Main purpose/objective; In-house experimental plant material; Type of in-house experimental information; Functional annotation (name and function); Subcellular localization; Predictors

Database: AtProteome (b)
Species: At
Main purpose/objective: Experimental information about identified proteotypic peptides, protein abundance in different organs and spectral/peptide evidence for gene models
In-house experimental plant material: different organs of At
Type of in-house experimental information: in-house MS based identification. Peptides, ion scores, ppms, Mowse scores, Meta information
Functional annotation (name and function): TAIR
Subcellular localization: none, but detailed organ information
Predictors: none

Database: PPDB (c)
Species: At, Zm, Os
Main purpose/objective: Curated information for all proteins and protein models in At and Zm, incl. protein information and functional annotation. Experimental information about leaf and subcellular fractions with MS-based identification details incl. spectral counts and PTMs. M
In-house experimental plant material: Zm and At leaves and chloroplast fractions (eg stroma, thylakoids, lumen, plastoglobules, etc). For Zm also BSC and MC specific chloroplasts to study C4 effects. For At, also different mutant background in Col-0.
Type of in-house experimental information: in-house MS based identification. Peptides, ion scores, ppms, Mowse scores, Meta information
Functional annotation (name and function): Manual annotation of name and functions (MapMan system); new functional bins created as needed.
Subcellular localization: Manual assignment to subcellular localization where possible. Based on in-house experimental proteomics data, GFP/YFP studies, external proteomics studies (only qualitative), functional papers
Predictors: TargetP, TM-HMM, LumenP

Database: At-Chloro (d)
Species: At
Main purpose/objective: In-house analysis of At chloroplast proteome and its substructures (envelope, stroma, thylakoid) with detailed proteomic information (peptides, MW, retention times, identification statistics).
In-house experimental plant material: chloroplast fractions (envelope, stroma and thylakoid)
Type of in-house experimental information: in-house MS based identification. Peptides, ion scores, ppms, Mowse scores, Meta information
Functional annotation (name and function): manual curation of name, functional annotation by MapMan
Subcellular localization: chloroplast stroma, thylakoid, envelope
Predictors: none

Database: Plprot (e)
Species: At, Os, Nt, Ca
Main purpose/objective: Proteome analyses of rice etioplasts, At chloroplasts, pepper chromoplasts and the undifferentiated proplastid-like organelles of tobacco BY2 cells, plastid type-specific functions.
In-house experimental plant material: different plastid types from various species
Type of in-house experimental information: peptide identifications, homologues identified in other plastid types, interactive 2D PAGE from differently illuminated etioplasts
Functional annotation (name and function): none
Subcellular localization: identifications from isolated plastids
Predictors: none

Database: SUBA (f)
Species: At
Main purpose/objective: Facilitate subcellular protein localization analysis based on different public prediction tools, proteomics papers and GFP/YFP localization studies. Allow combinatorial queries on the contained data.
In-house experimental plant material: none; literature only
Type of in-house experimental information: NA
Functional annotation (name and function): links to TAIR, AmiGO and UniPROT
Subcellular localization: users can employ various queries to generate answers
Predictors: multiple localization predictors

Database: PhosPhAt (g)
Species: At
Main purpose/objective: Phosphoproteome information from published and unpublished sources, identified peptides or ions with annotated phosphorylation site (where available). Provides a P-site prediction tool.
In-house experimental plant material: At phosphoproteome at different conditions
Type of in-house experimental information: Specific information about peptide properties, annotated biological function as well as the analytical context; provides the phosphopeptide spectrum
Functional annotation (name and function): TAIR
Subcellular localization: none
Predictors: Plant specific P-predictor (pSer, pThr, pTyr).

Database: RIPP-DB (h)
Species: At, Os
Main purpose/objective: Plant Phosphoproteome DB with information on phosphopeptides by LC-MS/MS-based shotgun phosphoproteomics
In-house experimental plant material: rice and Ath cell cultures
Type of in-house experimental information: described in associated papers
Functional annotation (name and function): hyperlinks to other databases
Subcellular localization: none
Predictors: none

Database: ProMex (i)
Species: At, Cr, Mt, St
Main purpose/objective: Mass spectral reference database of tryptic peptides from plant proteomes
In-house experimental plant material: Different sources
Type of in-house experimental information: display of MS/MS spectra with annotation
Functional annotation (name and function): none
Subcellular localization: none
Predictors: none

a http://gator.masc-proteomics.org/
b http://fgcz-atproteome.unizh.ch
c http://ppdb.tc.cornell.edu
d http://www.grenoble.prabi.fr/at_chloro/
e http://www.plprot.ethz.ch
f http://suba.plantenergy.uwa.edu.au/
g http://phosphat.mpimp-golm.mpg.de/
h https://database.riken.jp/sw/links/en/ria102i/
i http://promex.pph.univie.ac.at/promex
Abbreviations for species: At - Arabidopsis thaliana; Os - Oryza sativa; Zm - Zea mays; Mt - Medicago truncatula; St - Solanum tuberosum; and the green alga Cr - Chlamydomonas reinhardtii
Table 1. Plant and plastid proteomics databases that provide information and tools for finding plastid proteins in Arabidopsis and other plant species, as well as functional annotation, post-translational modifications, peptide information and spectral da
LITERATURE CITED
Agrawal GK, Bourguignon J, Rolland N, Ephritikhine G, Ferro M, Jaquinod M, Alexiou KG,
Chardot T, Chakraborty N, Jolivet P, Doonan JH, Rakwal R (2010) Plant organelle
proteomics: Collaborating for optimal cell function. Mass Spectrom Rev: on-line prepublication
Oct 29. [Epub ahead of print]
Armbruster U, Pesaresi P, Pribil M, Hertle A, Leister D (2011) Update on Chloroplast Research:
New Tools, New Topics, and New Trends. Mol Plant 4: 1-16.
Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S,
Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics
reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941.
Baginsky S (2009) Plant proteomics: concepts, applications, and novel strategies for data interpretation.
Mass Spectrom Rev 28: 93-120.
Brautigam A, Hofmann-Benning S, Weber AP (2008) Comparative proteomics of chloroplast
envelopes from C3 and C4 plants reveals specific adaptations of the plastid envelope to C4
photosynthesis and candidate proteins required for maintaining C4 metabolite fluxes. Plant
Physiol 148: 568-579.
Brun V, Masselon C, Garin J, Dupuis A (2009) Isotope dilution strategies for absolute quantitative
proteomics. J Proteomics 72: 740-749.
Carrie C, Giraud E, Whelan J (2009) Protein transport in organelles: Dual targeting of proteins to
mitochondria and chloroplasts. Febs J 276: 1187-1195.
Dal'Molin CG, Quek LE, Palfreyman RW, Brumbley SM, Nielsen LK (2010) C4GEM, a genome-
scale metabolic model to study C4 plant metabolism. Plant Physiol 154: 1871-1885.
Dietz KJ, Pfannschmidt T (2010) Novel regulators in photosynthetic redox control of plant metabolism
and gene expression. Plant Physiol prepublication on-line Dec 30
Dunkley TP, Hester S, Shadforth IP, Runions J, Weimar T, Hanton SL, Griffin JL, Bessant C,
Brandizzi F, Hawes C, Watson RB, Dupree P, Lilley KS (2006) Mapping the Arabidopsis
organelle proteome. Proc Natl Acad Sci U S A 103: 6518-6523.
Ferro M, Brugiere S, Salvi D, Seigneurin-Berny D, Court M, Moyet L, Ramus C, Miras S,
Mellal M, Le Gall S, Kieffer-Jaquinod S, Bruley C, Garin J, Joyard J, Masselon C,
Rolland N (2010) AT_CHLORO, a comprehensive chloroplast proteome database with
subplastidial localization and curated information on envelope proteins. Mol Cell Proteomics 9:
1063-1084.
Friso G, Majeran W, Huang M, Sun Q, van Wijk KJ (2010) Reconstruction of metabolic pathways,
protein expression, and homeostasis machineries across maize bundle sheath and mesophyll
chloroplasts: large-scale quantitative proteomics using the first maize genome assembly. Plant
Physiol 152: 1219-1250.
Gstaiger M, Aebersold R (2009) Applying mass spectrometry-based proteomics to genetics, genomics
and network biology. Nat Rev Genet 10: 617-627.
Kikuchi S, Oishi M, Hirabayashi Y, Lee DW, Hwang I, Nakai M (2009) A 1-megadalton
translocation complex containing Tic20 and Tic21 mediates chloroplast protein import at the
inner envelope membrane. Plant Cell 21: 1781-1797.
Kim J, Rudella A, Ramirez Rodriguez V, Zybailov B, Olinares PD, van Wijk KJ (2009) Subunits
of the Plastid ClpPR Protease Complex Have Differential Contributions to Embryogenesis,
Plastid Biogenesis, and Plant Development in Arabidopsis. Plant Cell 21: 1669-1692.
Kleffmann T, von Zychlinski A, Russenberger D, Hirsch-Hoffmann M, Gehrig P, Gruissem W,
Baginsky S (2007) Proteome dynamics during plastid differentiation in rice. Plant Physiol 143:
912-923.
Lemeille S, Rochaix JD (2010) State transitions at the crossroad of thylakoid signalling pathways.
Photosynth Res 106: 33-46.
Majeran W, Friso G, Ponnala L, Connolly B, Huang M, Reidel E, Zhang C, Asakura Y,
Bhuiyan NH, Sun Q, Turgeon R, van Wijk KJ (2010) Structural and metabolic transitions of
C4 leaf development and differentiation defined by microscopy and quantitative proteomics in
maize. Plant Cell 22: 3509-3542.
Majeran W, Zybailov B, Ytterberg AJ, Dunsmore J, Sun Q, van Wijk KJ (2008) Consequences of
C4 differentiation for chloroplast membrane proteomes in maize mesophyll and bundle sheath
cells. Mol Cell Proteomics 7: 1609-1638.
Olinares PD, Ponnala L, van Wijk KJ (2010) Megadalton complexes in the chloroplast stroma of
Arabidopsis thaliana characterized by size exclusion chromatography, mass spectrometry and
hierarchical clustering. Mol Cell Proteomics 9.7: 1594-1615.
Peeters N, Small I (2001) Dual targeting to mitochondria and chloroplasts. Biochim Biophys Acta 1541:
54-63.
Peltier JB, Cai Y, Sun Q, Zabrouskov V, Giacomelli L, Rudella A, Ytterberg AJ, Rutschow H,
van Wijk KJ (2006) The Oligomeric Stromal Proteome of Arabidopsis thaliana Chloroplasts.
Mol Cell Proteomics 5: 114-133.
Peltier JB, Friso G, Kalume DE, Roepstorff P, Nilsson F, Adamska I, van Wijk KJ (2000)
Proteomics of the Chloroplast. Systematic identification and targeting analysis of lumenal and
peripheral thylakoid proteins. Plant Cell 12: 319-342.
Pfalz J, Liere K, Kandlbinder A, Dietz KJ, Oelmuller R (2006) pTAC2, -6, and -12 are components
of the transcriptionally active plastid chromosome that are required for plastid gene expression.
Plant Cell 18: 176-197.
Reiland S, Messerli G, Baerenfaller K, Gerrits B, Endler A, Grossmann J, Gruissem W,
Baginsky S (2009) Large-scale Arabidopsis phosphoproteome profiling reveals novel chloroplast
kinase substrates and phosphorylation networks. Plant Physiol 150: 889-903.
Schubert M, Petersson UA, Haas BJ, Funk C, Schröder WP, Kieselbach T (2002) Proteome map
of the chloroplast lumen of Arabidopsis thaliana. J Biol Chem 277: 8354-8365.
Schulze WX, Usadel B (2010) Quantitation in mass-spectrometry-based proteomics. Annu Rev Plant
Biol 61: 491-516.
Schwacke R, Fischer K, Ketelsen B, Krupinska K, Krause K (2007) Comparative survey of plastid
and mitochondrial targeting properties of transcription factors in Arabidopsis and rice. Mol Genet
Genomics 277: 631-646.
Skalitzky CA, Martin JR, Harwood JH, Beirne JJ, Adamczyk BJ, Heck GR, Cline K,
Fernandez DE (2011) Plastids contain a second sec translocase system with essential functions.
Plant Physiol 155: 354-369.
van Wijk KJ (2000) Proteomics of the chloroplast: experimentation and prediction. Trends Plant Sci 5:
420-425.
Villarejo A, Buren S, Larsson S, Dejardin A, Monne M, Rudhe C, Karlsson J, Jansson S,
Lerouge P, Rolland N, von Heijne G, Grebe M, Bako L, Samuelsson G (2005) Evidence
for a protein transported through the secretory pathway en route to the higher plant chloroplast.
Nat Cell Biol 7: 1224-1231.
Walther TC, Mann M (2010) Mass spectrometry-based proteomics in cell biology. J Cell Biol 190: 491-
500.
Whitelegge JP (2004) Mass spectrometry for high throughput quantitative proteomics in plant research:
lessons from thylakoid membranes. Plant Physiol Biochem 42: 919-927.
Wienkoop S, Weiss J, May P, Kempa S, Irgang S, Recuenco-Munoz L, Pietzke M, Schwemmer
T, Rupprecht J, Egelhofer V, Weckwerth W (2010) Targeted proteomics for Chlamydomonas
reinhardtii combined with rapid subcellular protein fractionation, metabolomics and metabolic
flux analyses. Mol Biosyst 6: 1018-1031.
Wise RR (2006) The diversity of plastid form and function. In RR Wise, JK Hoober, eds, The Structure and
Function of Plastids, Vol 23. Springer, Dordrecht
BOX I. Examples of the wide range of queries that a high quality plastid protein atlas should
ultimately be able to answer. Current answers for Arabidopsis proteins are provided with reference to
the databases and resources listed in Table 1. Information for maize and rice is available in a subset of the
databases (see Table 1). This BOX also serves to better identify lack of information and challenges for
plastid proteome research and resource development for the immediate future.
Query 1: Is a protein located in the plastid (chloroplast) in Arabidopsis and what is the experimental
evidence? Search with AGI accession number in TAIR, PPDB, AT_Chloro, plprot; alternatively set up a
query in SUBA and make your own judgment. For PPDB, the experimental evidence provided consists of
in-house proteome experiments (e.g. leaves, chloroplast fractions), detection in public proteomics studies
(displayed) or original literature (displayed for some); for AT_Chloro, the evidence provided consists of the
number of matched spectra in plastid preparations and information from TAIR and PPDB.
Query 2: Which proteins are located in the plastid (chloroplast), and what is the experimental evidence?
Go to PPDB, AT_Chloro or plprot and download lists (tables) with assigned plastid proteins; alternatively
set up a query in SUBA and design your own criteria. In the case of AT_Chloro and PPDB, you can narrow
down your search to subchloroplast locations; detection in different plastid types can be assessed in
plprot.
Query 3: Which proteins are predicted to locate in the plastid (chloroplast) and what is the p-value and/or
FDR? Go to the websites of the different subcellular localization predictors and extract the predicted list
of At plastid proteins, if available. For TargetP predictions, you can also go to PPDB and extract these
accessions together with other types of information, such as function. P-values are provided by some predictors
but may not be very reliable, whereas FDRs are mostly self-reported based on test sets. Cross-check
predictions against available proteome information (see above).
Query 4: Does a plastid protein have PTMs, on which residue, is there more than 1 gene model and what
is the experimental evidence? Search the PhosPhAt or RIPP-DB databases for phosphorylation sites and proteins.
For other PTMs, search PPDB either by accession or, alternatively, by PTM. Peptide-based support for
gene models is displayed in both AtProteome (At) and PPDB (At and Zm) for in-house detected
proteins.
Query 5: What is the relative abundance of a plastid protein and how does it change in response to
(a)biotic stress or developmental state or genetic background? This is currently still a difficult question
and answers are best obtained in the individual studies. However, there is a reasonable positive
correlation between the frequency and/or number of spectra matched to a protein in AtProteome, PPDB or
ProMex, or AT_Chloro with abundance; thus comparing spectral counts allows ranking proteins by their
abundance; AtProteome provides spectral count information about distribution across different organs.
Query 6: Is a plastid protein in a complex and what is the composition of the complex; what is the
experimental evidence? It is currently not possible to get a direct answer to this question from the
databases listed in Table 1.