Update on Plastid Proteomics in higher plants; current state and future goals

Klaas J. van Wijk a,# and Sacha Baginsky b,#

a Department of Plant Biology, Cornell University, Ithaca, NY 14853, USA.

b Martin-Luther-Universität Halle-Wittenberg, Institut für Biochemie, 06120 Halle, Germany

#for correspondence: Klaas J. van Wijk, Dept. of Plant Biology, Emerson Hall 332, Cornell University,

Ithaca, NY 14853. Tel: 1-607-255-3664; Fax: 1-607-255-3664; [email protected]; Sacha Baginsky,

Martin-Luther-University Halle-Wittenberg, Institut für Biochemie, Abteilung Pflanzenbiochemie,

Weinbergweg 22 (Biozentrum), 06120 Halle (Saale), Tel: +49 345 55 25470; Fax: +49 345 55 27012;

[email protected].

Plant Physiology Preview. Published on February 24, 2011, as DOI:10.1104/pp.111.172932

Copyright 2011 by the American Society of Plant Biologists

Downloaded from www.plantphysiol.org on April 13, 2019. Copyright © 2011 American Society of Plant Biologists. All rights reserved.

1. Significance of Plastids in Plant Biology

Plastids are plant cell organelles with many essential functions in plant metabolism. Among these are

photosynthesis, amino acid and fatty acid biosynthesis, as well as the synthesis of several secondary

metabolites. All plastids originate from undifferentiated proplastids which are restricted to meristematic

tissues and undifferentiated cells. Depending on the tissue, proplastids can develop into different plastid

types, e.g. amyloplasts in storage tissue, chloroplasts in photosynthetic tissues and chromoplasts in fruits

and flowers. Other specialized plastid types include gerontoplasts, the plastids of senescent leaves that are

important for resource allocation, oleoplasts which are oil storage plastids in e.g. olive, and etioplasts, the

final stage of proplastid development in photosynthetic tissues in the dark (Wise, 2006). Finally, plastid

types can possibly specialize to different degrees depending on cell-type, developmental state and

(a)biotic conditions. An extreme case is that of the highly specialized C4 chloroplasts in bundle-sheath and

mesophyll cells in the maize leaf with strong differences in proteome composition (Friso et al., 2010).

With the ability to develop and differentiate, plastids add versatile biosynthetic capacity to the plant cell

and are responsible for unique biosynthetic pathways that make plants unrivaled biochemical factories

that are essential for life on earth. Thus, significant research efforts are underway that aim at

understanding plastid biology in depth. A decade ago, the first plastid proteomics study was published

and the potential of plastid proteomics was outlined (van Wijk, 2000). Since then, proteomics of plastids

and plant (sub)proteomes has delivered on its promise. Here, we will provide an update on the current

status of plastid proteome research, a decade after the first reports.

2. Advances in plastid proteomics and proteomics technology

Plant and plastid proteomics are now well established scientific disciplines with many laboratories

contributing to their progress. Not surprisingly, a significant fraction of the plastid proteome is

characterized today, and available information includes protein quantities, protein interactions and

posttranslational modifications (PTMs), as will be briefly highlighted in this update. However, several of

the challenges for plastid proteomics outlined 10 years ago still exist today, including the detection of

low-abundance proteins (e.g. >10,000-fold less abundant than Rubisco) and capturing the dynamics of plastid

proteomes.

Much of the progress has been driven by technology development and improved genomics

resources. The main difference between proteomic technologies today and 10 years ago is the much

improved sensitivity (routinely at 1-50 fmol), the accelerated duty cycle (now MS/MS scans within a few

hundred msec), the improved mass accuracy (down to a few ppm for peptides) and the increased

resolution (up to 100,000) of the latest generation mass spectrometers. Furthermore, coupling of nano-LC

with MS/MS is now routine and split-free nano-LC systems now deliver low flow-rates for nanospray

ionization with excellent reproducibility. Also important are improved software tools for the reliable

identification of peptides based on MS/MS spectra along with statistically sound estimates of false

discovery rates in large datasets. With the maturation of proteomics workflows, quantitative information

for plastid proteins became available. These new technologies, in combination with the availability of

multiple sequenced plant genomes now allow for answering more comprehensive and sophisticated

questions as compared to a decade ago (Baginsky, 2009; Gstaiger and Aebersold, 2009; Schulze and

Usadel, 2010; Walther and Mann, 2010).

In this update we will review progress on plastid proteomics and lay out a series of challenges

that can be addressed within the next few years. Table 1 provides an overview of web-based plant

proteomics resources that are relevant for plastid proteomics. We will center this update around the

concept of a plastid protein atlas. Box I provides examples of the wide range of queries that a high quality

plastid protein atlas should ultimately be able to answer. For more exhaustive overviews on proteomics of

plastids, we refer to two recent review articles and references therein (Baginsky, 2009; Agrawal et al.,

2010); space restrictions do not allow us to cite recent literature more extensively.

3. Development of a plastid protein atlas

The concept of a protein atlas was established several years ago in particular for the human proteome

(http://www.proteinatlas.org/). This concept involves the generation of protein inventories for each organ

and subcellular localization tagged with additional protein information, such as splice variants, PTMs,

protein-protein interactions, etc. Similar efforts are underway to collect all available information for

plants to generate a plant proteome atlas that includes proteome information for the different plant organs

and their organelles. Here we concentrate on the plastid proteome atlas. In order to disseminate

biologically useful information, such a plastid atlas should include: i) protein accessions for each plastid

type, including cellular specialization and subplastid localization, ii) information on peptide coverage of

each identified protein and possibly different gene models, iii) steady state protein abundance under a set

of well-defined (a)biotic conditions as well as developmental states, iv) protein-protein interactions,

protein-nucleotide assemblies and oligomeric state, v) reversible and irreversible PTMs, and vi)

bioinformatics information such as subcellular localization predictions and network information.

Plastids are among the best characterized cell organelles at the proteome level and a quality

chloroplast protein atlas is now emerging. However, the plastid protein atlas is far from complete and

strategies to improve proteome coverage and in-depth characterization must be developed and

implemented. In the following paragraphs we will review the status of each of the six components of the

plastid protein atlas and outline strategies for improvement.

3.1. Improved protein inventories for each plastid type, including cellular specialization and subplastid localization

The predicted size of the combined proteome of all plastid types ranges from 2000 to 3500 proteins in

Arabidopsis thaliana, representing about 7-12% of all predicted protein-encoding genes. However, only

about 1200 proteins are currently recognized as being plastid-localized (see the Plant Proteome Database

PPDB at http://ppdb.tc.cornell.edu). Comparing this experimental plastid proteome data set with the

predicted plastid proteome showed that in particular plastid proteins involved in signaling and plastid

gene expression and RNA metabolism are strongly underrepresented. There are several reasons why a

significant percentage of plastid proteins has not yet been recognized: 1) low abundance in chloroplasts,

i.e. their detection is obscured by highly abundant photosynthetic proteins, 2) specific expression in a

certain plastid type other than chloroplast, 3) only expressed under very specific conditions

(developmental state, abiotic condition or biotic challenge), or 4) too few ionisable tryptic peptides (e.g.

transmembrane proteins with very short loops and tails or very small or basic proteins). Plastid proteome

coverage can be improved by using better mass spectrometry instrumentation with higher sensitivity,

accuracy and faster duty cycle, use of alternative enzymes for protein digestion, more specific (e.g.

affinity-based) fractionation of plastid proteomes or increased efforts to analyze a more diverse set of

plastid types, including heterotrophic plastids. However, as analytical sensitivity increases with these

additional efforts, the challenge to distinguish between true positive and false positive plastid proteins

increases as well.
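
The fourth reason above (too few ionisable tryptic peptides) can be illustrated with a minimal in silico digestion sketch. It assumes the commonly used rule that trypsin cleaves C-terminal to K or R unless the next residue is P, and it uses a peptide length window of 6 to 30 residues as a crude, hypothetical stand-in for detectability; the function names and example sequences are illustrative only and do not come from the studies cited here.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal peptide that does not end in K or R
        peptides.append(sequence[start:])
    return peptides


def count_observable(sequence, min_len=6, max_len=30):
    """Count tryptic peptides whose length falls in an assumed detectable window."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))


# Hypothetical sequences: one with regularly spaced lysines, one with a single
# long stretch lacking K/R (as in a hydrophobic transmembrane segment).
regular = "ACDEFGK" * 3
hydrophobic = "A" * 40 + "K"
print(count_observable(regular), count_observable(hydrophobic))
```

A protein of the second kind would yield no peptide in the assumed window, which is one motivation for the alternative digestion enzymes mentioned above.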

Based on the last decade of plastid proteome research, it is clear that objective filtering strategies

for false positive identification and/or assignment to plastids are essential. The most practical solution

involves repeated analysis of independent plastid preparations and the use of quantitative protein

information for improved filtering of the identified proteins, based on two steps: i) repeat observations in

independent plastid preparations – proteins that are observed at high frequency across these preparations

are more likely to be bona fide plastid proteins, ii) combined proteome information from unfractionated

tissue and different purified organelles to recognize false positives that more highly accumulate in other

subcellular locations – this requires quantitative information about relative protein abundance. Such

relative quantification for these different sample types should ideally be done with the same

experimental workflow; a good example is the LOPIT technique, which uses relative

protein quantification along density gradients to assign proteins to organelles by association (Dunkley et

al., 2006). The ‘frequency’ filter (i.) is based on the assumption that non-plastid contaminants or false

positive identifications are random events; therefore this first filter does not remove systematic false

positives, such as highly abundant cytosolic proteins, which can contaminate isolated chloroplasts. A small

percentage of plastid proteins are also located elsewhere in the cell and ~50 dual targeted proteins have

been discovered for Arabidopsis so far (Carrie et al., 2009). Most of these have shared locations with

mitochondria and are involved in plastid or mitochondrial gene expression (e.g. tRNA synthetases);

however, shared localizations with the nucleus, peroxisome or the cytosol have also been described.

Detection of such dual locations requires independent information, typically from image analysis using

fluorescent fusion proteins and ideally also from phenotypical analysis of mutants.
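
The two-step filtering described above can be sketched as follows. This is only an illustration under stated assumptions: the thresholds (`min_fraction`, `min_ratio`), the accession labels and the abundance values are hypothetical and are not taken from PPDB or from any of the cited studies.

```python
from collections import Counter


def frequency_filter(preparations, min_fraction=0.5):
    """Keep accessions observed in at least `min_fraction` of the preparations."""
    counts = Counter(acc for prep in preparations for acc in set(prep))
    n_preps = len(preparations)
    return {acc for acc, seen in counts.items() if seen / n_preps >= min_fraction}


def enrichment_filter(candidates, plastid_abundance, total_abundance, min_ratio=1.0):
    """Keep candidates that are at least `min_ratio`-fold enriched in plastids.

    Both dictionaries hold relative abundances obtained with the same workflow;
    proteins not detected in the unfractionated extract are kept.
    """
    kept = set()
    for acc in candidates:
        in_total = total_abundance.get(acc, 0.0)
        in_plastid = plastid_abundance.get(acc, 0.0)
        if in_total == 0.0 or in_plastid / in_total >= min_ratio:
            kept.add(acc)
    return kept


# Hypothetical data: four independent plastid preparations.
# P1, P2 = plastid proteins, C1 = abundant cytosolic contaminant, R1 = random hit.
preparations = [
    {"P1", "P2", "C1"},
    {"P1", "P2"},
    {"P1", "C1", "R1"},
    {"P1", "P2", "C1"},
]
frequent = frequency_filter(preparations)

plastid_abundance = {"P1": 0.05, "P2": 0.01, "C1": 0.001}
total_abundance = {"P1": 0.02, "P2": 0.004, "C1": 0.01}
plastid_candidates = enrichment_filter(frequent, plastid_abundance, total_abundance)
print(sorted(frequent), sorted(plastid_candidates))
```

In this toy data set the frequency filter removes the random hit R1 but retains the systematic contaminant C1, which is removed only by the abundance comparison with unfractionated tissue.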

Collection of all available protein localization data from individual functional studies, as well as

proteomics studies, is an important tool in conclusive assignment of proteins to the plastid. For instance it

helps to recognize abundant proteins often identified in dozens of proteomics papers as potential

contaminants. The SUBA database (http://suba.plantenergy.uwa.edu.au/) collects information for

Arabidopsis that is available about the localization of a certain protein, e.g. MS/MS data, GFP-

localization and prediction tools, and allows assembling lists of organellar proteins with self-defined

reliability criteria. The PPDB accumulates similar information for Arabidopsis, as well as maize, and

combines it with in-house MS/MS based quantitative information on total leaf extracts and isolated

plastid (fractions) (stored in PPDB) to manually evaluate this information and make a manual assignment

for subcellular localization. This manual curation step using a conservative threshold (i.e. no call is made

unless there is deemed sufficient evidence) has proven to result in high confidence localization calls as

judged by comparisons with subsequent independent experimental localization studies by GFP fusions

and image analysis.

Another way to help complete the plastid proteome inventory is to analyze plastid types

specialized for specific tasks in their resident tissue (organ or cell type) because they differ considerably

in their protein composition. However, this is challenging for Arabidopsis since its seeds and flowers are

small and it does not develop storage organs. Thus, organelle isolation is often impracticable and

proteome analyses are better performed at the level of the entire organ as illustrated by the analysis of

plastids in seeds (Chen et al., 2009). Several groups tried to circumvent this problem by using different

plant species, e.g. tobacco, bell pepper, spinach, pea, wheat, potato, tomato or Brassica rapa for the

analysis of amyloplasts, chromoplasts, proplastids and leucoplasts (Agrawal et al., 2010). However, so far

this has not significantly increased the number of identified plastid proteins, in part due to the lack of

complete genome sequence information. Exceptions are rice and maize because good quality genome

annotation is available for these two organisms and the coverage of the plastid proteome of maize is now

quite comparable to Arabidopsis in part because cell-type specific chloroplasts, specialized for specific

functions, were included (Friso et al., 2010). Importantly, this allowed identification of C4-specific

metabolic chloroplast envelope transporters and also helped identify many new subunits of the elusive

thylakoid NADPH dehydrogenase complex involved in cyclic electron flow (Brautigam et al., 2008;

Majeran et al., 2008).

The spatial distribution of proteins within chloroplasts has been the target of several proteome

analyses, originally starting with the thylakoid lumen and peripheral soluble thylakoid proteins (Peltier et

al., 2000), followed by systematic analyses of the thylakoid and envelope membrane proteomes, the

soluble stroma proteome, specialized thylakoid-associated lipoprotein particles termed plastoglobules,

and proteins associated with the plastid chromosome (Baginsky et al., 2009; Agrawal et al., 2010). A

recent study separated the Arabidopsis chloroplast proteome into soluble proteins and thylakoid and

envelope membrane proteins (Ferro et al., 2010). Protein localization to each subcompartment was based

on the abundance distribution of identified proteins in different purified fractions. Information about the

protein composition of the chloroplast sub-compartments is available in PPDB and AT_CHLORO

(http://www.grenoble.prabi.fr/at_chloro/). Because of space constraints in this update, we refer the reader

to the most recent and comprehensive review with extensive literature citations (Agrawal et al., 2010)

instead of discussing the original literature in this report.

3.2. Discovery and significance of gene models

Many genes have more than one annotated gene model; in some cases the different models only affect the

untranslated 5’ and 3’ ends, whereas in others they affect the translated region. Such differences arise from

different transcription start sites or from alternative splicing (AS). AS has received considerable attention at

the transcript level, in particular since new generation sequencing techniques now allow for large scale

detection of alternative splicing. At least 20% of plant genes have one or more alternative transcript

isoform. The majority of these AS events have not been functionally characterized, but evidence suggests

that AS participates in important plant functions, including stress response, and may impact domestication

and trait selection. Alternative transcription start sites or AS can result in proteins with different N- or

C-termini or internal protein regions, potentially affecting subcellular localization and functions. Indeed, one

of the mechanisms for dual targeting is that two different proteins that differ in their N-terminus are

generated from a single gene (Peeters and Small, 2001). Matching mass spectrometry data to these

different gene models can help to identify the most relevant predicted protein forms. In the PPDB (for

Arabidopsis and maize) and AtProteome (http://www.pep2pro.ethz.ch), peptide identification data are

projected on each gene model, allowing evaluation of the most relevant models. However, a systematic

analysis of the consequences of AS at the plant proteome level has not been carried out; this is not

surprising given the challenges associated with obtaining nearly complete sequence coverage (i.e. the %

of primary amino acid sequence for which peptides are detected) that is required to distinguish different

gene models.
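
Sequence coverage as defined above can be computed with a few lines of code. The sketch below is a simplified illustration: the two gene models, their sequences and the identified peptides are hypothetical, and peptides are mapped by exact string matching only.

```python
def sequence_coverage(protein_seq, peptides):
    """Percentage of residues in `protein_seq` covered by at least one peptide."""
    if not protein_seq:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein_seq)
    for pep in peptides:
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein_seq)


# Two hypothetical models of one gene that differ only in their N-terminus.
shared = "GSEELKDAVFR"
model_1 = "MAAK" + shared
model_2 = "MSTPLLR" + shared
identified = ["MAAK", "DAVFR"]

for name, seq in (("model 1", model_1), ("model 2", model_2)):
    print(name, round(sequence_coverage(seq, identified), 1))
```

Only a peptide from the region that differs (here the N-terminal peptide) discriminates between the models; peptides from the shared region support both equally, which is why near-complete coverage is needed.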

In case of mass spectrometry-based quantitative proteomics, decisions have to be made how to

handle protein models. For instance, one model may have more matched peptides than another model.

One solution is to select only the information for the highest scoring model; an alternative is to collect and

sum all matched peptides for all protein models of a gene. In practice this may not affect most

quantifications, but it is important to systematically implement a chosen procedure. The van Wijk lab

consistently selected the higher scoring protein model (calculated across all samples for the specific

analysis) and if there was no difference in protein score between models, the model with the lowest model number

was selected (see e.g. Friso et al., 2010). Other labs sum up all spectral counts for a gene and remove the

model information (Baerenfaller et al., 2008). Either method has its merits and it is important that the

applied procedure is transparent.
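
The two procedures can be made explicit in a short sketch. It assumes that gene models are written as `GENE.N`, with `N` the model number, that a protein score is available per model, and that each spectrum has been counted for one model only; the identifiers and values are hypothetical, and the code illustrates the rules as described here rather than reproducing either lab's actual pipeline.

```python
def pick_model(model_scores):
    """Return the highest scoring model; ties go to the lowest model number."""
    def sort_key(item):
        model_id, score = item
        model_number = int(model_id.rsplit(".", 1)[1])
        return (-score, model_number)
    return min(model_scores.items(), key=sort_key)[0]


def sum_over_models(spectral_counts):
    """Sum spectral counts of all models of a gene and drop the model suffix."""
    totals = {}
    for model_id, count in spectral_counts.items():
        gene = model_id.rsplit(".", 1)[0]
        totals[gene] = totals.get(gene, 0) + count
    return totals


# Hypothetical identifiers and values.
print(pick_model({"GENEA.1": 120.0, "GENEA.2": 150.0}))
print(pick_model({"GENEA.2": 90.0, "GENEA.1": 90.0}))
print(sum_over_models({"GENEA.1": 4, "GENEA.2": 6, "GENEB.1": 3}))
```

Writing the rule down in this form is one way to keep the chosen procedure transparent and consistently applied across samples.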

3.3. Protein abundance within the plastid

The range of protein accumulation levels in plant organs and within the plastid likely spans up to ~10

orders of magnitude. Using 1-DE gel separation, followed by in-gel digestion and the latest generation of

tandem mass spectrometers for un-targeted (‘shotgun’) analysis with data-dependent acquisition (DDA),

proteins are typically identified within an abundance range of 5 to maximally 6 orders of magnitude.

Mapping plastid protein abundance is important to understand the composition of protein complexes,

functionalities of plastid membranes and plastid particles such as plastoglobules or nucleoids, as well as

understanding plastid metabolism and consideration of metabolic flux. In addition, as discussed in the

previous section, relative protein abundance measurements are also an important tool to evaluate if

proteins are indeed plastid localized. When discussing protein quantification, we must distinguish

between i) measuring protein mass or protein concentration within a sample, and ii) comparing relative

protein concentrations (or mass) of the same protein between different samples. The latter case is often

referred to as measuring differential protein expression or ‘functional proteomics’, e.g. when studying the

effect of (a)biotic stress, developmental processes or mutants. Most (plant) protein quantification studies

relate to differential expression (functional proteomics). In the current section, we will discuss the first

case, whereas the second case is briefly discussed in section 4 (Employing the plastid proteome atlas for

functional analysis).

The two strategies that have so far been employed to map protein abundance within the plastid

are: i) image analysis of stained two-dimensional gels and ii) mass spectrometry-based quantification

using spectral counting. Quantification using 2D gel electrophoresis with IEF as the first dimension was

used in most gel based studies, e.g. for the thylakoid lumen (Schubert et al., 2002) or soluble proteins in

rice etioplasts (Kleffmann et al., 2007); however in most other studies this was applied to ‘functional

proteomics’. 2DE gels with native gel electrophoresis as the first dimension were used to determine a

quantitative map of soluble chloroplast proteins and their oligomeric states in the stroma of Arabidopsis

thaliana (Peltier et al., 2006). In a subsequent study, Arabidopsis stromal proteins were quantified using

mass spectrometry based spectral counting (Zybailov et al., 2008). Both complementary procedures were

also carried out for chloroplast membranes and stromal fractions of isolated bundle sheath and mesophyll

cells of maize leaves (Majeran et al., 2008). The advantage of IEF based 2D gels lies mostly in the higher

resolution of IEF compared to native gels; however IEF gels systematically lead to (often strong)

underestimation of higher molecular mass proteins and hydrophobic proteins, whereas proteins with

extreme pI (<4 or >10) are harder to resolve. For the mapping of absolute protein abundances including

membrane proteins, colorless native or blue native gels are thus the better alternatives.

Directly comparing image and MS-based methodologies showed that image-based quantification

is very limited in the number of proteins that can be accurately quantified because protein spots need to be

fully separated from other spots to avoid quantifying protein mixtures. Furthermore, the quantification is

significantly affected by the amino acid composition, because current dyes bind in particular to basic

residues, leading to over- or underestimation of proteins, depending on the amino acid composition. MS-

based quantification allows for quantification of a much larger number of proteins, typically resulting in a

higher dynamic range. However, highly abundant proteins (e.g. the ~10-20 most abundant proteins in a

sample) are often underestimated because of the necessary use of data-dependent acquisition (DDA) (see

Zybailov et al., 2008, for numbers), whereas proteins quantified with low numbers of MS/MS spectra can

show quite large sample to sample variation. In general, proteins are most accurately quantified if

multiple unique peptides are detected, each in high numbers. Conversion of protein mass quantification

(either by the image analysis or MS-based quantification) to protein concentration requires normalization

by either the number of predicted tryptic peptides within the relevant mass window (in case of MS-based

quantification) or by protein length or mass (for both image- and MS-based quantification). Despite the

advantages of MS-based quantification described above, 2D-PAGE still has a place in quantitative proteomics, in particular for

analysis of protein complexes and because it provides an immediate visible overview of the proteome.
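
The conversion from spectral counts (a proxy for protein mass) to relative concentration can be sketched as below, using normalization by protein length. The accessions, counts and lengths are hypothetical; normalization by the number of predicted tryptic peptides would follow the same pattern with a different denominator.

```python
def relative_concentration(spectral_counts, lengths):
    """Length-normalized spectral counts, scaled so that all values sum to 1."""
    per_residue = {acc: spectral_counts[acc] / lengths[acc] for acc in spectral_counts}
    total = sum(per_residue.values())
    return {acc: value / total for acc, value in per_residue.items()}


# Hypothetical example: equal spectral counts, but a five-fold difference in length.
spectral_counts = {"large_protein": 100, "small_protein": 100}
lengths = {"large_protein": 500, "small_protein": 100}
concentration = relative_concentration(spectral_counts, lengths)
print(concentration)
```

With equal counts, the shorter protein is estimated to be five times more concentrated on a molar basis, which is the distinction between protein mass and protein concentration drawn above.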

The ‘gold-standard’ for protein abundance measurements is to spike the sample with isotope

labeled proteins or proteotypic peptides, an approach referred to as ‘isotope dilution’ (Brun et al., 2009). These peptides

can be generated by in vitro synthesis or by expression as a concatemer of proteotypic peptides after

construction of a synthetic gene (QconCAT). Both methods require significant investments and typically

are applied to smaller numbers of proteins – these techniques are therefore currently not practical for

quantification of hundreds of proteins and have so far been applied only to targeted analysis of selected

plastid pathways (Wienkoop et al., 2010). However, efforts are underway to establish QconCAT to

determine the stoichiometry of the Clp protease complex and for the quantification of specific plastid

(plant) metabolic pathways or plastid processes.
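
The arithmetic behind isotope dilution is simple: the amount of the native (light) peptide equals the spiked amount of the labeled (heavy) standard multiplied by the light/heavy signal ratio. The sketch below assumes equimolar heavy peptides, as expected from a concatemer, and uses hypothetical subunit names and intensities; it ignores practical issues such as incomplete digestion.

```python
def isotope_dilution_amount(light_intensity, heavy_intensity, spiked_fmol):
    """Amount of native peptide (fmol) from the light/heavy signal ratio."""
    if heavy_intensity <= 0:
        raise ValueError("heavy standard signal must be positive")
    return spiked_fmol * light_intensity / heavy_intensity


# Hypothetical measurements: (light intensity, heavy intensity) per subunit,
# with 10 fmol of each heavy peptide spiked into the sample.
measurements = {"subunit_A": (3.0e6, 1.0e6), "subunit_B": (1.5e6, 1.0e6)}
amounts = {
    name: isotope_dilution_amount(light, heavy, spiked_fmol=10.0)
    for name, (light, heavy) in measurements.items()
}
stoichiometry = amounts["subunit_A"] / amounts["subunit_B"]
print(amounts, stoichiometry)
```

The ratio of the resulting absolute amounts gives the subunit stoichiometry, which is the kind of question a QconCAT experiment on a protein complex is designed to answer.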

3.4. Protein-protein interactions, protein-nucleotide assemblies and oligomeric state

To carry out their metabolic, structural or signaling functions, many plastid proteins form transient or

stable interactions with other proteins. Few undirected systematic protein interaction studies have been

carried out for soluble stromal complexes, either by native gel electrophoresis (below 800 kDa) or by

chromatography (>800 kDa) (Olinares et al., 2010); these two complementary studies provide an

overview of the oligomeric state of >1000 proteins. In particular, protein assemblies larger than 800 kDa

are dominated by functions in plastid gene expression including nucleoids, mRNA metabolism and

ribosomes. The interaction of plastid proteins with DNA or RNA constitutes a regulatory network of gene

expression. The largest structures, of several megadaltons, are the nucleoids, also known as transcriptionally

active chromosomes (TAC), which contain several copies of plastid DNA and dozens of DNA and RNA

binding proteins, including proteins likely regulating nucleoid activities through reduction/oxidation or

phosphorylation (Pfalz et al., 2006). Envelope-membrane protein complexes are dominated by the

translocon complexes at the inner and outer envelope membrane (TIC and TOC). These import

complexes are functionally relatively well characterized by a variety of techniques, including blue-native

gels (Kikuchi et al., 2009, and references therein). The abundant photosynthetic protein complexes in the

thylakoid membrane have been a target for biochemical research for several decades and are now well

characterized through a number of methodologies. Most proteins in these complexes have been identified

and characterized by mass spectrometry and for some of them PTMs have been determined by intact

protein mass spectrometry (Whitelegge, 2004).

More detailed protein-protein interaction studies, using either co-immunoprecipitation or affinity

purification using transgenic plants that express tagged transgenes, are needed to better characterize the

plastid proteome interactome. This will help to better understand in particular regulation of metabolism

and plastid gene expression and to build reliable protein interaction networks to complement the plastid

proteome atlas.

3.5. Reversible and irreversible PTMs

Most proteins undergo reversible and sometimes irreversible modifications. Large scale analysis of

PTMs, using a high resolution, high accuracy LTQ-Orbitrap mass spectrometer, was carried out for

chloroplast membranes and stroma, as well as total leaf extracts and the frequencies of many PTMs were

calculated (Zybailov et al., 2009). This analysis provides a framework for search parameters and the use

of retention times for improved assignment of PTMs in large-scale proteomics, and helps distinguish

artificial modifications from those with a biological relevance. For nuclear-encoded plastid proteins, the

most typical irreversible in vivo modification is proteolytic cleavage of the N-terminal transit peptide, the

cTP. In case of most plastid-encoded proteins, typically the N-terminal methionine is removed by

methionine aminopeptidases, which have been identified in plastids. Another frequent N-terminal

modification that occurs after removal of N-terminal targeting information is N-terminal acetylation.

Because N-terminal acetylation requires in situ enzyme activity, it provides a reliable determination of the

N-terminus and thus valuable information about the processing site for transit peptides of imported

chloroplast proteins. Thus N-terminal acetylation allows mapping the in vivo N-termini of plastid and

cytosolic proteins. Kleffmann and colleagues established for a small set of proteins from rice etioplasts

the in vivo N-terminus and found that there is a good agreement between the detected N-terminal peptide

and the predicted processing peptidase cleavage site (Kleffmann et al., 2007). Similarly, Zybailov and

colleagues identified a larger set of N-terminal acetylated proteins in Arabidopsis chloroplasts and

provided additional context information for the processing protease cleavage site, also indicating that the

predicted cleavage site is one residue off from the actual cleavage site (Zybailov et al., 2008).

Improvements for cleavage site prediction should be possible based on the now available larger training

set.

PTMs often determine enzymatic activities and rapidly adjust enzyme activity to the requirements

of the cellular metabolism; protein abundance likely corresponds to maximal (theoretical) activity but

is not always a good indicator for in vivo enzyme activity and its net contribution to cell metabolism. It is

well established that reversible phosphorylation and reduction/oxidation, e.g. through the action of

different types of plastid thioredoxins (TRX), are key regulators of plastid metabolism, as well as plastid

gene expression (Dietz and Pfannschmidt, 2010). Several proteomics studies identified thioredoxin targets

by affinity chromatography, whereas other redox proteomics approaches used diagonal electrophoresis

under reducing and oxidizing conditions to identify proteins under redox control in vivo (Dietz and

Pfannschmidt, 2010). These analyses demonstrated that many chloroplast functions are regulated by

TRX-mediated disulphide/dithiol exchange, or by currently unknown redox modulators. Among these

functions are isoprenoid and tetrapyrrole biosynthesis, starch biosynthesis and degradation, gene

expression, protein folding and degradation, and vitamin biosynthesis. Redox targets in the thylakoid lumen

were identified and inhibition of the activity of the xanthophyll cycle enzyme violaxanthin de-epoxidase

by reduction, i.e. dithiol generation, was established (Dietz and Pfannschmidt, 2010).

Over the last few years, two thylakoid associated kinases (STN7, STN8), as well as a thylakoid

associated phosphatase (TAP38/PPH1), have been identified and their functions were investigated by

functional analysis of Arabidopsis mutants (Lemeille and Rochaix, 2010). The reversible phosphorylation

system at the thylakoid membrane regulates photosynthetic state transitions to optimize light absorption,

as well as long term light adaptation. 175 phosphorylated chloroplast proteins were identified, with 80%

serine and 20% threonine phosphorylation, but no tyrosine phosphorylation. One of the thylakoid kinases,

STN7, was found to be an abundant phosphoprotein in vivo suggesting the existence of kinase cascades in

the chloroplast. Information about the exact site of phosphorylation was used to extract kinase motifs

which are useful footprints for kinase activity in vivo (Reiland et al., 2009). Cumulative evidence for plant

proteome phosphorylation is collected in various databases, such as the PhosPhAt database for

Arabidopsis (http://phosphat.mpimp-golm.mpg.de/).
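
As a minimal illustration of how site-level phosphorylation information can be turned into a motif footprint, the sketch below extracts a fixed sequence window around each phosphosite and tabulates the residues found at each window position. The sequences, site positions and window size are invented for illustration and are not taken from Reiland et al. (2009); real motif extraction additionally tests enrichment against a background proteome, which is omitted here.

```python
from collections import Counter

def site_window(sequence, site, flank=3, pad="_"):
    """Return the residues from site-flank to site+flank (site is 1-based).

    Positions that fall outside the protein are filled with the pad character.
    """
    idx = site - 1
    padded = pad * flank + sequence + pad * flank
    return padded[idx : idx + 2 * flank + 1]

def position_counts(windows):
    """Count the residues observed at each position of equally long windows."""
    length = len(windows[0])
    return [Counter(w[i] for w in windows) for i in range(length)]

# Hypothetical phosphosites: (protein sequence, 1-based position of the pSer/pThr)
sites = [
    ("MARSDEEL", 4),
    ("GKRSDEDA", 4),
    ("TSAEE", 2),
]
windows = [site_window(seq, pos) for seq, pos in sites]
counts = position_counts(windows)

for offset, counter in zip(range(-3, 4), counts):
    print(offset, dict(counter))
```

In this toy input the positions following the phosphosite are dominated by acidic residues; whether such a pattern is meaningful in real data depends on the background frequencies.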

3.6. Subcellular localization predictions and network information

The distribution of cellular functions to distinct cell organelles is an important organizational principle that

needs to be understood to model metabolic and protein interaction networks and to make predictions at the

systems scale. Thus, analyses of the protein composition of cell organelles were reported for virtually all

plant cell organelles or membranes (Baginsky, 2009; Agrawal et al., 2010). At present, plant modeling

and systems analysis approaches with subcellular organelles suffer from incomplete proteome

identification and annotation. More complete organelle inventories will strengthen modeling efforts and

higher network consistencies should be obtained. In order to make a contribution to model quality,

however, protein localization data should have low false positive rates, e.g. below 1%. Therefore,

conservative assignment of protein subcellular localization in papers and public databases is better than

over-assignment of proteins, in particular since it is not really possible to associate a p-value with a

subcellular localization assignment based on experimental data. Thus, the community’s goal should be a

plastid proteome atlas with high sensitivity and a very low false positive rate.

In addition to the experimental organelle proteome analysis, subcellular localization prediction is

a possible source of information for ‘missing’ plastid proteins, even if suboptimal. The generation of

software routines to predict subcellular protein localization for plants, other eukaryotes, as well as

prokaryotes, has been in progress for well over a decade, in particular inspired by the increasing amount

of protein inventories for different subcellular localizations. These inventories provide essential training

and test sets. Whereas the prediction of N-terminal signal peptides (SPs) for SRP-dependent targeting to

the endoplasmic reticulum is rather accurate and sensitive, prediction of plastid localization is much less

satisfactory and still attracts considerable attention. A consensus prediction combining several predictors

using a naïve Bayes method was suggested to improve both sensitivity and specificity for plastid and

mitochondrial proteins (Schwacke et al., 2007). In the last 2-3 years several new localization predictors

(e.g. AtSubP, Subchlo, RSLpred, MultiP, Plant-mPLoc) were published for plants mostly focusing on

Arabidopsis. While each predictor may have advantages over the others, it is not clear that any of them

achieves a better true positive discovery rate for plastid proteins (i.e. a higher sensitivity) at a lower

false positive discovery rate (i.e. a better specificity) than the most popular predictor TargetP

(http://www.cbs.dtu.dk/services/TargetP/).
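
The idea of a naïve Bayes consensus can be sketched as follows. This is not the published implementation of Schwacke et al. (2007); it is a minimal example that assumes each predictor gives a binary plastid/non-plastid call, that its sensitivity and specificity have been estimated on a reference set, and that the predictors err independently of each other. The predictor names, performance values and prior are hypothetical.

```python
import math

def naive_bayes_posterior(calls, performance, prior):
    """Posterior probability of plastid localization given binary predictor calls.

    calls:       {predictor name: True if the predictor calls 'plastid'}
    performance: {predictor name: (sensitivity, specificity)}
    prior:       assumed fraction of the proteome that is plastid-localized
    Predictors are treated as conditionally independent (the 'naive' assumption).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for name, positive in calls.items():
        sens, spec = performance[name]
        if positive:
            log_odds += math.log(sens / (1.0 - spec))
        else:
            log_odds += math.log((1.0 - sens) / spec)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Made-up performance values for three hypothetical predictors.
performance = {
    "predictor_A": (0.85, 0.70),
    "predictor_B": (0.75, 0.80),
    "predictor_C": (0.60, 0.90),
}
prior = 0.10

all_positive = naive_bayes_posterior(
    {"predictor_A": True, "predictor_B": True, "predictor_C": True}, performance, prior
)
one_positive = naive_bayes_posterior(
    {"predictor_A": True, "predictor_B": False, "predictor_C": False}, performance, prior
)
print(round(all_positive, 3), round(one_positive, 3))
```

Agreement between predictors raises the posterior well above the prior, whereas a single positive call from a predictor with modest specificity can leave it below the prior. In practice the independence assumption is questionable because many predictors are trained on overlapping data.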

TargetP is still the most commonly used predictor for plastid, as well as plant mitochondrial,

localization; it predicts not only localization but also the cTP and mTP cleavage sites. There is still

some controversy about the true positive prediction rate of TargetP, which was found to differ between

experimental datasets. While plastid proteome studies from the van Wijk lab and others reported true

positive prediction rates of around 85%, consistent with the benchmark tests obtained during TargetP

training, other groups found much lower prediction rates on their plastid protein set (Armbruster et al.,

2011). Higher TargetP true positive rates (sensitivity) are usually observed when proteins that were not

repeatedly detected in plastid preparations are eliminated and conservative thresholds are applied for protein

identification (see also discussion under 3.1). Importantly, sets of detected low abundant Arabidopsis

proteins (several orders of magnitude lower than e.g. RBCL), e.g. those involved in RNA metabolism,

have true positive prediction rates similar to those of highly abundant proteins (Olinares et al., 2010). However,

proteins located in the outer plastid envelope membrane or those reversibly associated with the outer

envelope should be excluded from such prediction analysis because they do not possess a cleavable

N-terminal plastid targeting sequence. The main shortcoming of TargetP is the high false positive rate (low

accuracy), likely around 35%, leading to an overprediction for plastid proteins.
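
To make the quantities in this discussion concrete, the sketch below scores a set of predicted plastid accessions against curated plastid and non-plastid reference sets. Because 'false positive rate' can mean either the share of non-plastid proteins that are called plastid or the share of plastid calls that are wrong, both are computed and labeled separately. All accessions are hypothetical placeholders, and the numbers bear no relation to the actual performance of TargetP.

```python
def evaluate_predictor(predicted, plastid, non_plastid):
    """Compare predicted plastid accessions with curated reference sets."""
    tp = len(predicted & plastid)       # plastid proteins correctly predicted
    fn = len(plastid - predicted)       # plastid proteins that were missed
    fp = len(predicted & non_plastid)   # non-plastid proteins called plastid
    tn = len(non_plastid - predicted)   # non-plastid proteins correctly rejected
    return {
        "sensitivity": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),   # share of non-plastid proteins called plastid
        "false_discovery_rate": fp / (tp + fp),  # share of plastid calls that are wrong
    }

# Hypothetical reference sets and predictions.
plastid = {"P1", "P2", "P3", "P4"}
non_plastid = {"N1", "N2", "N3", "N4", "N5", "N6"}
predicted = {"P1", "P2", "P3", "N1", "N2"}

metrics = evaluate_predictor(predicted, plastid, non_plastid)
print(metrics)
```

Reported rates are only comparable between studies when the same definition and comparable reference sets are used, which may explain part of the disagreement between datasets.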

The current sensitivity and accuracy of TargetP are clearly not perfect, and the much larger sets of

established subcellular proteomes for Arabidopsis (and to a lesser degree also maize and rice) should be

useful to improve the performance of plastid localization predictors. In addition, it is quite likely that a

subset of nuclear-encoded plastid proteins has atypical targeting information. For instance, it has been

shown for a few plastid proteins that they are targeted to the plastid via the ER; the N-terminus of these

precursor proteins contains a secretory signal peptide (SP), followed by a cTP (Villarejo et al., 2005).

However, scanning ~1000 established plastid proteins in Arabidopsis for SPs suggested that probably

very few proteins take this route (Zybailov et al., 2008). Nevertheless, it is possible that there is yet another

pathway (or recognition system) for protein translocation across the envelope that accounts for the

imperfect true positive rate; the recent finding of an envelope-localized SEC system may be relevant here

(Skalitzky et al., 2011). Finally, it may be optimal to develop and test localization software for specific

species, plant families or even clades. For instance, monocotyledons such as rice, sorghum and maize may

have systematically different protein targeting information as compared to dicotyledons such as

Arabidopsis, tobacco, pea and spinach. Indeed, systematic analyses of established rice plastid proteins as

well as rice orthologs for Arabidopsis chloroplast proteins showed that alanine instead of serine or

threonine is overrepresented in the cTP (Kleffmann et al., 2007; Zybailov et al., 2008).

With detailed information about the enzymatic inventory of organelles, their specific contribution

to metabolism and signaling is also accessible to large-scale modeling approaches. Genome-scale

metabolic networks for the C3 plant Arabidopsis thaliana and the C4 plant maize, as well as the

green alga Chlamydomonas reinhardtii, were constructed that take into account compartmentalization

and allow assessing the specific contribution of cell organelles to metabolism (Dal'Molin et al., 2010).

Large-scale protein/protein interaction networks also benefit significantly from knowledge about the co-

localization of proteins in the same organelle. This information decreases false discovery rates in large-

scale interaction datasets for Arabidopsis, thereby increasing the reliability of predicted interaction

networks. Progress has been made for the assembly of plant organellar phosphorylation networks, and for

chloroplasts in particular the (de)phosphorylation driven movement of light harvesting complexes in the

thylakoid membrane (termed state transitions) (Lemeille and Rochaix, 2010). Studies in non-plant

species have shown that using phosphoproteomics information, it is possible to infer in vivo kinase

activities from phosphorylation motifs to provide information about kinase/substrate relationships, and

together with localization information, construct in vivo phosphorylation networks. Thus, protein

inventories of cell organelles are important constraints in constructing signal transduction networks. Last

but not least, publicly available and reliable protein subcellular localization data will be helpful and

cost-effective in the functional analysis of genes and proteins, because they remove the need to determine

the localization of each protein individually.
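
The use of co-localization as a constraint on interaction data can be sketched as a simple filter: a candidate interaction is retained only if both partners share at least one annotated compartment. The protein names and localizations below are hypothetical, and dropping pairs with an unannotated partner is a design choice made for this example, not a recommendation from the studies cited above.

```python
def colocalized(pair, localization):
    """True if both proteins of the pair share at least one annotated compartment."""
    a, b = pair
    loc_a = localization.get(a, set())
    loc_b = localization.get(b, set())
    return bool(loc_a & loc_b)

# Hypothetical localization annotations (a protein may have more than one).
localization = {
    "protA": {"plastid"},
    "protB": {"plastid", "mitochondrion"},
    "protC": {"nucleus"},
}

# Hypothetical candidate interactions from a large-scale screen.
candidates = [("protA", "protB"), ("protA", "protC"), ("protB", "protD")]

kept = [pair for pair in candidates if colocalized(pair, localization)]
print(kept)
```

A more lenient variant would keep pairs in which one partner lacks any localization annotation, at the cost of retaining more potential false positives.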

4. Employing the plastid proteome atlas for functional analysis and systems biology

Even if the plastid protein atlas is not complete, it does provide a rich source of information and a great

tool for detailed functional studies. Table 1 lists the available proteomics resources with relevance to

plastid biology and the Box provides a number of example questions that can be addressed with the

available tools. Now that subcellular localization of many proteins is known, it is possible to analyze the

qualitative and quantitative effects of mutations on specific organelles without actually purifying these

organelles. For instance, quantitative comparative proteome analysis of chloroplasts from wild-type and

different chloroplast Clp protease mutants was done using mass spectrometry-based quantification of total

Arabidopsis leaf extracts without actually isolating chloroplasts (Kim et al., 2009). The advantages of

characterizing quantitative effects on the chloroplast proteome through analysis of total leaf extracts,

rather than through analysis of isolated chloroplasts, are that: (i) mutants with strong growth defects can

be analyzed; isolation of chloroplasts from such mutants can be very hard or even practically impossible;

(ii) more accurate results are obtained for chloroplast mutants with heterogeneity in their leaf phenotype

(often with strongest phenotypes in the youngest leaves); isolation of chloroplasts from such leaves could

result in selection of a subset of chloroplast phenotypes, not representing the overall chloroplast

population. Furthermore, such subcellular proteome information for maize helped to resolve the

kinetics of organelle biogenesis, formation of cellular structures and metabolism during maize leaf

development and C4 cellular differentiation (Majeran et al., 2010). The current generation of mass

spectrometers has sufficient sensitivity and throughput to detect and quantify a high number of

chloroplast proteins even in complex mixtures. Furthermore, such a ‘total leaf’ approach can be helpful for

analyses of dynamic PTMs because it avoids lengthy organelle isolation procedures (Reiland et al., 2009), in

particular if no inhibitors can be applied to prevent changes in such PTMs. With a plastid protein atlas for

Arabidopsis and maize at hand, it can be expected that large scale comparisons of chloroplast proteomes,

their PTMs and interaction networks under different conditions and in different genetic backgrounds or

developmental states will provide novel insight into plastid biology.
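
The 'total leaf' strategy described in this section can be sketched as follows: spectral counts from total leaf extracts are normalized to the total counts of each sample, and the comparison between genotypes is then restricted to accessions that the atlas assigns to the plastid. This is a simplified illustration and not the quantification workflow of Kim et al. (2009); the accessions, counts and pseudocount are hypothetical, and no statistical testing is included.

```python
import math

def plastid_log2_ratios(wt_counts, mutant_counts, plastid_ids, pseudocount=1.0):
    """log2(mutant / wild type) of normalized spectral counts for plastid proteins.

    Counts are normalized to the total counts of each sample; a pseudocount
    avoids division by zero for proteins missing from one sample.
    """
    wt_total = sum(wt_counts.values())
    mut_total = sum(mutant_counts.values())
    ratios = {}
    for acc in plastid_ids:
        wt = (wt_counts.get(acc, 0) + pseudocount) / wt_total
        mut = (mutant_counts.get(acc, 0) + pseudocount) / mut_total
        ratios[acc] = math.log2(mut / wt)
    return ratios

# Hypothetical spectral counts from total leaf extracts.
wt_counts = {"cp1": 39, "cp2": 9, "cyt1": 52}
mutant_counts = {"cp1": 9, "cp2": 9, "cyt1": 82}
plastid_ids = {"cp1", "cp2"}  # accessions assigned to the plastid in an atlas

ratios = plastid_log2_ratios(wt_counts, mutant_counts, plastid_ids)
print(ratios)
```

Because the result depends entirely on which accessions are labeled as plastid proteins, false positive assignments in the atlas propagate directly into such comparisons.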

5. Deposition of proteomics and mass spectrometry information in public repositories

Most published plastid proteomics studies on Arabidopsis provide tables containing lists of the identified

proteins using standardized, non-redundant accession numbers provided through TAIR. For other plant

species this is more varied, either because there is no sequenced genome or sizable EST collection available

or because the databases searched, such as NCBI, contain redundant sets of accessions (e.g. older and

newer versions of genes); this can complicate incorporation of such data sets by other laboratories.

However, submission of the underlying mass spectra with associated metadata to public repositories such

as the Proteomics Identifications Database, PRIDE (http://www.ebi.ac.uk/pride), will allow other

laboratories to make use of these studies. Even for Arabidopsis and newer model (crop) species

such as maize and rice, it is important that the mass spectral data are deposited, for instance to help

improve search engines, improve genome annotation or allow for comparative analysis by other

laboratories. Indeed, several journals (e.g. Molecular and Cellular Proteomics, Nature Biotechnology)

now require submission of mass spectral data to such public repositories, as is customary for

microarray or RNAseq data sets. Requirements for more detailed descriptions of experimental conditions and

acquisition parameters are outlined in the MIAPE (Minimum Information About a Proteomics

Experiment) guidelines and enforced by several journals. We strongly support following these standards

and deposition of mass spectral data (e.g. converted MGF files) into PRIDE or other repositories.
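
For readers unfamiliar with the MGF (Mascot Generic Format) peak list format mentioned above, the sketch below writes spectra as plain-text MGF blocks. It only illustrates the basic layout of the format (BEGIN IONS, TITLE, PEPMASS, CHARGE, peak lines, END IONS); the spectrum values are invented, and the actual submission requirements of PRIDE or other repositories are not covered here.

```python
def to_mgf(spectra):
    """Serialize spectra as MGF text.

    Each spectrum is a dict with 'title', 'precursor_mz', 'charge' and
    'peaks' (a list of (m/z, intensity) tuples).
    """
    lines = []
    for s in spectra:
        lines.append("BEGIN IONS")
        lines.append(f"TITLE={s['title']}")
        lines.append(f"PEPMASS={s['precursor_mz']:.4f}")
        lines.append(f"CHARGE={s['charge']}+")
        for mz, intensity in s["peaks"]:
            lines.append(f"{mz:.4f} {intensity:.1f}")
        lines.append("END IONS")
        lines.append("")  # blank line between spectra
    return "\n".join(lines)

# Hypothetical MS/MS spectrum.
spectra = [
    {
        "title": "example_scan_1",
        "precursor_mz": 523.281,
        "charge": 2,
        "peaks": [(175.119, 1200.0), (262.151, 830.5), (375.235, 410.0)],
    }
]

mgf_text = to_mgf(spectra)
print(mgf_text)
```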

6. CONCLUSIONS

Proteomics of chloroplasts and other plastid types has provided extensive protein inventories, as well as

information about PTMs, protein abundances and protein interactions. Proteomics and mass spectrometry

technologies feeding into plastid proteome information now allow system-level analysis of chloroplast

biology, including chloroplast development, signaling and interaction networks. For reasons detailed

above, we consider a high quality plastid proteome atlas a milestone in the quest for biologically

meaningful systems biology approaches. Together with parallel efforts for other organelles (e.g.

mitochondria and peroxisomes) this will help to drive a better understanding of plant growth and

development and help realize the potential of plant systems biology.

ACKNOWLEDGEMENTS

We would like to thank the members of our labs for discussions and feedback on this manuscript.

Furthermore, we sincerely apologize to all colleagues whose work could not be cited because of space

constraints. Plant and plastid proteome-related research in the van Wijk lab is currently supported by

National Science Foundation grants MCB-1021963, IOS-0701736 and IOS-0922560. S. Baginsky's lab is

currently supported by SNF grant 31003A_127202 and the Martin-Luther-University Halle-Wittenberg.

AtProteome (b)
Species: At
Main purpose/objective: Experimental information about identified proteotypic peptides, protein abundance in different organs and spectral/peptide evidence for gene models
In-house experimental plant material: different organs of At
Type of in-house experimental information: in-house MS-based identification; peptides, ion scores, ppms, Mowse scores, meta information
Functional annotation (name and function): TAIR
Subcellular localization: none, but detailed organ information
Predictors: none

PPDB (c)
Species: At, Zm, Os
Main purpose/objective: Curated information for all proteins and protein models in At and Zm, incl. protein information and functional annotation. Experimental information about leaf and subcellular fractions with MS-based identification details incl. spectral counts and PTMs. M
In-house experimental plant material: Zm and At leaves and chloroplast fractions (e.g. stroma, thylakoids, lumen, plastoglobules, etc.). For Zm also BSC and MC specific chloroplasts to study C4 effects. For At, also different mutant backgrounds in Col-0.
Type of in-house experimental information: in-house MS-based identification; peptides, ion scores, ppms, Mowse scores, meta information
Functional annotation (name and function): Manual annotation of name and functions (MapMan system); new functional bins created as needed.
Subcellular localization: Manual assignment to subcellular localization where possible. Based on in-house experimental proteomics data, GFP/YFP studies, external proteomics studies (only qualitative), functional papers
Predictors: TargetP, TM-HMM, LumenP

At-Chloro (d)
Species: At
Main purpose/objective: In-house analysis of At chloroplast proteome and its substructures (envelope, stroma, thylakoid) with detailed proteomic information (peptides, MW, retention times, identification statistics).
In-house experimental plant material: chloroplast fractions (envelope, stroma and thylakoid)
Type of in-house experimental information: in-house MS-based identification; peptides, ion scores, ppms, Mowse scores, meta information
Functional annotation (name and function): manual curation of name, functional annotation by MapMan
Subcellular localization: chloroplast stroma, thylakoid, envelope
Predictors: none

Plprot (e)
Species: At, Os, Nt, Ca
Main purpose/objective: Proteome analyses of rice etioplasts, At chloroplasts, pepper chromoplasts and the undifferentiated proplastid-like organelles of tobacco BY2 cells, plastid type-specific functions.
In-house experimental plant material: different plastid types from various species
Type of in-house experimental information: peptide identifications, homologues identified in other plastid types, interactive 2D PAGE from differently illuminated etioplasts
Functional annotation (name and function): none
Subcellular localization: identifications from isolated plastids
Predictors: none

SUBA (f)
Species: At
Main purpose/objective: Facilitate subcellular protein localization analysis based on different public prediction tools, proteomics papers and GFP/YFP localization studies. Allow combinatorial queries on the contained data.
In-house experimental plant material: none; literature only
Type of in-house experimental information: NA
Functional annotation (name and function): links to TAIR, AmiGO and UniPROT
Subcellular localization: users can employ various queries to generate answers
Predictors: multiple localization predictors

PhosPhat (g)
Species: At
Main purpose/objective: Phosphoproteome information from published and unpublished sources, identified peptides or ions with annotated phosphorylation site (where available). Provides a P-site prediction tool.
In-house experimental plant material: At phosphoproteome at different conditions
Type of in-house experimental information: Specific information about peptide properties, annotated biological function as well as the analytical context; provides the phosphopeptide spectrum
Functional annotation (name and function): TAIR
Subcellular localization: none
Predictors: Plant specific P-predictor (pSer, pThr, pTyr).

RIPP-DB (h)
Species: At, Os
Main purpose/objective: Plant Phosphoproteome DB with information on phosphopeptides by LC-MS/MS-based shotgun phosphoproteomics
In-house experimental plant material: rice and At cell cultures
Type of in-house experimental information: described in associated papers
Functional annotation (name and function): hyperlinks to other databases
Subcellular localization: none
Predictors: none

ProMex (i)
Species: At, Cr, Mt, St
Main purpose/objective: Mass spectral reference database of tryptic peptides from plant proteomes
In-house experimental plant material: Different sources
Type of in-house experimental information: display of MS/MS spectra with annotation
Functional annotation (name and function): none
Subcellular localization: none
Predictors: none

a http://gator.masc-proteomics.org/
b http://fgcz-atproteome.unizh.ch
c http://ppdb.tc.cornell.edu
d http://www.grenoble.prabi.fr/at_chloro/
e http://www.plprot.ethz.ch
f http://suba.plantenergy.uwa.edu.au/
g http://phosphat.mpimp-golm.mpg.de/
h https://database.riken.jp/sw/links/en/ria102i/
i http://promex.pph.univie.ac.at/promex

Abbreviations for species: At - Arabidopsis thaliana; Os - Oryza sativa; Zm - Zea mays; Mt - Medicago truncatula; St - Solanum tuberosum; and the green alga Cr - Chlamydomonas reinhardtii

Table 1. Plant and plastid proteomics databases that provide information and tools for finding plastid proteins in Arabidopsis and other plant species, as well as functional annotation, post-translational modifications, peptide information and spectral da

LITERATURE CITED

Agrawal GK, Bourguignon J, Rolland N, Ephritikhine G, Ferro M, Jaquinod M, Alexiou KG,

Chardot T, Chakraborty N, Jolivet P, Doonan JH, Rakwal R (2010) Plant organelle

proteomics: Collaborating for optimal cell function. Mass Spectrom Rev: on-line prepublication

Oct 29. [Epub ahead of print]

Armbruster U, Pesaresi P, Pribil M, Hertle A, Leister D (2011) Update on Chloroplast Research:

New Tools, New Topics, and New Trends. Mol Plant 4: 1-16.

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S,

Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics

reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941.

Baginsky S (2009) Plant proteomics: concepts, applications, and novel strategies for data interpretation.

Mass Spectrom Rev 28: 93-120.

Brautigam A, Hofmann-Benning S, Weber AP (2008) Comparative proteomics of chloroplast

envelopes from C3 and C4 plants reveals specific adaptations of the plastid envelope to C4

photosynthesis and candidate proteins required for maintaining C4 metabolite fluxes. Plant

Physiol 148: 568-579.

Brun V, Masselon C, Garin J, Dupuis A (2009) Isotope dilution strategies for absolute quantitative

proteomics. J Proteomics 72: 740-749.

Carrie C, Giraud E, Whelan J (2009) Protein transport in organelles: Dual targeting of proteins to

mitochondria and chloroplasts. FEBS J 276: 1187-1195.

Dal'Molin CG, Quek LE, Palfreyman RW, Brumbley SM, Nielsen LK (2010) C4GEM, a genome-

scale metabolic model to study C4 plant metabolism. Plant Physiol 154: 1871-1885.

Dietz KJ, Pfannschmidt T (2010) Novel regulators in photosynthetic redox control of plant metabolism

and gene expression. Plant Physiol prepublication on-line Dec 30

Dunkley TP, Hester S, Shadforth IP, Runions J, Weimar T, Hanton SL, Griffin JL, Bessant C,

Brandizzi F, Hawes C, Watson RB, Dupree P, Lilley KS (2006) Mapping the Arabidopsis

organelle proteome. Proc Natl Acad Sci U S A 103: 6518-6523.

Ferro M, Brugiere S, Salvi D, Seigneurin-Berny D, Court M, Moyet L, Ramus C, Miras S,

Mellal M, Le Gall S, Kieffer-Jaquinod S, Bruley C, Garin J, Joyard J, Masselon C,

Rolland N (2010) AT_CHLORO, a comprehensive chloroplast proteome database with

subplastidial localization and curated information on envelope proteins. Mol Cell Proteomics 9:

1063-1084.

Friso G, Majeran W, Huang M, Sun Q, van Wijk KJ (2010) Reconstruction of metabolic pathways,

protein expression, and homeostasis machineries across maize bundle sheath and mesophyll

chloroplasts: large-scale quantitative proteomics using the first maize genome assembly. Plant

Physiol 152: 1219-1250.

Gstaiger M, Aebersold R (2009) Applying mass spectrometry-based proteomics to genetics, genomics

and network biology. Nat Rev Genet 10: 617-627.

Kikuchi S, Oishi M, Hirabayashi Y, Lee DW, Hwang I, Nakai M (2009) A 1-megadalton

translocation complex containing Tic20 and Tic21 mediates chloroplast protein import at the

inner envelope membrane. Plant Cell 21: 1781-1797.

Kim J, Rudella A, Ramirez Rodriguez V, Zybailov B, Olinares PD, van Wijk KJ (2009) Subunits

of the Plastid ClpPR Protease Complex Have Differential Contributions to Embryogenesis,

Plastid Biogenesis, and Plant Development in Arabidopsis. Plant Cell 21: 1669-1692.

Kleffmann T, von Zychlinski A, Russenberger D, Hirsch-Hoffmann M, Gehrig P, Gruissem W,

Baginsky S (2007) Proteome dynamics during plastid differentiation in rice. Plant Physiol 143:

912-923.

Lemeille S, Rochaix JD (2010) State transitions at the crossroad of thylakoid signalling pathways.

Photosynth Res 106: 33-46.

Majeran W, Friso G, Ponnala L, Connolly B, Huang M, Reidel E, Zhang C, Asakura Y,

Bhuiyan NH, Sun Q, Turgeon R, van Wijk KJ (2010) Structural and metabolic transitions of

C4 leaf development and differentiation defined by microscopy and quantitative proteomics in

maize. Plant Cell 22: 3509-3542.

Majeran W, Zybailov B, Ytterberg AJ, Dunsmore J, Sun Q, van Wijk KJ (2008) Consequences of

C4 differentiation for chloroplast membrane proteomes in maize mesophyll and bundle sheath

cells. Mol Cell Proteomics 7: 1609-1638.

Olinares PD, Ponnala L, van Wijk KJ (2010) Megadalton complexes in the chloroplast stroma of

Arabidopsis thaliana characterized by size exclusion chromatography, mass spectrometry and

hierarchical clustering. Mol Cell Proteomics 9: 1594-1615.

Peeters N, Small I (2001) Dual targeting to mitochondria and chloroplasts. Biochim Biophys Acta 1541:

54-63.

Peltier JB, Cai Y, Sun Q, Zabrouskov V, Giacomelli L, Rudella A, Ytterberg AJ, Rutschow H,

van Wijk KJ (2006) The Oligomeric Stromal Proteome of Arabidopsis thaliana Chloroplasts.

Mol Cell Proteomics 5: 114-133.

Peltier JB, Friso G, Kalume DE, Roepstorff P, Nilsson F, Adamska I, van Wijk KJ (2000)

Proteomics of the Chloroplast. Systematic identification and targeting analysis of lumenal and

peripheral thylakoid proteins. Plant Cell 12: 319-342.

Pfalz J, Liere K, Kandlbinder A, Dietz KJ, Oelmuller R (2006) pTAC2, -6, and -12 are components

of the transcriptionally active plastid chromosome that are required for plastid gene expression.

Plant Cell 18: 176-197.

Reiland S, Messerli G, Baerenfaller K, Gerrits B, Endler A, Grossmann J, Gruissem W,

Baginsky S (2009) Large-scale Arabidopsis phosphoproteome profiling reveals novel chloroplast

kinase substrates and phosphorylation networks. Plant Physiol 150: 889-903.

Schubert M, Petersson UA, Haas BJ, Funk C, Schröder WP, Kieselbach T (2002) Proteome map

of the chloroplast lumen of Arabidopsis thaliana. J Biol Chem 277: 8354-8365.

Schulze WX, Usadel B (2010) Quantitation in mass-spectrometry-based proteomics. Annu Rev Plant

Biol 61: 491-516.

Schwacke R, Fischer K, Ketelsen B, Krupinska K, Krause K (2007) Comparative survey of plastid

and mitochondrial targeting properties of transcription factors in Arabidopsis and rice. Mol Genet

Genomics 277: 631-646.

Skalitzky CA, Martin JR, Harwood JH, Beirne JJ, Adamczyk BJ, Heck GR, Cline K,

Fernandez DE (2011) Plastids contain a second sec translocase system with essential functions.

Plant Physiol 155: 354-369.

van Wijk KJ (2000) Proteomics of the chloroplast: experimentation and prediction. Trends Plant Sci 5:

420-425.

Villarejo A, Buren S, Larsson S, Dejardin A, Monne M, Rudhe C, Karlsson J, Jansson S,

Lerouge P, Rolland N, von Heijne G, Grebe M, Bako L, Samuelsson G (2005) Evidence

for a protein transported through the secretory pathway en route to the higher plant chloroplast.

Nat Cell Biol 7: 1224-1231.

Walther TC, Mann M (2010) Mass spectrometry-based proteomics in cell biology. J Cell Biol 190: 491-

500.

Whitelegge JP (2004) Mass spectrometry for high throughput quantitative proteomics in plant research:

lessons from thylakoid membranes. Plant Physiol Biochem 42: 919-927.

Wienkoop S, Weiss J, May P, Kempa S, Irgang S, Recuenco-Munoz L, Pietzke M, Schwemmer

T, Rupprecht J, Egelhofer V, Weckwerth W (2010) Targeted proteomics for Chlamydomonas

reinhardtii combined with rapid subcellular protein fractionation, metabolomics and metabolic

flux analyses. Mol Biosyst 6: 1018-1031.

Wise RR (2006) The diversity of plastid form and function. In RR Wise, JK Hoober, eds, The structure and

function of plastids, Vol 23. Springer, Dordrecht

BOX I. Examples of the wide range of queries that a high quality plastid protein atlas should

ultimately be able to answer. Current answers for Arabidopsis proteins are provided with reference to

the databases and resources listed in Table 1. Information for maize and rice is available in a subset of the

databases (see Table 1). This BOX also serves to better identify lack of information and challenges for

plastid proteome research and resource development for the immediate future.

Query 1: Is a protein located in the plastid (chloroplast) in Arabidopsis and what is the experimental

evidence? Search with AGI accession number in TAIR, PPDB, AT_Chloro, plprot; alternatively set up a

query in SUBA and make your own judgment. In the case of PPDB, the experimental evidence provided consists of

in-house proteome experiments (e.g. leaves, chloroplast fractions), detection in public proteomics studies

(displayed) or original literature (displayed for some); for AT_Chloro, the evidence provided is the

number of matched spectra in plastid preparations and information from TAIR and PPDB.

Query 2: Which proteins are located in the plastid (chloroplast), and what is the experimental evidence?

Go to PPDB, AT_Chloro or plprot and download lists (tables) with assigned plastid proteins; alternatively

set up a query in SUBA and design your own criteria. In the case of AT_Chloro and PPDB, you can narrow

down your search to subchloroplast locations; detection in different plastid types can be assessed in

plprot.

Query 3: Which proteins are predicted to locate in the plastid (chloroplast) and what is the p-value and/or

FDR? Go to the websites of the different subcellular localization predictors and extract the predicted list

of At plastid proteins if available. For TargetP prediction, you can also go to PPDB and extract these

accessions with other types of information, such as function. P-values are provided by some predictors

but may not be very reliable, whereas FDRs are mostly self-reported based on test sets. Cross check

prediction with available proteome information (see above).

Query 4: Does a plastid protein have PTMs, on which residue, is there more than 1 gene model and what

is the experimental evidence? Search PhosPhat or RIPP databases for phosphorylation sites and proteins.

For other PTMs, search PPDB either by accession, or alternatively, by PTM. Peptide-based support for

gene models is displayed in both AtProteome (At) and PPDB (At and Zm) for in-house detected

proteins.

Query 5: What is the relative abundance of a plastid protein and how does it change in response to

(a)biotic stress or developmental state or genetic background? This is currently still a difficult question

and answers are best obtained in the individual studies. However, there is a reasonable positive

correlation between the frequency and/or number of spectra matched to a protein in AtProteome, PPDB,

ProMex or AT_Chloro and its abundance; thus, comparing spectral counts allows ranking proteins by their

abundance; AtProteome provides spectral count information about distribution across different organs.

Query 6: Is a plastid protein in a complex and what is the composition of the complex; what is the

experimental evidence? It is currently not possible to get a direct answer to this question from the

databases listed in Table 1.
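
For Query 5, the ranking of proteins by spectral counts can be sketched as follows. The example divides each count by protein length before ranking, in the spirit of a normalized spectral abundance factor; this normalization is an assumption introduced here, since longer proteins tend to yield more spectra, and is not prescribed by the databases listed in Table 1. Accessions, counts and lengths are hypothetical.

```python
def rank_by_length_normalized_counts(spectral_counts, lengths):
    """Rank proteins by length-normalized spectral counts (highest first).

    Each count is divided by protein length (in residues) and the resulting
    values are scaled so that they sum to 1 within the data set.
    """
    per_length = {acc: spectral_counts[acc] / lengths[acc] for acc in spectral_counts}
    total = sum(per_length.values())
    normalized = {acc: value / total for acc, value in per_length.items()}
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)

# Hypothetical spectral counts and protein lengths.
spectral_counts = {"protX": 200, "protY": 50, "protZ": 60}
lengths = {"protX": 500, "protY": 100, "protZ": 600}

ranking = rank_by_length_normalized_counts(spectral_counts, lengths)
for accession, value in ranking:
    print(accession, round(value, 3))
```

In this toy data set the protein with the fewest raw spectra ranks first once length is taken into account, which illustrates why raw counts alone give only a rough ranking.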
