00003 jc silva 2006 mcp v5n4p589


Upload: jcruzsilva

Post on 11-May-2015

973 views

Category:

Documents


3 download

DESCRIPTION

Profiling proteins in microbial systems/E. coli

TRANSCRIPT

Page 1: 00003 Jc Silva 2006 Mcp V5n4p589

Simultaneous Qualitative and Quantitative Analysis of the Escherichia coli Proteome: A SWEET TALE*

Jeffrey C. Silva‡§, Richard Denny¶, Craig Dorschel‡, Marc V. Gorenstein‡, Guo-Zhong Li‡, Keith Richardson¶, Daniel Wall‖, and Scott J. Geromanos‡

We describe a novel LCMS approach to the relative quantitation and simultaneous identification of proteins within the complex milieu of unfractionated Escherichia coli. This label-free LCMS acquisition method observes all detectable, eluting peptides and their corresponding fragment ions. Postacquisition data analysis methods extract both the chromatographic and the mass spectrometric information on the tryptic peptides to provide time-resolved, accurate mass measurements, which are subsequently used for quantitation and identification of constituent proteins. The response of E. coli to carbon source variation is well understood, and it is thus commonly used as a model biological system when validating an analytical method. Using this LCMS approach, we characterized proteins isolated from E. coli grown in glucose, lactose, and acetate. The change in relative abundance of the corresponding proteins was measured from peptides common to both conditions of each comparison. Protein identities were also determined for those peptides that were unique to each condition, and these identities were found to be consistent with the underlying biochemical restrictions imposed by the growth conditions. The relative change in abundance of the characterized proteins ranged from 0.1- to 90-fold among the three binary comparisons. The overall coverage of the characterized proteins ranged from 10 to 80%, consisting of one to 34 peptides per protein. The quantitative results obtained from our study were comparable to those of other existing proteomic and transcriptional profiling approaches. This study illustrates the robustness of this novel LCMS approach for the simultaneous quantitative and comprehensive qualitative analysis of proteins in complex mixtures. Molecular & Cellular Proteomics 5: 589–607, 2006.

Escherichia coli is a microbial symbiote found in the colon and large intestine of most warm-blooded animals that plays a critical role in vertebrate anabolism and catabolism. The environment in which E. coli lives is subject to rapid changes in the availability of the carbon and nitrogen compounds necessary to provide its energy and primary building blocks. E. coli survival hinges on the ability to successfully control the expression of genes coding for enzymes and proteins required for growth in response to environmental changes. Because of its simple cellular structure and its relative ease of maintenance and manipulation in the laboratory, E. coli has become the "workhorse host" for most research in molecular biology and microbiology. As a result, it is regarded as one of the most completely characterized organisms in all of biology. The ease with which recombinant proteins can be expressed in E. coli has made this bacterium useful in the study of many basic biological processes as well as in the production of heterologous proteins for research and therapeutic purposes. For these reasons, E. coli has become a model system for testing new analytical technologies. For example, its relatively small genome and prevalent laboratory use made the E. coli genome one of the first to be completely sequenced (1). Likewise, E. coli genome microarrays were among the first to be commercially available with sequences for the complete set of open reading frames as well as intergenic regions (2). The origins of proteomics can also be traced back to E. coli, when pioneering two-dimensional gel electrophoresis experiments enabled the investigation of proteins on an organism-wide scale (3). Resources such as the CyberCell Database (4) and EchoBASE (5) have been designed as central repositories of biochemical and genetic data on E. coli generated by a wide range of sources. These databases are periodically updated and annotated to facilitate a comprehensive understanding of this model organism. The knowledge gained through this organized effort can be applied to the understanding of other organisms and to the development of antibiotics and/or antifungal agents.

The availability of fully sequenced genomes has allowed the construction of microarrays that are used to detect and quantify all postulated gene products by determining the levels of the corresponding transcribed mRNA. A study by Zimmer et al. (6) demonstrated the use of this method to identify those genes in E. coli whose expression is activated when a preferred nitrogen source is replaced with a non-preferred nitrogen source. In a separate study, Oh et al. (7) performed a similar analysis in which E. coli were grown on different carbon

From the ‡Waters Corporation, Milford, Massachusetts 01757-3696, ¶Waters Corporation, Atlas Park, Simons Way, M22 5PP Manchester, Great Britain, and ‖Novartis Institutes for BioMedical Research, Inc., Cambridge, Massachusetts 02139

Received, September 28, 2005, and in revised form, December 12, 2005

Published, MCP Papers in Press, January 5, 2006, DOI 10.1074/mcp.M500321-MCP200

Research

© 2006 by The American Society for Biochemistry and Molecular Biology, Inc. Molecular & Cellular Proteomics 5.4 589. This paper is available online at http://www.mcponline.org

Downloaded from www.mcponline.org at CELL SIGNALING TECHNOLOGY on May 22, 2007

Supplemental Material can be found at: http://www.mcponline.org/cgi/content/full/M500321-MCP200/DC1


sources. These studies not only identified those genes known to be associated with the specified metabolic pathway but also revealed many genes that had not been linked previously with the metabolic pathway under study.

Although the measurement of transcribed mRNA by hybridization techniques has led to the discovery of molecular markers and the elucidation of biologic mechanisms, this technique is not sufficient for the complete characterization of biologic systems. The detection of a particular gene product in a microarray experiment does not confirm the presence or absence of the resulting protein product or of related post-translationally modified isoforms. It is also understood that quantitative differences in the transcript of a particular gene or set of genes do not necessarily correlate with the corresponding protein abundance. This discrepancy was illustrated by a study of the effect of carbon source perturbation on steady-state gene expression in Saccharomyces cerevisiae: the authors reported that growing S. cerevisiae on either galactose or ethanol resulted in significant differences between the abundance ratios of the mRNAs and those of the corresponding protein products (8). Several other studies have demonstrated the poor correlation between the relative abundance of a transcript and that of the corresponding protein (9, 10). To fully understand the cellular physiology of a particular organism or disease state, a comprehensive analytical survey of the cell must be completed. Compiling data from multiple bioanalytical approaches (e.g. transcript, protein, and metabolite levels) on an organism in a variety of physiological states is the basis of the discovery science approach referred to as systems biology. Combining data for such a systems analysis leads to a degree of understanding in which "the whole is greater than the sum of the parts."

Many studies involving the analysis of complex protein mixtures have been accomplished by combining the well-established separation capabilities of two-dimensional (2D)¹ PAGE with mass spectrometry-based sequence identification of selected, semipurified proteins (11). Although this technique is often applicable to comparative proteomics, 2D PAGE is notoriously insensitive to proteins that are not soluble during the isoelectric focusing stage of the separation. Moreover, the staining methods required to visualize the proteins impose restraints on dynamic range and detection limits. Despite two-dimensional separation of the intact proteins, individual gel spots often contain many proteins, which compromises the resulting quantitative analysis. This problem is exacerbated by the varying degrees of post-translational modification that a particular protein may undergo, resulting in protein components appearing in multiple locations on the two-dimensional image. The development of automated, data-dependent ESI MS/MS in conjunction with microcapillary LC and database searching has significantly increased the sensitivity and speed of identification of gel-separated proteins. Alternative methods that use a parallel ("broad band" acquisition) rather than a serial approach for the collision-induced dissociation of peptides have subsequently been developed to maximize the duty cycle of the mass spectrometer, with a concomitant increase in sensitivity (12–14). This method enhances run-to-run reproducibility and yields high mass accuracy for both intact peptides and fragments, thereby improving sensitivity.

A traditional approach to determining the relative quantities of peptides (or proteins) in a complex mixture involves stable isotope-labeled peptides. This technique allows direct correlation of the naturally occurring peptide to its stable isotope-labeled analog (15–17). In these studies, an amino acid labeling strategy is incorporated into the protocol in which one of the biological samples is treated with the light isotope form of the chemical labeling reagent, and the other sample is treated with the heavy isotope form. When the samples are mixed and analyzed by LCMS, labeled peptide pairs from the two samples can be differentiated in the mass spectrometer by virtue of their mass difference. The ratio of the signal intensities of the light to heavy derivatives of each peptide pair reflects the abundance ratio of the originating protein in the two biological samples. Although this quantitative strategy is a useful method for determining the relative abundance of proteins between different samples, it can involve complex chemistry and require expensive reagents, and it is not particularly amenable to large scale relative quantitation studies.
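The light/heavy pairing described above can be illustrated with a short sketch. It is a generic illustration, not the protocol of refs. 15–17: it assumes neutral monoisotopic masses, exactly one label per peptide, and a reagent-specific mass offset `delta` supplied by the caller. `Peak`, `pair_ratios`, and the 10-ppm default tolerance are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mass: float       # neutral monoisotopic mass (Da)
    intensity: float  # summed signal for the peptide


def pair_ratios(peaks, delta, tol_ppm=10.0):
    """Pair each light peak with a heavy partner `delta` Da heavier and return
    (light_mass, light/heavy intensity ratio) tuples.

    `delta` is the light/heavy mass difference of the labeling reagent
    (an assumed, reagent-specific input); peptides carrying more than one
    label would need a multiple of `delta` and are not handled here.
    """
    results = []
    for light in peaks:
        target = light.mass + delta
        tol = target * tol_ppm * 1e-6  # ppm tolerance converted to Da
        partners = [p for p in peaks if abs(p.mass - target) <= tol]
        if partners:
            heavy = max(partners, key=lambda p: p.intensity)
            results.append((light.mass, light.intensity / heavy.intensity))
    return results


peaks = [Peak(1500.0, 2000.0), Peak(1508.0, 1000.0), Peak(1720.5, 300.0)]
print(pair_ratios(peaks, delta=8.0))  # [(1500.0, 2.0)]
```

A ratio of 2.0 would be read as a twofold higher abundance of the parent protein in the light-labeled sample.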

In this study, we used a simple, gel-free, label-free LCMS approach for qualitative and quantitative proteomic analysis (13). This investigation involved the study of E. coli grown on single, specific carbon sources. This design provides an excellent model system for studying subtle differences in the microbial proteome because it offers a controlled environment in which only one parameter is varied. Using E. coli to better understand metabolic pathways and to characterize previously unknown proteins helps validate this methodology and, when applied to related virulent microbes, could lead to the discovery of novel antibiotics. The results of this study correlate well with the known carbon source biochemistry and molecular biology of E. coli. The ease of use and efficiency of this new technique are demonstrated by the comparability of its results with existing gene profiling data and with more traditionally obtained proteomic data available in the literature (7, 18).

MATERIALS AND METHODS

Media and Growth Conditions—Frozen E. coli (ATCC 10798, K-12) cell stocks were streaked onto Luria-Bertani (LB) plates and grown at

¹ The abbreviations used are: 2D, two-dimensional; MSE, elevated energy MS; PLGS, ProteinLynx Global Server; RSD, relative standard deviation; AMRT, accurate mass, retention time detection; PMF, peptide mass fingerprint; DGAL, D-galactose-binding periplasmic protein; MDH, malate dehydrogenase; SIC, selected ion chromatogram; CI, confidence interval; ACS, acetyl-CoA synthetase; PGI, phosphoglucose isomerase; 2DGE, 2D gel electrophoresis; BPI, base peak intensity.

E. coli Proteome Analysis


37 °C. An individual colony was subsequently streaked onto M9 minimal medium plates supplemented with 0.5% sodium acetate and incubated at 37 °C. Seed cultures were generated by transferring single colonies into flasks of M9 minimal medium supplemented with 0.5% sodium acetate. Seed culture flasks were shaken at 250 rpm at 37 °C until mid-log phase (A600 of 0.9–1.1). The seed culture was diluted 1 ml to 500 ml into separate flasks of M9 minimal medium, each supplemented with one of three carbon sources (0.5% glucose, 0.5% lactose, or 0.5% sodium acetate). Flasks were shaken at 250 rpm at 37 °C until mid-log phase (A600 of 0.9–1.1) and then harvested by centrifugation (5,000 × g for 15 min). The culture medium was discarded, and the cells were frozen at −80 °C until needed for protein extract preparation.

Protein Extract Preparation—Frozen cells were suspended in 5 ml of lysis buffer (Dulbecco's phosphate-buffered saline with 1/100 protease inhibitor mixture (Sigma catalog number 8340)) per 1 g of biomass in a 50-ml Falcon tube. The cells were lysed by sonication in a Microson XL ultrasonic cell disrupter (Misonix, Inc.) at 4 °C. The cell debris was removed by centrifugation at 15,000 × g for 30 min at 4 °C. The resulting soluble protein extract was dispensed into 1.0-ml cryotubes and stored at −80 °C for subsequent analysis.

SDS-PAGE Analysis of Protein Extracts—Each protein sample was denatured and reduced using a standard PAGE loading buffer containing 1.0% SDS and 10 mM DTT. The denatured protein samples were run on a 12% polyacrylamide gel in a Bio-Rad Criterion gel apparatus at 160 V for 1 h. The gel was stained with Coomassie Blue using standard protocols.

Protein Digest Preparation—Approximately 250 μg of total E. coli protein was suspended in 100 μl of 50 mM ammonium bicarbonate (pH 8.5) containing 0.05% Rapigest (19). Protein was reduced in the presence of 10 mM dithiothreitol at 60 °C for 30 min and then alkylated in the dark in the presence of 30 mM iodoacetamide at room temperature for 30 min. Proteolytic digestion was initiated by adding modified trypsin (Promega) at a 50:1 ratio (E. coli protein to trypsin) and incubating at 37 °C overnight. Tryptic digestion was terminated by diluting 1:1 with water and freezing immediately at −80 °C. The tryptic peptide solution (1.25 μg/μl total protein) was centrifuged at 10,000 × g for 10 min, and the supernatant was transferred into an autosampler vial for peptide analysis by LCMS.

HPLC Configuration—Capillary liquid chromatography of tryptic peptides was performed with a Waters CapLC system equipped with a Waters NanoEase™ Atlantis™ C18, 300-μm × 15-cm reverse phase column. The aqueous mobile phase (mobile phase A) contained 1% acetonitrile in 0.1% formic acid. The organic mobile phase (mobile phase B) contained 80% acetonitrile in 0.1% formic acid. Samples (5-μl injection, digest equivalent to 6.25 μg of total protein) were loaded onto the column with 6% mobile phase B. Peptides were eluted from the column with a gradient of 6–40% mobile phase B over 100 min at 4.4 μl/min, followed by a 10-min rinse with 99% mobile phase B. The column was immediately re-equilibrated at initial conditions (6% mobile phase B) for 20 min. The lock mass, [Glu1]fibrinopeptide at 100 fmol/μl, was delivered from the auxiliary pump of the CapLC system at 1 μl/min to the reference sprayer of the NanoLockSpray™ source. All samples were analyzed in triplicate.
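The loading figures given in the digest and HPLC paragraphs are mutually consistent, as a quick calculation shows. The sketch assumes the digest volume is simply the 100-μl buffer doubled by the 1:1 water dilution, i.e. that the small volumes of DTT, iodoacetamide, and trypsin added are neglected; the function names are illustrative only.

```python
def digest_concentration(protein_ug, buffer_ul, dilution_factor):
    """Protein-equivalent concentration (ug/ul) of the final digest.

    Assumes reagent additions (DTT, iodoacetamide, trypsin) add negligible
    volume; dilution_factor=2.0 models the 1:1 dilution with water.
    """
    return protein_ug / (buffer_ul * dilution_factor)


def on_column_load(conc_ug_per_ul, injection_ul):
    """Protein-equivalent mass (ug) delivered by one injection."""
    return conc_ug_per_ul * injection_ul


conc = digest_concentration(250.0, 100.0, 2.0)
load = on_column_load(conc, 5.0)
print(conc, load)  # 1.25 6.25
```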

Mass Spectrometer Configuration—Mass spectrometry analysis of tryptic peptides was performed using a Waters/Micromass Q-TOF Ultima API system. For all measurements, the mass spectrometer was operated in V-mode with a typical resolving power of at least 10,000. All analyses were performed in positive mode ESI using a NanoLockSpray source. The lock mass channel was sampled every 30 s. The mass spectrometer was calibrated with a [Glu1]fibrinopeptide solution (100 fmol/μl) delivered through the reference sprayer of the NanoLockSpray source. Accurate mass LCMS data were collected in an alternating low energy (MS) and elevated energy (MSE) mode of acquisition. The spectral acquisition time in each mode was 1.85 s with a 0.15-s interscan delay. In low energy MS mode, data were collected at a constant collision energy of 10 eV. In MSE mode, the collision energy was ramped from 28 to 35 eV during each 1.85-s data collection cycle. One cycle of MS and MSE data was acquired every 4.0 s. The radio frequency applied to the quadrupole mass analyzer was adjusted such that ions from m/z 300 to 2000 were efficiently transmitted, ensuring that any ions below m/z 300 observed in the LC/MSE data were known to arise from dissociations in the collision cell.
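The acquisition timing can be checked with a small calculation: two functions (low energy and elevated energy), each a 1.85-s scan plus a 0.15-s interscan delay, give the stated 4.0-s cycle, or 1500 cycles over the 100-min gradient. The helper names are hypothetical, and the low-mass test simply restates the m/z 300 transmission cutoff described above.

```python
import math


def cycle_time(scan_s, interscan_s, functions=2):
    """Seconds per full MS/MSE cycle: each function is one scan plus a delay."""
    return functions * (scan_s + interscan_s)


def cycles_per_gradient(gradient_min, cycle_s):
    """Number of complete MS/MSE cycles acquired during a gradient."""
    return math.floor(gradient_min * 60 / cycle_s)


def formed_in_collision_cell(mz, low_mass_cutoff=300.0):
    """An elevated-energy ion below the transmitted m/z range cannot have
    passed the quadrupole intact, so it must be a product ion."""
    return mz < low_mass_cutoff


t = cycle_time(1.85, 0.15)
print(round(t, 6), cycles_per_gradient(100, t))  # 4.0 1500
```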

Data Processing and Protein Identification—The continuum LCMSE data were processed and searched using ProteinLynx Global Server (PLGS) version 2.2. The resulting peptide and protein identifications were evaluated by the software using statistical models similar to those described by Skilling et al. (20). Results from replicate injections were collated for quantitative analysis to determine the relative fold change, using the glucose condition as the control experiment. Protein identifications were assigned by searching an E. coli protein database using the precursor and fragmentation data afforded by the LCMS acquisition method. The search parameter values for each precursor and associated fragment ion were set by the software using the measured mass error and intensity error obtained from processing the raw continuum data. The mass error tolerance values were typically under 5 ppm. Peptide identifications were restricted to tryptic peptides with no more than one missed cleavage and with carbamidomethylated cysteines. Ion detection, clustering, and normalization were processed using PLGS as described earlier (13). Additional data analysis was performed with Spotfire Decision Site 7.2 and Microsoft Excel.
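The constraint "tryptic peptides with no more than one missed cleavage" can be made concrete with an in-silico digest. This sketch uses the common textbook rule (cleave C-terminal to K or R, but not before P); the rules PLGS applies internally are not described here and may differ, and carbamidomethylation, which changes peptide masses but not cleavage sites, is not modeled. The function name is hypothetical, and the zero-width `re.split` requires Python 3.7 or later.

```python
import re


def tryptic_peptides(sequence, max_missed=1):
    """In-silico trypsin digest.

    Cleaves after K or R unless the next residue is P, then joins up to
    `max_missed` adjacent fragments to produce missed-cleavage peptides.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # j is the index of the last fragment joined into this peptide.
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(tryptic_peptides("AKPLRGKEFR"))
# ['AKPLR', 'AKPLRGK', 'GK', 'GKEFR', 'EFR']
```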

Because of the alternate scanning acquisition method, fragment ions produced from any given precursor have the same chromatographic profile and apex retention time as the originating precursor ion. The data processing software produces an inventory of the measured monoisotopic mass of each detected precursor and fragment ion. The chromatographic peak area, chromatographic peak shape, combined charge state, and apex retention time are also provided for each precursor and fragment ion. The chromatographic peak area is determined from the combined intensity of all the isotopes for all of the charge states associated with each precursor. Fragment ions are assigned to a parent precursor only if their apex retention times are within plus or minus the time associated with one acquisition scan (i.e. one alternate scanning cycle). In these experiments, because the alternate scanning cycle time was 2 s, the ions found in the elevated energy channel within ±0.05 min of a given precursor were assigned as associated fragments.
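The retention-time rule for linking elevated-energy ions to precursors can be sketched as follows, using the ±0.05-min window stated above. This brute-force version is illustrative only (the PLGS implementation is not described here), and `Ion` and `assign_fragments` are hypothetical names. A fragment that falls within the window of two co-eluting precursors is deliberately assigned to both, which is the situation discussed in the following paragraph.

```python
from dataclasses import dataclass


@dataclass
class Ion:
    mass: float     # monoisotopic mass (Da)
    apex_rt: float  # chromatographic apex retention time (min)


def assign_fragments(precursors, fragments, rt_tol_min=0.05):
    """Map each precursor index to the elevated-energy ions whose apex
    retention time lies within +/- rt_tol_min of the precursor apex.

    A fragment may be assigned to several co-eluting precursors.
    """
    assigned = {i: [] for i in range(len(precursors))}
    for frag in fragments:
        for i, prec in enumerate(precursors):
            if abs(frag.apex_rt - prec.apex_rt) <= rt_tol_min:
                assigned[i].append(frag)
    return assigned


precursors = [Ion(1500.70, 30.10), Ion(1620.80, 31.00)]
fragments = [Ion(700.30, 30.12), Ion(845.40, 30.30), Ion(512.20, 30.98)]
for i, frags in assign_fragments(precursors, fragments).items():
    print(i, [f.mass for f in frags])
# 0 [700.3]
# 1 [512.2]
```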

A qualitative analysis of a protein mixture may produce instances in which more than one precursor ion is found at the same apex retention time. In such instances, the fragmentation data associated with a specific moment in time are shared among more than one co-eluting precursor ion; however, precursor and fragment ion data are acquired at high mass accuracy (within 5 ppm). At 5-ppm mass accuracy, there is enough mass specificity to match the associated fragment ions to their appropriate precursor ion for a subsequent stringent accurate-mass database search. With this level of mass accuracy and the ability to obtain time-resolved mass measurements, confident identifications can be made even for co-eluting peptides.
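The role of mass accuracy in separating co-eluting candidates comes down to a simple tolerance test. The sketch below (hypothetical helper names) computes signed ppm errors and keeps only candidate masses within ±5 ppm; at 1000 Da that window is only ±0.005 Da, so a candidate 0.02 Da away (20 ppm) is excluded.

```python
def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def ppm_window(mass, tol_ppm=5.0):
    """Half-width (Da) of a +/- tol_ppm window around `mass`."""
    return mass * tol_ppm * 1e-6


def matching_candidates(measured, candidate_masses, tol_ppm=5.0):
    """Candidate masses that agree with `measured` within +/- tol_ppm."""
    return [m for m in candidate_masses
            if abs(ppm_error(measured, m)) <= tol_ppm]


print(round(ppm_window(1000.0), 6))                              # 0.005
print(matching_candidates(1000.004, [1000.0, 1000.02, 999.96]))  # [1000.0]
```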

In instances where data from multiple injections of the same sample have been collected, the methodology uses chromatographic and analytical reproducibility to help confidently assign fragments to co-eluting precursors. A more thorough description of how the algorithms are used to "clean" the fragmentation data will be presented in future work.²

² J. C. Silva, C. Dorschel, M. V. Gorenstein, G.-Z. Li, and S. J. Geromanos, manuscript in preparation.


Peptide Clustering and Data Normalization—Identical peptides from each of the replicate injections for all conditions were clustered using a mass tolerance (typically 10 ppm) and a retention time tolerance (typically 0.25 min) with the PLGS clustering software. The clustered peptide data set was exported from PLGS and further evaluated with Excel and Spotfire. For each condition, ion detections that occurred in only one of the three replicate injections were considered noise and discarded from further analysis. The LCMS data were normalized to peptides originating from TUFA before the relative quantities of identified proteins were determined across the various conditions. The normalization strategy is described in greater detail under "Results and Discussion."
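The replicate filter, normalization, and fold-change steps can be sketched in a few lines. The filter (keep peptides seen in at least two of three injections) and the use of glucose as the control follow the text; the normalization shown, dividing by the mean intensity of the reference-protein peptides, is a simplified assumption standing in for the strategy described under "Results and Discussion", and the peptide keys and intensities are hypothetical.

```python
from statistics import mean


def filter_replicates(condition, min_detections=2):
    """Keep peptides detected in at least `min_detections` replicate
    injections; None marks an injection in which the peptide was not seen."""
    return {pep: reps for pep, reps in condition.items()
            if sum(r is not None for r in reps) >= min_detections}


def mean_intensity(reps):
    return mean(r for r in reps if r is not None)


def normalize(condition, reference_peptides):
    """Scale each peptide's mean intensity by the mean intensity of the
    reference-protein peptides (an assumed, simplified scheme)."""
    ref = mean(mean_intensity(condition[p])
               for p in reference_peptides if p in condition)
    return {pep: mean_intensity(reps) / ref for pep, reps in condition.items()}


def fold_changes(test, control, reference_peptides):
    """Test/control ratio for peptides that survive filtering in both."""
    t = normalize(filter_replicates(test), reference_peptides)
    c = normalize(filter_replicates(control), reference_peptides)
    return {pep: t[pep] / c[pep] for pep in t.keys() & c.keys()}


glucose = {"TUFA_1": [100.0, 110.0, 90.0], "PEP_A": [50.0, 50.0, 50.0],
           "PEP_B": [10.0, None, None]}
lactose = {"TUFA_1": [200.0, 200.0, 200.0], "PEP_A": [300.0, 300.0, None],
           "PEP_B": [40.0, 40.0, 40.0]}
print(sorted(fold_changes(lactose, glucose, ["TUFA_1"]).items()))
# [('PEP_A', 3.0), ('TUFA_1', 1.0)]
```

PEP_B drops out because it was seen in only one glucose injection, so no ratio is reported for it.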

RESULTS AND DISCUSSION

A standard SDS-PAGE analysis was performed on the soluble protein extracts from E. coli grown on three different carbon sources. The protein loading was controlled to ensure that an equal amount of total protein from each condition was applied onto the gel. Two different volumes of each extract were loaded to obtain better resolution of the most abundant proteins. The protein profiles illustrated in Fig. 1A reveal similar patterns for the glucose and lactose growth conditions but a distinct difference between the acetate growth condition and the other two.

The soluble protein extracts from each of the three growth conditions were treated with trypsin, and the resulting peptide mixtures were analyzed by LCMSE. Fig. 1B illustrates the BPI and the total ion chromatograms from the low energy and elevated energy MS data acquisition, respectively, for each condition. Inspection of the BPI chromatograms indicates that the similarity between the glucose and lactose conditions is also observed at the level of the tryptic peptides. In addition, the distinction between the acetate condition and the other two carbon sources is evident. The LCMSE data from each condition were processed using the Protein Expression software to produce an inventory of peptides that can be used to determine the relative abundance of peptides/proteins across multiple conditions. The complexity of the samples is illustrated in Fig. 1C, which displays approximately 8000 observed monoisotopic masses of the extracted peptide components (MH+) as a function of the observed retention time for each condition. The alternate scanning mode of LCMS data acquisition is configured to detect the precursor peptides in the low energy channel while simultaneously obtaining data on the associated fragments for subsequent structural determination of each precursor.

Ion Detection—Replicate injections of tryptic peptides from soluble E. coli protein preparations were processed with the Protein Expression software, creating an inventory of the peptides obtained from the low energy data acquisition for each growth condition. Table I lists the number of peptide detections and summed ion intensities for each injection of the acetate, lactose, and glucose growth conditions. An average of 8102, 7959, and 8437 peptides were found in the acetate, lactose, and glucose growth conditions, respectively. The relative standard deviations (RSDs) for the number of peptide detections and the intensity sums ranged from 1.3 to 8.5% and from 3.1 to 5.7%, respectively. These RSDs indicate an acceptable degree of reproducibility across the replicate injections of the three different growth conditions.
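For reference, the RSD quoted above is the standard deviation divided by the mean, expressed as a percentage. The sketch assumes the sample (n − 1) standard deviation, since the text does not state which form was used, and the three counts are hypothetical values rather than the Table I data.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent (sample standard deviation)."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical peptide-detection counts from three replicate injections.
counts = [8000, 8100, 8200]
print(f"{rsd_percent(counts):.2f}%")  # 1.23%
```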

As a quality control measure, an external standard of a five-protein mixture (bovine albumin, bovine hemoglobin, yeast enolase, yeast alcohol dehydrogenase, and rabbit phosphorylase B) was analyzed at the beginning, in the middle, and at the end of the E. coli sample analysis. These proteins were injected at approximately 750 fmol. The standard protein mixture is incorporated as a performance check during the data acquisition portion of the study to help verify that the LC and MS systems are performing within acceptable specifications. Significant deviations in retention time reproducibility, signal intensity, mass resolution, or mass accuracy indicate that the LCMS system should be inspected for faults. Repeated injections of this standard protein mixture have established expected performance criteria for the mass spectrometer and the ion detection software. In a single injection of this simple protein mixture, approximately 240 accurate mass, retention time detections (AMRTs) are obtained. Approximately 75% of the 240 time-resolved mass measurements can be assigned to tryptic fragments of the five standard proteins, constituting approximately 85% of the total detected intensity. These identifications do not take into account any post-translational modifications other than carbamidomethyl cysteine; such modifications may account for a few of the unidentified AMRTs. Upon analysis of a lower level of the protein standard (approximately 5-fold dilution), 40 AMRTs are observed by the peak detection software, 36 of which are also among the 240 AMRTs found in the higher concentration standard protein mixture. Approximately 77% of the 40 time-resolved mass measurements in the lower concentration sample can be assigned to tryptic fragments of the five standard proteins, constituting approximately 85% of the total detected intensity. By extension, it may be assumed that a similar proportion of the ions detected from any mixture of tryptic peptides represents actual peptide detections.
The statistics associated with the accounting of the peptide detections ("ion accounting") observed from the analysis of the protein standard described above illustrate the robustness of the peak detection software and the lack of carryover from previous injections. The list of precursors from the standard protein mixture has been provided in the supplemental data (Supplemental Tables 1 and 2).
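The ion-accounting figures above are two simple fractions: the share of AMRTs assigned to the standard proteins and the share of total intensity they carry, plus the overlap between the two dilution levels (36 of 40, or 90%, in the experiment described). The sketch below uses hypothetical detection identifiers and intensities, not the supplemental data, and the function names are illustrative.

```python
def ion_accounting(detections, assigned_ids):
    """Return (fraction of detections assigned, fraction of intensity assigned).

    `detections` maps an AMRT identifier to its intensity; `assigned_ids`
    holds the identifiers matched to tryptic peptides of the standards.
    """
    hits = detections.keys() & assigned_ids
    hit_intensity = sum(detections[k] for k in hits)
    return len(hits) / len(detections), hit_intensity / sum(detections.values())


def overlap_fraction(low_level_ids, high_level_ids):
    """Fraction of low-level detections also seen at the higher level."""
    return len(low_level_ids & high_level_ids) / len(low_level_ids)


detections = {"a": 50.0, "b": 30.0, "c": 15.0, "d": 5.0}
print(ion_accounting(detections, {"a", "b", "c"}))  # (0.75, 0.95)
```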

The peak detection software is designed to interrogate both the low and elevated energy channels and to produce an ion list only for those m/z detections that produce a chromatographic apex. Any m/z detections that occur at a constant background level (i.e. solvent-related ions or column matrix-associated ions) do not appear in the resulting ion lists and therefore do not interfere with subsequent protein identification and/or quantitation.
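The apex requirement can be illustrated on a single extracted-ion trace. This is a simplified sketch, not the algorithm used by the peak detection software: it assumes a local-maximum test plus a hypothetical `min_rise` threshold relative to the trace median, which is enough to show why a flat background ion never enters the ion list.

```python
from statistics import median


def find_apexes(trace, min_rise=3.0):
    """Return scan indices of chromatographic apexes in one extracted-ion trace.

    A scan counts as an apex when it is a local maximum (strictly higher than
    the previous scan, at least as high as the next) and its intensity is at
    least `min_rise` times the trace median, used here as a crude background.
    A constant-background trace, such as a solvent ion, never rises and so
    yields no apex.
    """
    background = max(median(trace), 1e-12)  # keep the threshold positive
    apexes = []
    for k in range(1, len(trace) - 1):
        is_local_max = trace[k] > trace[k - 1] and trace[k] >= trace[k + 1]
        if is_local_max and trace[k] >= min_rise * background:
            apexes.append(k)
    return apexes


print(find_apexes([5, 5, 5, 5, 5, 5]))      # flat background -> []
print(find_apexes([1, 1, 2, 10, 3, 1, 1]))  # one eluting peak -> [3]
```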

Clustering Peptide Components across Multiple Conditions—After obtaining the inventory of detected peptides from the replicates of each growth condition, the individual peptide lists from each E. coli sample were organized into a single matrix in which identical peptides were grouped across the entire experiment (replicate injections of multiple conditions) for subsequent quantitative analysis. The clustering algorithm uses the mass precision of the mass spectrometer and the retention time reproducibility of the chromatography to cluster identical peptides across the entire experiment. The details of the clustering algorithm can be found in the work of Silva et al. (13).
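The clustering algorithm itself is described in ref. 13 and is not reproduced here. As a rough illustration of the idea of grouping by mass precision and retention time reproducibility, the sketch below greedily assigns each detection to an existing cluster when it agrees within a ppm mass tolerance and a retention time window. The tolerances (10 ppm, 0.5 min), the one-detection-per-run rule, and all names are hypothetical choices for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    mass: float   # running mean monoisotopic mass (Da)
    rt: float     # running mean apex retention time (min)
    members: list = field(default_factory=list)  # (run_id, mass, rt) tuples


def cluster_detections(detections, ppm_tol=10.0, rt_tol=0.5):
    """Greedily group (run_id, mass, rt) detections that agree within a mass
    tolerance (ppm) and a retention time window (min). A cluster accepts at
    most one detection per run, so each cluster is one peptide across runs."""
    clusters = []
    for run_id, mass, rt in sorted(detections, key=lambda d: d[1]):
        target = None
        for c in clusters:
            mass_ok = abs(mass - c.mass) / c.mass * 1e6 <= ppm_tol
            rt_ok = abs(rt - c.rt) <= rt_tol
            new_run = all(m[0] != run_id for m in c.members)
            if mass_ok and rt_ok and new_run:
                target = c
                break
        if target is None:
            clusters.append(Cluster(mass, rt, [(run_id, mass, rt)]))
        else:
            target.members.append((run_id, mass, rt))
            n = len(target.members)
            target.mass += (mass - target.mass) / n  # update running means
            target.rt += (rt - target.rt) / n
    return clusters


runs = [
    ("ace-1", 1027.5585, 37.21), ("ace-2", 1027.5590, 37.25),
    ("ace-3", 1027.5580, 37.18), ("ace-1", 1171.6598, 41.97),
    ("ace-2", 1171.6600, 42.00),
]
for c in cluster_detections(runs):
    print(f"{c.mass:.4f} Da, {c.rt:.2f} min, {len(c.members)} runs")
```

The linear scan over clusters keeps the sketch short; a practical implementation would index clusters by mass so that each lookup touches only nearby candidates.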

After clustering the peptides from the entire study, the replication of each peptide within each condition was determined. Peptides that occurred in only one injection of a replicate set (one of three) were regarded as background ions and removed from consideration; only peptides found in at least two of the three injections were used for analysis. For the acetate condition, the discarded components corresponded to approximately 19% of the total ion detections (peptides) but represented only 4% of the total detected intensity from the acetate condition (Table II). These statistics are consistent with the notion that the discarded peptides are among the low intensity detections that occur near the limit of detection of the instrument. Because peptides observed in this detection range are more likely to produce spurious quantitative results and provide less structural information, they were discarded from further analysis. Because this analysis does not depend on a peptide enrichment strategy, it is unlikely that important qualitative information is lost: many tryptic peptides remain available for the subsequent identification of the constituent proteins. Similar results were observed for the other two growth conditions.

FIG. 1. An overview of the analysis of E. coli. A, SDS-PAGE analysis of the soluble protein generated from E. coli grown in minimal media with acetate (ACE), lactose (LAC), and glucose (GLU). Lane 4 contains the following protein molecular mass markers: 250, 150, 75, 50, 40, 25, 20, 10, and 5 kDa. Lanes 1–3 show 7.5 µl of the whole-cell protein extract directly after sonication. Lanes 5–7 and lanes 8–10 show 7.5 and 2.5 µl, respectively, of soluble protein after removal of cell debris by centrifugation. B, the base peak intensity (BPI) chromatogram of a single alternate scanning LCMS acquisition (LCMSE) of E. coli from each growth condition. Each LCMSE experiment contains a low energy (LE) function for the intact peptides and an elevated energy (EE) function for the associated fragment ions. C, an overlay of the deisotoped and charge state-reduced monoisotopic mass (0–5000 amu) and apex retention time (10–100 min) of the extracted peptides obtained from the acetate (yellow), lactose (red), and glucose (blue) experiments. The average monoisotopic mass and retention time are plotted for the replicate analysis of each condition. D, an overlay plot of the monoisotopic mass (1450–1675 amu) and apex retention time (29.5–32.0 min) of the extracted peptides from each condition within the gray box in panel C.
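The replication filter and the two "discarded" fractions quoted above (share of detections versus share of intensity) can be sketched as follows. How Table II aggregates intensity across injections is not stated; this sketch assumes a peptide's intensity is the sum over the injections in which it was seen. The function name and toy data are hypothetical.

```python
def replication_filter(peptides, min_hits=2):
    """Keep peptides detected in at least `min_hits` injections.

    peptides: dict peptide_id -> list of intensities, one entry per injection,
              with None where the peptide was not detected.
    Returns the kept peptides plus the fraction of peptides, and of summed
    intensity, that the filter discards.
    """
    kept, dropped = {}, {}
    for pid, values in peptides.items():
        hits = [v for v in values if v is not None]
        (kept if len(hits) >= min_hits else dropped)[pid] = hits
    kept_sum = sum(sum(h) for h in kept.values())
    dropped_sum = sum(sum(h) for h in dropped.values())
    frac_count = len(dropped) / len(peptides)
    frac_intensity = dropped_sum / (kept_sum + dropped_sum)
    return kept, frac_count, frac_intensity


peptides = {
    "p1": [100.0, 110.0, 90.0],   # three of three
    "p2": [5.0, None, None],      # one of three: treated as background
    "p3": [50.0, None, 55.0],     # two of three
}
kept, frac_count, frac_intensity = replication_filter(peptides)
print(sorted(kept), f"{frac_count:.0%} of peptides, {frac_intensity:.1%} of intensity")
```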

TABLE I. Summary of extracted peptides from E. coli grown in acetate, lactose, and glucose

                         Injection 1   Injection 2   Injection 3   Average       CV(a) (%)
Acetate
  Peptide detections     8218          8083          8006          8102          1.3
  Summed intensity       1.2218E+08    1.2182E+08    1.2872E+08    1.2424E+08    3.1
Lactose
  Peptide detections     8357          7174          8347          7959          8.5
  Summed intensity       9.9716E+07    9.3929E+07    1.0528E+08    9.9642E+07    5.7
Glucose
  Peptide detections     8561          8634          8115          8437          3.3
  Summed intensity       1.9119E+08    1.7317E+08    1.8895E+08    1.8444E+08    5.3

(a) Coefficient of variation.

TABLE II. Summary of the replication of detected peptides from E. coli grown in acetate, lactose, and glucose

Replication                                Number of    Fraction of     Total detected   Fraction of
                                           detections   detections (%)  intensity        intensity (%)
Acetate
  One out of three injections              4,675        19.2            1.4438E+07       3.9
  Two out of three injections              5,328        21.9            2.3479E+07       6.3
  Three out of three injections            14,304       58.8            3.3480E+08       89.8
  Total detections/intensity               24,307                       3.7272E+08
  In at least two out of three injections  19,632       80.8            3.5828E+08       96.1
Lactose
  One out of three injections              4,795        20.1            1.3788E+07       4.6
  Two out of three injections              5,226        21.9            2.0368E+07       6.8
  Three out of three injections            13,857       58.0            2.6477E+08       88.6
  Total detections/intensity               23,878                       2.9893E+08
  In at least two out of three injections  19,083       79.9            2.8514E+08       95.4
Glucose
  One out of three injections              5,117        21.4            1.8925E+07       3.7
  Two out of three injections              5,891        24.7            3.2173E+07       6.3
  Three out of three injections            14,614       61.2            4.5989E+08       90.0
  Total detections/intensity               25,622                       5.1099E+08
  In at least two out of three injections  20,505       85.9            4.9206E+08       96.3

Data Normalization—Normalization of the data is required for meaningful quantitative results, and it can be accomplished in a variety of ways. When few proteins are affected by a given perturbation, an autonormalization routine is an appropriate means of normalizing the data across many different samples. In this type of routine, the data are normalized to the intensity of the many qualitatively matched proteins (or peptides) that statistical analysis shows do not change between the two conditions. However, when there are dramatic qualitative and quantitative changes, such as those observed between glucose and acetate or between lactose and acetate, autonormalization may not be the best strategy. The dramatic changes due to the various conditions are illustrated in Fig. 3 (D, F, and H). For the glucose versus lactose comparison, the histogram of the intensity ratios of the matched peptides indicates that few peptides, or originating proteins, change between the two conditions: approximately 4000 AMRTs fall within the center four bins of the histogram (Fig. 3D). In contrast, for acetate versus glucose and acetate versus lactose, far fewer AMRTs (~800; Fig. 3, F and H) fall within the center four bins, indicating that many proteins change between these conditions. Given the dramatic changes observed among the three growth conditions, we opted to normalize to a single protein that did not change among the three conditions before determining the relative protein changes. Considering the apparent consistency of the peptide levels of protein chain elongation factor EF-Tu (TUFA) in the three samples and the substantial number of peptides identified from this protein (~60% sequence coverage), it was selected as the target protein for normalization across the three conditions. Before normalization, the intensity measurements from the raw data indicated that the relative intensity ratios of the TUFA peptides varied by less than 30%; after normalization, this variability was reduced to below 20%. All observed intensity measurements were scaled to the summed intensity of the TUFA peptides found to be common to each condition. This normalization strategy corrected both for injection variability within each condition and for variation in protein load among all conditions.
The monoisotopic masses and retention times of the peptides used for normalization were 1027.5585 (37.21 min), 1171.6598 (41.97 min), 1187.5300 (33.78 min), 1214.6366 (47.77 min), 1303.7826 (70.87 min), 1780.9388 (64.19 min), 1795.9577 (13.71 min), 1803.8827 (32.27 min), 1964.9721 (63.57 min), and 2117.1521 (91.80 min). The validated mass spectra for three of these peptides are provided in the supplemental data (Supplemental Fig. 1).
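Normalizing to a single reference protein amounts to one scale factor per run. The text says intensities were scaled to the summed intensity of the TUFA peptides common to each condition; the choice of target value is not stated, so this sketch assumes the mean of the per-run reference sums. The function name, peptide ids, and intensities are hypothetical.

```python
def normalize_to_reference(runs, reference_ids):
    """Scale each run so its reference peptides have the same summed intensity.

    runs:          dict run_name -> dict peptide_id -> intensity
    reference_ids: peptide ids from the normalization protein (e.g. TUFA)
    Only reference peptides present in every run are used. Each run is scaled
    so that their summed intensity equals the mean of those sums.
    """
    common = set(reference_ids)
    for peptides in runs.values():
        common.intersection_update(peptides)  # keep ids present in this run
    if not common:
        raise ValueError("no reference peptide is shared by all runs")
    ref_sums = {name: sum(p[i] for i in common) for name, p in runs.items()}
    target = sum(ref_sums.values()) / len(ref_sums)
    return {
        name: {pid: v * target / ref_sums[name] for pid, v in p.items()}
        for name, p in runs.items()
    }


runs = {
    "glucose": {"tufa-1": 100.0, "tufa-2": 100.0, "acea-1": 50.0},
    "acetate": {"tufa-1": 50.0, "tufa-2": 50.0, "acea-1": 50.0},
}
norm = normalize_to_reference(runs, {"tufa-1", "tufa-2"})
print(norm["acetate"]["acea-1"] / norm["glucose"]["acea-1"])  # -> 2.0
```

Equal raw intensities for the toy ACEA peptide become a 2-fold difference once the 2-fold difference in reference (load) signal is removed.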

Analytical Reproducibility—Before conducting the relative protein profiling analysis among the three growth conditions, a variety of quality control measures were applied to the replicates of each condition to determine the analytical reproducibility of the analysis. The final results from the clustering algorithm were exported from the Protein Expression software as a comma-delimited text file containing all of the mass spectrometric and chromatographic attributes of each peptide component, along with all the statistical calculations generated by the clustering process. This clustered data file was imported into Microsoft Excel to compute a number of data quality control measures and into Spotfire DecisionSite to visualize the reproducibility of the analysis of each sample.

FIG. 2. Analytical statistics from the replicate LCMSE analysis of E. coli grown in acetate. A, a histogram of the mass precision of the clustered peptide components from the replicate LCMSE analysis of the tryptic peptides generated from the soluble protein of E. coli grown in minimal medium with acetate as the sole carbon source. B, a histogram of the relative standard deviation of the measured signal intensity (coefficient of variation of replicate intensity measurements) for the clustered peptide components that replicated in at least two of the three injections of the same sample. C, a histogram of the relative standard deviation of the measured retention time for the clustered peptide components that replicated in at least two of the three injections of the same sample.

FIG. 3. Differential protein expression of E. coli grown on glucose, lactose, and acetate from replicate LCMSE experiments. Scatter plots (A, C, E, and G) and corresponding histogram plots (B, D, F, and H) of the natural logarithm of the average intensity ratios of matching peptides among the three growth conditions. Multiple peptides corresponding to a subset of identified proteins are highlighted in the scatter plot for each binary comparison. The standard deviation (StDev) of the ln(average intensity ratio) of the matched peptides is indicated in the histogram plot for each corresponding binary comparison. A, the ln(average intensity ratio) of matched peptides between two replicate injections of E. coli grown in acetate. C, lactose versus glucose: red, LACZ; blue, GALT; yellow, GALD; black, GALM; green, DGAL; and pink, TUFA. E, acetate versus glucose: green, ACEA; yellow, ACS; red, SUCC; blue, SUCD; and black, THRC. G, acetate versus lactose: yellow, ACEA; black, MDH; blue, GALM; and red, GALE. Peptides unique to each condition are shown at the extreme values in both the scatter plot (top = numerator, bottom = denominator) and histogram (left = numerator, right = denominator) plots. Ace, acetate; Lac, lactose; Glu, glucose.

FIG. 4. Utilization of glucose, lactose, and acetate by E. coli. Illustration of the biochemical pathways and corresponding enzymes required for utilization of glucose, lactose, and acetate by E. coli. Lactose catabolism includes the enzymes between lactose and D-glucose. Glycolysis includes the enzymes between D-glucose and acetyl-CoA. The citric acid cycle and glyoxylate shunt include the enzymes after acetyl-CoA. Acetate utilization includes the enzymes between acetate and acetyl-CoA. The gene/protein name and Blattner number are provided throughout the biosynthetic scheme.

The parameters used for clustering identical peptides throughout an experiment rely on the inherent reproducibility of the instruments used to obtain the data. Specifically, the clustering algorithm uses the analytical reproducibility of the mass measurement and of the chromatographic retention time of each peptide. The mass precision error of the extracted peptide components was typically within ±5 ppm of the mean mass measurement (Fig. 2A), demonstrating the robustness of the ion detection software and the stability of the mass measurement instrumentation. The variability of the quantitative intensity measurements among the replicate injections obtained from the Protein Expression software is summarized in Fig. 2B. These results indicate that the average and median RSD among the replicate injections were 15.6 and 20.2%, respectively. Fig. 2C illustrates the reproducibility of retention times during this study, where the RSD was typically less than 1.3%. These observations are consistent with Protein Expression results reported previously (13, 21). The low analytical variability associated with the replicate injections demonstrates the robustness of the method and provides the confidence needed to proceed to the comparison of paired conditions for quantitative protein profiling.
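The two reproducibility metrics above, mass precision in ppm and relative standard deviation (CV), can be computed per clustered component as sketched below. This assumes the sample standard deviation (n − 1 denominator); applying it to the acetate peptide-detection counts in Table I reproduces the tabulated CV, which supports that assumption. The replicate masses in the second example are hypothetical.

```python
from statistics import mean, stdev


def ppm_deviations(masses):
    """Deviation of each replicate mass from the replicate mean, in ppm."""
    m = mean(masses)
    return [(x - m) / m * 1e6 for x in masses]


def cv_percent(values):
    """Coefficient of variation (relative standard deviation), in percent,
    using the sample standard deviation (n - 1 denominator)."""
    return 100.0 * stdev(values) / mean(values)


# Peptide detection counts for the three acetate injections (Table I).
print(f"CV = {cv_percent([8218, 8083, 8006]):.1f}%")
# Hypothetical replicate masses for one clustered peptide component.
print([round(d, 2) for d in ppm_deviations([1214.6366, 1214.6390, 1214.6342])])
```

For the acetate counts the first line prints a CV of 1.3%, matching Table I; the second shows replicate masses spread by roughly ±2 ppm around their mean.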

The intensity variation of this analytical method can be assessed through binary comparisons of the intensity measurements of the matched peptide components from each replicate injection. Fig. 3A shows the scatter plot of the binary comparison of two replicate injections from the acetate condition, and Fig. 3B presents the same data as a histogram of the intensity ratios of the matched peptides, showing that the majority of matched peptides have intensity ratios close to unity. Ideally, the binary comparisons would lie on a horizontal line (ln(ratio) = 0) with minimal deviation throughout the signal detection range. The data closely fit this horizontal line, with the smallest deviation between matched peptides of higher abundance. These plots are useful because they illustrate what one would expect to see if there were no changes between two conditions. The standard deviation of the ln intensity ratios of the matched peptides between two replicate injections from the acetate growth condition was 0.22. Similar standard deviations were observed for replicate injections of the other two conditions and are indicative of the variability associated with the analytical method.
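The replicate-versus-replicate statistic is the standard deviation of ln(intensity ratio) over peptides matched between two runs. Because it is in natural-log units, an SD of 0.22 corresponds to a one-SD multiplicative spread of about e^0.22 ≈ 1.25-fold. A minimal sketch with hypothetical peptide ids and intensities:

```python
import math
from statistics import mean, stdev


def ln_ratio_stats(run_a, run_b):
    """Mean and sample SD of ln(intensity_a / intensity_b) over the peptides
    matched between two runs (dicts peptide_id -> intensity)."""
    shared = run_a.keys() & run_b.keys()   # peptides seen in both runs
    ratios = [math.log(run_a[p] / run_b[p]) for p in shared]
    return mean(ratios), stdev(ratios)


rep1 = {"p1": 100.0, "p2": 240.0, "p3": 55.0, "p4": 1300.0}
rep2 = {"p1": 110.0, "p2": 200.0, "p3": 60.0, "p4": 1250.0, "p5": 8.0}
mu, sd = ln_ratio_stats(rep1, rep2)
print(f"mean ln(ratio) = {mu:.3f}, SD = {sd:.3f}, 1-SD fold spread = {math.exp(sd):.2f}x")
```

Working in log space makes up- and down-regulation symmetric: a 2-fold increase and a 2-fold decrease are +ln 2 and −ln 2.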

Clustering Peptides by Mass, Retention Time, and Fold Change—After identical peptides are grouped by their observed mass and retention time within a given condition, the peptide intensity ratios from any two conditions can be displayed to reflect the relative quantitative difference (fold change) observed between those two conditions. Fig. 3, C, E, and G, display the relative fold change observed for the matching peptide components among the three growth conditions. From these plots, peptides whose intensities change significantly between conditions can be quickly identified. Fig. 3, C, E, and G, clearly show the large variation between acetate and the other two growth conditions compared with the small variation between glucose and lactose. This dramatic effect can be explained by the overall metabolic adjustments that E. coli must implement to utilize the three carbon sources (Fig. 4). The similarity in peptide expression levels between glucose and lactose can be rationalized by the nature of the two carbon sources. Lactose is a disaccharide of glucose and galactose. Growth on lactose requires that E. coli express a series of proteins to transport and hydrolyze the disaccharide to its corresponding monosaccharides. Further, a series of enzyme-catalyzed reactions is required to activate and epimerize galactose to glucose, a preferred carbon source for E. coli. Glucose is catabolized through glycolysis and the citric acid cycle to provide the primary metabolites for essential building blocks and the production of energy. Acetate is a simple carbon source that initially bypasses glycolysis and enters a modified version of the citric acid cycle, the glyoxylate shunt, to provide the primary metabolites and energy necessary to support growth.

The ability to cluster identical peptide components and quantitatively compare them across multiple conditions provides an analytical means to group related peptides within a given protein profiling experiment. A set of peptides within a given fold-change range should originate from a limited subset of the proteins in the natural proteome: if the relative abundance of a protein changes between two conditions, the tryptic peptides originating from that protein should reflect the same degree of differential expression. This methodology does not require isotope-labeled affinity tags or any other enrichment or labeling strategy. In fact, it is desirable not to enrich for specific peptides, because one can take advantage of the multiple peptide measurements for any particular protein when determining its relative abundance. This quantitative comparison makes it possible to run a peptide mass fingerprint (PMF) search with a ±5-ppm mass tolerance on a subset of the peptides matched between two conditions. The quantitative information from this analysis, combined with a PMF search, is a powerful approach for identifying proteins in such complicated LCMS data. Because the data were acquired in alternate scanning mode, each tryptic peptide also has associated fragmentation data that can be used for independent identification and validation of this complementary approach. The details of how peptide identifications are generated from the associated fragmentation data are discussed later in the text.
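The text does not describe the PMF search engine, so the sketch below illustrates only the two settings it names: matching accurate [M+H]+ values against an in-silico tryptic digest within ±5 ppm, allowing one missed cleavage, after selecting peptides inside a fold-change window. Residue masses are standard monoisotopic values; treating Cys as carbamidomethylated follows the modification mentioned for the protein standard. Function names, the toy sequence, and the peptide ids are hypothetical, and real PMF engines also score how many peptides match each protein, which is not shown.

```python
import re

# Standard monoisotopic residue masses (Da). Cys is listed as
# carbamidomethyl-Cys (+57.02146), assumed here as a fixed modification.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_peptides(protein, missed=1):
    """In-silico trypsin digest: cleave after K/R unless the next residue is P,
    allowing up to `missed` missed cleavages."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    for i in range(len(pieces)):
        for j in range(i, min(i + missed + 1, len(pieces))):
            yield "".join(pieces[i:j + 1])


def mh_plus(peptide):
    """Singly protonated monoisotopic mass [M+H]+."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def pmf_search(observed, proteins, ppm_tol=5.0, missed=1):
    """Match observed [M+H]+ values to tryptic peptides within ppm_tol.
    Returns {protein: [(observed, peptide, ppm_error), ...]}."""
    hits = {}
    for name, seq in proteins.items():
        theoretical = {pep: mh_plus(pep) for pep in tryptic_peptides(seq, missed)}
        for obs in observed:
            for pep, mass in theoretical.items():
                err = (obs - mass) / mass * 1e6
                if abs(err) <= ppm_tol:
                    hits.setdefault(name, []).append((obs, pep, round(err, 2)))
    return hits


def in_fold_window(ln_ratios, low, high):
    """Peptide ids whose ln(ratio) falls inside [low, high]."""
    return [pid for pid, r in ln_ratios.items() if low <= r <= high]


print(in_fold_window({"pep-17": 1.21, "pep-18": 0.40, "pep-19": 1.66}, 1.04, 1.73))
print(pmf_search([1890.0090], {"toy": "GYINSLGALTGGQALQQAKAGR"}))
```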


Fig. 3C is a scatter plot of the 5983 matched peptides between the glucose and lactose growth conditions of E. coli. Six sets of tryptic peptides are highlighted to illustrate a subset of the proteins identified by PLGS using both the precursor and fragmentation data afforded by the LCMSE analysis. The standard deviation of the ln intensity ratios of the matched peptides between the two conditions was 0.25, slightly higher than that observed for two replicate injections of the acetate sample (0.22, Fig. 3A). A total of nine peptides lie within the fold-change range 1.04 ≤ ln(ratio) ≤ 1.73. When these peptides are submitted to an accurate mass PMF search against the entire E. coli protein database, allowing one missed cleavage, six of them (highlighted in green) match DGAL (D-galactose-binding periplasmic protein) within a 5-ppm mass error tolerance, providing 29% protein sequence coverage. Because DGAL is up-regulated in the lactose growth condition, one would expect to find additional tryptic peptides unique to that condition. An additional 13 DGAL peptides are indeed unique to the lactose condition, increasing the final protein sequence coverage to 65%. The alternate scanning data acquisition mode (LCMSE) provides supporting sequence information from the associated fragment ions collected in the elevated energy function, giving structural validation for a majority of the 19 peptides. The validated mass spectra for three of these peptides are provided in the supplemental data (Supplemental Fig. 2).
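Sequence coverage, as used in the DGAL and ACEA examples, is the fraction of residues covered by at least one identified peptide. Counting the union of covered positions avoids double-counting overlapping peptides, such as a missed-cleavage form and its shorter parts. A minimal sketch with a hypothetical protein sequence and peptide lists:

```python
def sequence_coverage(protein, peptides):
    """Percent of residues covered by at least one identified peptide.
    Overlapping peptides (e.g. missed-cleavage forms) are counted once."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:                   # mark every occurrence
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


protein = "MKTAYIAKQRQISFVKSHFSR"            # hypothetical sequence
shared = ["TAYIAK", "QISFVK"]                # matched in both conditions
unique = ["QR", "SHFSR", "QRQISFVK"]         # seen only in one condition
print(round(sequence_coverage(protein, shared), 1))
print(round(sequence_coverage(protein, shared + unique), 1))
```

As in the DGAL example, adding peptides unique to the up-regulated condition raises coverage well beyond what the shared peptides alone provide.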

The data plotted in Fig. 3E indicate a greater degree of dissimilarity between the two conditions (acetate and glucose), reflected in the higher standard deviation (0.98). For example, of the 13 peptides in the range 4.10 ≤ ln(ratio) ≤ 5.03, six are identified as ACEA (isocitrate lyase, 18% protein sequence coverage) within a mass error tolerance of 5 ppm. An additional 15 ACEA peptides were unique to the acetate condition, increasing the final protein sequence coverage to 61%. Again, the sequences of the majority of these peptides were validated by the elevated energy data acquired in the alternate scanning mode (Supplemental Fig. 3).

Fig. 3G shows the matched peptides of the acetate and lactose growth conditions and highlights the peptides identified as isocitrate lyase, malate dehydrogenase, UDP-glucose 4-epimerase, and aldose 1-epimerase (ACEA, MDH, GALE, and GALM, respectively). The standard deviation of the ln intensity ratios of the matched peptides between these two conditions was 1.01, similar to that observed for the peptides common to the acetate and glucose growth conditions. The treatment of these data is similar to the approach of Conrads et al. (22); however, the method in this study uses a single experiment to provide both the qualitative and quantitative information for each of the constituent proteins. The ability to conduct these experiments on one instrument simplifies the overall strategy and greatly reduces the time and effort required to collect and manage the digitized sample information. The quantitative and qualitative results and their correlation with the different growth conditions are discussed more thoroughly later.

Simultaneous Peptide Sequence Identification by LC-MSE—To demonstrate the simultaneous qualitative capabilities of the alternate scanning mode of data acquisition, seven selected ion chromatograms (SICs) from the raw, continuum LCMSE data of a single analysis of the acetate condition are shown in Fig. 5. The top SIC is of the doubly charged precursor m/z 945.664 of the GYINSLGALTGGQALQQAK peptide (1890.0220 MH+) from ACEA, obtained in the low energy channel (function 1), with an apex retention time of 57.14 min. The six remaining SICs correspond to fragment ions of this ACEA peptide obtained in the elevated energy channel (function 2) during the LCMSE acquisition, specifically the y4, y8, y9, y11, y12, and y13 fragments. The chromatographic profiles of these fragment ions all apex (57.17 min) within one scan (0.03 min) of the originating precursor peptide. This demonstrates the basic premise of alternate scanning LCMSE: the chromatographic profiles of product (fragment) ions must exactly parallel the profile of the precursor, with peak apices matching within one MS scan of the originating precursor. The Protein Expression software converts the continuum LCMSE data into an inventory of time-resolved mass measurements of the detected peptides (precursors) from the low energy channel, aligned with their corresponding fragment ions in the elevated energy channel. The information provided in the list of peptides includes the deisotoped and charge state-reduced monoisotopic mass measurement, the corresponding deconvolved intensity measurement, the measured apex retention time, and the average charge state. The time-resolved mass measurements obtained in the elevated energy channel for the MH+ 1890.0220 precursor at ~57.14 min are shown in the lower panel of Fig. 5. This illustration shows how the LCMSE method enables one to perform quantitative and qualitative characterization of detected peptides simultaneously.
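The one-scan apex criterion can be expressed as a simple filter on elevated-energy ions. This sketch checks only apex agreement; the text also requires the full chromatographic profiles to run in parallel, which is not modeled here. The 0.03-min scan time and the fragment m/z values come from Fig. 5; the last fragment and the function name are hypothetical.

```python
def coeluting_fragments(precursor_apex, fragments, scan_time=0.03):
    """Keep elevated-energy ions whose apex lies within one scan of the
    low-energy precursor apex.

    precursor_apex: apex retention time of the precursor (min)
    fragments:      list of (m/z, apex retention time) from the elevated
                    energy channel
    scan_time:      one scan cycle (min); 0.03 min is the value in Fig. 5
    """
    eps = 1e-9  # absorb floating-point error in differences such as 57.17 - 57.14
    return [(mz, rt) for mz, rt in fragments
            if abs(rt - precursor_apex) <= scan_time + eps]


fragments = [(1242.928, 57.17), (900.662, 57.17), (474.354, 57.17),
             (633.300, 57.41)]   # last entry: hypothetical unrelated ion
print(coeluting_fragments(57.14, fragments))
```

The small epsilon matters: in binary floating point, 57.17 − 57.14 evaluates to slightly more than 0.03, so a bare `<=` test would wrongly reject fragments that apex exactly one scan later.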

The identification of peptides and proteins is carried out using a probabilistic peptide fragmentation model whose framework has been tuned using a range of well characterized samples. The fragmentation data are deisotoped and charge state-reduced using a maximum likelihood algorithm to provide lock mass-corrected, monoisotopic mass measurements for the subsequent database search (20). The likelihood calculation is based on a probabilistic summation over all of the possible ways a given peptide could fragment and give rise to trial masses. Observed masses are compared with a database containing probabilistic information about peptide fragmentation patterns based on empirical observation. A Markov chain model is used to describe a number of attributes that influence the probability of peptide identification. These parameters include the expected appearance of a series of b and y ions, an amino acid undergoing a specified neutral loss, specific cleavages occurring on the C- or N-terminal side of a residue, and the formation of specific immonium ions. A more thorough explanation of this algorithm has been given by Skilling et al. (20).
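The probabilistic scoring model of Skilling et al. (20) is not reproduced here. To make the b and y ion terminology concrete, the sketch below computes theoretical singly charged b and y ion m/z values from standard monoisotopic residue masses. This is textbook fragment arithmetic, not the identification model; the function name is hypothetical, and Cys is assumed to be carbamidomethylated.

```python
# Standard monoisotopic residue masses (Da); Cys as carbamidomethyl-Cys.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def b_y_ions(peptide):
    """Singly charged b1..b(n-1) and y1..y(n-1) ion m/z values.
    b_ions[i] is b(i+1); y_ions[i] is y(i+1)."""
    masses = [RESIDUE[aa] for aa in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for m in masses[:-1]:              # b ions: N-terminal fragments
        total += m
        b_ions.append(total + PROTON)
    total = WATER
    for m in reversed(masses[1:]):     # y ions: C-terminal fragments + H2O
        total += m
        y_ions.append(total + PROTON)
    return b_ions, y_ions


b, y = b_y_ions("GYINSLGALTGGQALQQAK")
print("b2 =", round(b[1], 4), " y4 =", round(y[3], 4))
```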

Protein Profiling of E. coli among the Different Carbon Sources—In previous work, we showed how accurate mass and retention time measurements of peptides could be used to identify differentially abundant peptides belonging to a simple set of proteins spiked into a background of human serum (13). The intent of that study was to demonstrate the use of accurate mass LCMS of intact peptides (precursor information) as a primary tool for quantitative and subsequent qualitative peptide/protein analysis. We now apply this methodology, using both precursor and concurrent fragment ion information, to monitor the specific metabolic differences between the three E. coli samples comprising our biological model.

FIG. 5. Peptide identification from an LCMSE analysis. A, an SIC of the doubly charged m/z 945.664 precursor peptide ion from the low energy channel (function 1) with an apex retention time of 57.14 min, and six associated fragment ions (m/z 1242.928, 1185.882, 1114.855, 900.662, 843.637, and 474.354) from the elevated energy channel (function 2) that all chromatographically apex at 57.17 min. The difference in apex retention time lies within one scan (0.03 min). B, the time-resolved fragment ions from the doubly charged m/z 945.664 precursor peptide (lock mass-corrected monoisotopic mass = 1890.0220), identified as the GYINSLGALTGGQALQQAK peptide from isocitrate lyase, ACEA.


In this study, we sought to use a model biological system to demonstrate the full qualitative and quantitative capabilities provided by the LCMSE methodology. Fig. 6A shows the abundance ratios of the characterized peptides from a set of eight proteins across the three growth conditions. The relative abundances of the identified peptides from each protein lie within a narrow quantitative range. These independent quantitative measurements of multiple peptides from a particular protein make it possible to determine the relative abundance of that protein between two conditions. Using the average fold change, the standard deviation, and the number of peptides from a particular protein found in both conditions, the average relative fold change for each protein is displayed with its 95% confidence interval (CI) in Fig. 6B. The relative quantitation of a particular protein across multiple binary comparisons (growth condition profiles) can provide additional information regarding the participation of proteins in a specific metabolic process. Taken together, the correlations among the growth condition profiles can be used to group proteins by their response to specific perturbations. For example, ACEA/ALDA and IDH/MDH show very similar expression profiles for each pair of growth conditions and, to a lesser degree, among all four proteins. These four proteins are in fact metabolically related, because they are involved in either the citric acid cycle or the glyoxylate shunt. In contrast, the growth condition profiles of the ribosomal proteins RL1 and RS1 are very similar to each other but differ from those of the previous set of proteins. The profiles illustrated by RL1 and RS1 are also shared by the other ribosomal proteins identified in this study (Fig. 7). Fig. 7 summarizes the relative quantitation of a number of identified proteins that are critical for protein translation, carbon utilization, and energy metabolism. A more detailed discussion of the quantitative results and their correlation with the known biochemistry is given in the following sections. An expanded list of the proteins identified in this study is provided in the supplemental data (Supplemental Table 3).

FIG. 6. Differential expression of peptides and proteins. A, the results from the clustered peptides illustrate that identified peptides corresponding to differentially expressed proteins have expression ratios within a narrow range. The relative expression of the protein can be determined from these multiple peptide ratios, providing a measure of confidence for each identified protein. B, the multiple peptide measurements for each protein provide a means to obtain a 95% confidence interval for the relative expression measurement in each binary comparison. The pattern obtained from the relative abundance of each protein in each condition provides a mechanism to group related proteins according to their response to the applied perturbation. Ribosomal proteins RL1 and RS1 show a common pattern across the various conditions, as do a majority of the other ribosomal proteins identified in this study. Other proteins such as ACEA, ALDA, and MDH share a similar pattern that can be explained by their role in carbon utilization. GALE has a unique pattern that can be attributed to its role in lactose catabolism. Ace, acetate; Lac, lactose; Glu, glucose.
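Protein-level values are summarized from the peptide ln(ratio) values: a mean, a 95% CI from the standard deviation and the number of peptides, the ±5.5 placeholder used in Fig. 7 for proteins seen in only one condition, and a similarity measure between growth condition profiles. The text does not say whether the CI uses a t or a normal quantile; this sketch assumes Student's t with n − 1 degrees of freedom (falling back to 1.96 beyond 30 df) and uses Pearson correlation as the profile similarity. Both are assumptions, and all names and example values are hypothetical.

```python
import math
from statistics import mean, stdev

# Two-sided 95% Student t critical values for df = 1..30.
T_975 = (12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
         2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
         2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042)
UNIQUE_LN_RATIO = 5.5  # value Fig. 7 assigns to proteins seen in one condition only


def protein_ln_ratio(peptide_ln_ratios):
    """Mean ln(ratio) of a protein's peptides and the 95% CI half-width.
    The half-width is None when only one peptide was quantified."""
    n = len(peptide_ln_ratios)
    avg = mean(peptide_ln_ratios)
    if n < 2:
        return avg, None
    t = T_975[n - 2] if n - 1 <= len(T_975) else 1.96  # normal approx. beyond df 30
    return avg, t * stdev(peptide_ln_ratios) / math.sqrt(n)


def profile_value(peptide_ln_ratios, unique_to=None):
    """Profile entry for one binary comparison; unique_to is None,
    'numerator', or 'denominator'."""
    if unique_to == "numerator":
        return UNIQUE_LN_RATIO
    if unique_to == "denominator":
        return -UNIQUE_LN_RATIO
    return protein_ln_ratio(peptide_ln_ratios)[0]


def pearson(x, y):
    """Pearson correlation between two growth condition profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


avg, half = protein_ln_ratio([1.0, 1.2, 1.4])
print(f"ln(ratio) = {avg:.2f} +/- {half:.2f}  (fold change {math.exp(avg):.2f})")
# Hypothetical profiles over (Lac/Glu, Ace/Glu, Ace/Lac).
print(round(pearson([0.1, 2.0, 1.9], [0.0, 1.6, 1.7]), 3))
```

With only three binary comparisons per profile, a correlation coefficient is a coarse similarity measure; it is most useful for separating clearly distinct patterns, such as the ribosomal proteins versus the carbon-utilization enzymes.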

Translation Machinery—Ribosomal proteins occupy approximately 8% of the total cellular volume of E. coli and should therefore be among the abundant proteins in unfractionated E. coli. Determining the relative expression of these major housekeeping proteins was an important step in validating this method before characterizing other, perhaps less abundant proteins involved in carbon metabolism. A total of 49 of the 54 ribosomal proteins were identified in the three growth conditions, and their relative expression profiles (as ln(ratio)) are illustrated in Fig. 7A. The validated mass spectra of a subset of ribosomal proteins are provided in the supplemental data (Supplemental Fig. 4). The average sequence coverage of the ribosomal proteins was ~54% in either the glucose or the lactose growth condition but decreased to ~32% during growth on acetate. A concomitant decrease in ribosomal proteins along with a slower growth rate has been demonstrated when E. coli is grown on acetate (23), and the results illustrated in Fig. 7 concur with these observations.

FIG. 7. Relative quantitation of proteins among the three growth conditions from unfractionated E. coli. Relative quantitation of proteins associated with translation (A), amino acid metabolism and stress (B), and carbon and energy metabolism (C) among the three growth conditions. The relative quantitation is based on the average fold change found among the redundant, quantitative peptide measurements from each protein. A 95% confidence interval was determined for those proteins that contained more than one peptide identification. Proteins that were unique to a particular condition were assigned a maximum or minimum value of 5.5 (unique to the numerator condition) or −5.5 (unique to the denominator condition) for each binary comparison. The three binary comparisons are color-coded as follows: black, lactose versus glucose; red, acetate versus glucose; and blue, acetate versus lactose.
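The aggregation rule in the Fig. 7 caption (average over peptides, a 95% interval only when more than one peptide is available, and ±5.5 for condition-unique proteins) can be sketched as follows. This is a minimal illustration, not the Protein Expression system's implementation: it assumes the peptide ratios are averaged on the natural-log scale and that the interval uses a normal approximation (z ≈ 1.96) on the standard error of the mean. The function name, the `unique_to` flag, and the peptide values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Placeholder ln ratio for proteins detected in only one condition (Fig. 7 caption).
UNIQUE_SCORE = 5.5
# Two-sided 95% critical value under a normal approximation (an assumption; ~1.96).
Z95 = NormalDist().inv_cdf(0.975)


def protein_ln_ratio(peptide_ln_ratios, unique_to=None):
    """Combine peptide-level ln(numerator/denominator) ratios into one protein value.

    Returns (ln_ratio, ci_half_width); the half-width is None when no interval
    is formed. `unique_to` is a hypothetical flag: "numerator", "denominator",
    or None when the protein was quantified in both conditions.
    """
    if unique_to == "numerator":
        return UNIQUE_SCORE, None
    if unique_to == "denominator":
        return -UNIQUE_SCORE, None
    if not peptide_ln_ratios:
        raise ValueError("protein has no quantitative peptides")

    average = mean(peptide_ln_ratios)
    if len(peptide_ln_ratios) < 2:
        # The caption reports an interval only for proteins with more than one peptide.
        return average, None

    std_error = stdev(peptide_ln_ratios) / math.sqrt(len(peptide_ln_ratios))
    return average, Z95 * std_error


# Hypothetical peptide ln ratios for one protein in one binary comparison.
print(protein_ln_ratio([1.0, 1.2, 0.8]))
```

Averaging in log space keeps an up-regulation and the equivalent down-regulation symmetric around zero, which matches how Fig. 7 and Table III report ln ratios.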


The consistency of the expression profiles for the ribosomal proteins in the paired conditions is striking (Figs. 6 and 7). Growth on acetate results in consistent down-regulation of these proteins relative to growth on either glucose or lactose. These results are consistent with the work of Marr (23), who correlated the specific growth rates of E. coli on various carbon sources with the absolute quantity of ribosomal proteins. The growth rate of E. coli on acetate is lower than on either glucose or lactose. The lower growth rate correlates with a lower rate of protein synthesis and results in a decrease in the level of ribosomal proteins. With a decrease in protein synthesis and ribosomal proteins, the demand for amino acid biosynthesis is also attenuated. Conversely, the level of ribosomal proteins is not affected by substituting lactose for glucose. Because TUFA was used to normalize the data across all experiments, it shows no change across the three growth conditions. A number of other translation-associated proteins and chaperones were also identified, and their relative quantitation was determined. Among these proteins were EFG, DNAK, GROEL, GROES, CLPA, and CLPB. Although EFG was not affected, the other proteins were all up-regulated in both glucose and lactose relative to acetate. Growth on either glucose or lactose supports higher growth rates, and as a result there is a concomitant increase in protein production, creating a need for these chaperones to facilitate protein folding and post-translational modification.
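Normalizing to a single reference protein explains why TUFA itself shows no change: after scaling, its ratio is exactly 1 (ln ratio 0) by construction. The sketch below assumes a simple per-run scaling of protein intensities by the TUFA intensity; the study's actual normalization may operate at the peptide level and differ in detail, and the function names and intensity values are hypothetical.

```python
import math


def normalize_to_reference(run_intensities, reference="TUFA"):
    """Scale all protein intensities in one LCMS run so the reference protein equals 1."""
    ref = run_intensities[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {protein: value / ref for protein, value in run_intensities.items()}


def normalized_ln_ratio(numerator_run, denominator_run, protein, reference="TUFA"):
    """ln(numerator/denominator) for one protein after reference normalization."""
    num = normalize_to_reference(numerator_run, reference)
    den = normalize_to_reference(denominator_run, reference)
    return math.log(num[protein] / den[protein])


# Hypothetical summed intensities: TUFA reads twice as high in the acetate run,
# so every other acetate protein is scaled down by half relative to glucose.
acetate = {"TUFA": 2000.0, "RL1": 500.0}
glucose = {"TUFA": 1000.0, "RL1": 500.0}
print(normalized_ln_ratio(acetate, glucose, "TUFA"))  # 0.0 by construction
print(normalized_ln_ratio(acetate, glucose, "RL1"))   # about -0.69
```

The trade-off of a single-protein reference is that any genuine change in TUFA abundance would be absorbed into every other ratio, so the choice relies on TUFA being stable across conditions.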

Lactose Utilization—As the PAGE data (Fig. 1) and the intensity ratio plots of matched peptides (Fig. 3, C and D) suggest, there are relatively few differentially expressed proteins between the glucose and lactose growth conditions. Those that are differentially expressed in the lactose condition (summarized in Table III) are relevant to lactose metabolism. β-Galactosidase (LACZ), which is detected only in the lactose condition with 30% sequence coverage, catalyzes the hydrolysis of lactose to D-glucose and β-D-galactose. Aldose 1-epimerase (GALM, 41% sequence coverage), which converts the β anomer of galactose to the α anomer, was up-regulated 2.7-fold (ln(Lactose/Acetate) = 0.99 ± 0.14, 95% CI) in the lactose relative to the acetate growth condition. Similarly, galactokinase (GALK, 21% sequence coverage) was identified and found to be up-regulated 6.8-fold (ln(Lactose/Acetate) = 1.91 ± 0.35, 95% CI) in the lactose relative to the acetate growth condition. Another essential protein required for lactose/galactose utilization, GALT, was found to be unique to the lactose growth condition (∼16% sequence coverage). It catalyzes the reaction of UDP-D-glucose and α-D-galactose-1-phosphate to α-D-glucose-1-phosphate and UDP-galactose. This reaction is coupled with another enzyme, GALE (UDP-glucose 4-epimerase), which converts the UDP-galactose product back to UDP-glucose, regenerating the GALT substrate. GALE was identified (∼33% sequence coverage) and found to be up-regulated 7.4-fold (ln(Lactose/Acetate) = 1.99 ± 0.24, 95% CI) in the lactose relative to the acetate condition. These are among the characteristic proteins essential for lactose metabolism. A similar study by Vollmer et al. (24) identified a subset of lactose-specific proteins using a two-dimensional LC strategy that combined strong cation exchange and reverse phase chromatography. From their multidimensional analysis, Vollmer et al. (24) identified LACZ and GALM with 15 and 3% total sequence coverage, respectively. Although they were able to identify a subset of lactose-metabolizing proteins, their data did not provide any information on the relative quantitation of the characterized proteins between the lactose and glucose growth conditions.
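The fold changes quoted in this section are the exponentials of the reported ln ratios, and a ratio in one direction is the negative of its reverse: ln(Lactose/Acetate) = −ln(Acetate/Lactose), i.e. −AL in Table III. A minimal sketch of that conversion, which also maps the ± half-width onto a fold-change interval; the function name is illustrative.

```python
import math


def fold_change(ln_ratio, ci_half_width=None):
    """Convert ln(ratio) plus/minus a half-width into a fold change and its interval bounds."""
    fold = math.exp(ln_ratio)
    if ci_half_width is None:
        return fold, None
    # The interval is symmetric in ln space but asymmetric on the fold scale.
    low = math.exp(ln_ratio - ci_half_width)
    high = math.exp(ln_ratio + ci_half_width)
    return fold, (low, high)


# Table III lists GALM as AL = ln(Acetate/Lactose) = -0.99 +/- 0.14;
# flipping the sign gives ln(Lactose/Acetate) = 0.99.
galm_al = -0.99
fold, (low, high) = fold_change(-galm_al, 0.14)
print(f"GALM: {fold:.1f}-fold (95% CI {low:.1f}-{high:.1f})")
```

Reporting the interval as exp(ln ± half-width) rather than fold ± a constant keeps the lower bound positive, which a fold change must be.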

Acetate Utilization—Comparison of the acetate condition with either the glucose or lactose condition reveals more diversity than the comparison of glucose with lactose (Figs. 1A and 3, E–H). This is not surprising, because growth on acetate, instead of glucose or lactose, requires the cell to redirect carbon flux through different metabolic pathways to sustain growth. A major adaptation is the induction of enzymes that convert acetate, rather than pyruvate, to acetyl-CoA (Fig. 4). Two such pathways exist in E. coli. The first is an efficient pathway that converts acetate directly to acetyl-CoA in a single step using acetyl-CoA synthetase (ACS). The other is a more circuitous route that converts acetate to acetyl-CoA in two steps. The first step, catalyzed by acetate kinase (ACKA), activates acetate by phosphorylation to form acetylphosphate. The second step is catalyzed by phosphate acetyltransferase (PTA), which transfers CoA to the activated acetate and liberates inorganic phosphate. The relative quantitation results from this study indicate that both pathways are up-regulated, although the pathway involving ACS is elevated to a much greater extent (Table III). ACS was identified with 38% sequence coverage; it was detected only in the acetate condition when compared with lactose and was up-regulated 16.6-fold (ln(Acetate/Glucose) = 2.81 ± 0.25, 95% CI) relative to glucose. In addition, ACKA was identified with 25% sequence coverage and was up-regulated 1.9-fold (ln(Acetate/Glucose) = 0.63 ± 0.22, 95% CI), whereas PTA was identified with 31% sequence coverage and was up-regulated 1.6-fold (ln(Acetate/Glucose) = 0.47 ± 0.10, 95% CI) in the acetate growth condition. These results are consistent with Oh et al. (7), who reported from microarray analysis of E. coli grown on acetate and glucose that ACS was the main path for acetate uptake. They are also consistent with the results of Kakuda et al. (25), who showed that mutation of both ackA and pta inhibited cell growth on acetate, indicating that this pathway also delivers a significant amount of carbon flux into the cell. In the absence of glucose in the growth medium, we also observed that a number of enzymes in the glycolysis pathway, including phosphoglucose isomerase (PGI), glyceraldehyde-3-phosphate dehydrogenase-A complex (GAPA), and enolase (ENO), are significantly down-regulated in the acetate sample. Another hallmark of growth on acetate is the induction of the glyoxylate shunt pathway. Specifically, isocitrate lyase (ACEA) and malate synthase A (ACEB) divert carbon flux around the two decarboxylation steps of the citric acid cycle, conserving acetyl-CoA for the production of primary metabolites and energy without loss of carbon as carbon dioxide. Both ACEA and ACEB were identified in the acetate condition (58 and 32% sequence coverage, respectively) and were found to be highly up-regulated: 88.2-fold (ln(Acetate/Glucose) = 4.48 ± 0.27, 95% CI) for ACEA and 28.5-fold (ln(Acetate/Glucose) = 3.35 ± 0.41, 95% CI) for ACEB.

TABLE III
Subset of quantified proteins involved in carbon source utilization

AG = ln(average intensity ratio) between acetate and glucose; AL = ln(average intensity ratio) between acetate and lactose; LG = ln(average intensity ratio) between lactose and glucose; 95 CI, corresponding ± 95% confidence interval; Ace, acetate; Lac, lactose; Glc, glucose; —, not applicable or not determined.

Protein | Alternative | Blattner | Description | AG | 95 CI | AL | 95 CI | LG | 95 CI
Lactose degradation
LACZ | LACZ | b0344 | β-Galactosidase | — | — | Lac | — | Lac | —
GALM | GALM | b0756 | Aldose 1-epimerase | 1.32 | 0.21 | −0.99 | 0.14 | 2.13 | 0.21
GALK | GALK | b0757 | Galactokinase | Ace | — | −1.91 | 0.35 | Lac | —
GALT | GALT | b0758 | UDP-glucose-hexose-1-phosphate uridylyltransferase | — | — | Lac | — | Lac | —
GALE | GALD | b0759 | UDP-glucose-4-epimerase | 1.06 | 0.29 | −1.99 | 0.24 | 3.00 | 0.29
AGP | AGP | b1002 | Glucose-1-phosphatase | −0.24 | 0.25 | Ace | — | Glc | —
GLK | GLK | b2388 | Glucokinase | 0.65 | 0.33 | 0.58 | 0.10 | 0.06 | 0.31
Acetate degradation
ACS | ACSA | b4069 | Acetyl-CoA synthetase | 2.81 | 0.25 | Ace | — | Glc | —
ACKA | ACKA | b2296 | Acetate kinase | 0.63 | 0.22 | 0.58 | 0.23 | 0.05 | 0.09
PTA | PTA | b2297 | Phosphate acetyltransferase | 0.47 | 0.10 | 0.44 | 0.13 | 0.04 | 0.10
Glycolysis
PGI | PGI | b4025 | Phosphoglucose isomerase | −0.65 | 0.13 | −0.78 | 0.10 | 0.12 | 0.15
PFKA | PFKA | b3916 | 6-Phosphofructokinase-1 | 0.53 | 0.10 | 0.54 | 0.12 | −0.01 | 0.16
PFKB | PFKB | b1723 | 6-Phosphofructokinase-2 | −0.07 | 0.20 | −0.20 | 0.21 | 0.14 | 0.10
FBAB | ALF | b2925 | Fructose-bisphosphate aldolase class II | 0.38 | 0.13 | 0.25 | 0.19 | 0.13 | 0.13
TPIA | TPI | b3919 | Triose-phosphate isomerase | 0.42 | 0.16 | 0.40 | 0.12 | 0.02 | 0.13
GAPA | GAPA | b1779 | Glyceraldehyde-3-phosphate dehydrogenase-A complex | −0.58 | 0.09 | −0.77 | 0.14 | 0.20 | 0.08
PGK | PGK | b2926 | Phosphoglycerate kinase | 0.53 | 0.11 | 0.38 | 0.07 | 0.15 | 0.09
PGMA | GPMA | b0755 | Phosphoglycerate mutase I | 0.56 | 0.12 | 0.62 | 0.18 | −0.07 | 0.10
ENO | ENO | b2779 | Enolase | −0.42 | 0.09 | −0.54 | 0.05 | 0.08 | 0.07
ACEE | ODP1 | b0114 | Pyruvate dehydrogenase multienzyme complex, E1p | 0.23 | 0.08 | 0.15 | 0.07 | 0.08 | 0.06
ACEF | ODP2 | b0115 | Pyruvate dehydrogenase multienzyme complex, lipoate/dihydrolipoamide acetyltransferase | 0.18 | 0.05 | 0.17 | 0.05 | 0.01 | 0.06
LPDA | DLDH | b0116 | Pyruvate dehydrogenase multienzyme complex, E3 monomer | 1.85 | 0.10 | 1.98 | 0.14 | −0.12 | 0.11
Citric acid cycle
GLTA | ICDB | b0720 | Citrate synthase | 2.32 | 0.26 | 2.43 | 0.25 | −0.09 | 0.10
ACNA | ACON1 | b1276 | Aconitase | 0.27 | 0.09 | 0.27 | 0.10 | 0.00 | 0.05
ACNB | ACON2 | b0118 | Aconitase B | 1.48 | 0.23 | 1.58 | 0.19 | −0.10 | 0.11
ICDA | IDH | b1136 | Isocitrate dehydrogenase | 1.08 | 0.14 | 1.21 | 0.12 | −0.13 | 0.07
SUCA | ODO1 | b0726 | 2-Oxoglutarate dehydrogenase complex, E1p | 1.99 | 0.19 | 2.16 | 0.27 | −0.17 | 0.09
SUCB | ODO2 | b0727 | 2-Oxoglutarate dehydrogenase complex, dihydrolipoamide succinyltransferase | 2.58 | 0.54 | 3.11 | 0.29 | −0.54 | 0.27
SUCC | SUCC | b0728 | Succinyl-CoA synthase, β | 1.88 | 0.14 | 2.03 | 0.13 | −0.16 | 0.07
SUCD | SUCD | b0729 | Succinyl-CoA synthase, α | 1.82 | 0.14 | 1.96 | 0.14 | −0.14 | 0.06
FUMC | FUMC | b1611 | Fumarate hydratase, class II | −0.52 | 0.14 | −0.60 | 0.17 | 0.05 | 0.09
MDH | MDH | b3236 | Malate dehydrogenase | 1.98 | 0.11 | 2.15 | 0.11 | −0.17 | 0.04
Glyoxylate bypass
ACEA | ICL | b4015 | Isocitrate lyase | 4.48 | 0.27 | 4.29 | 0.21 | 0.16 | 0.15
ACEB | MASY | b4014 | Malate synthase A | 3.35 | 0.41 | Ace | — | Glc | —
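Because the three binary comparisons share conditions, the ln ratios in Table III should approximately satisfy AG ≈ AL + LG, since ln(A/G) = ln(A/L) + ln(L/G). The sketch below checks this for a few rows copied from Table III; it is an illustrative consistency check, not an analysis reported by the authors. Residuals of a few hundredths are expected from two-decimal rounding and independent measurement, and some rows deviate more (GALM: 1.32 versus −0.99 + 2.13 = 1.14).

```python
# (AG, AL, LG) ln ratios for a few rows of Table III.
TABLE_III_ROWS = {
    "PGI": (-0.65, -0.78, 0.12),
    "PFKA": (0.53, 0.54, -0.01),
    "GAPA": (-0.58, -0.77, 0.20),
    "LPDA": (1.85, 1.98, -0.12),
    "ACEA": (4.48, 4.29, 0.16),
    "GALM": (1.32, -0.99, 2.13),
}


def closure_residual(ag, al, lg):
    """ln(A/G) - [ln(A/L) + ln(L/G)]; zero for perfectly self-consistent ratios."""
    return ag - (al + lg)


for protein, (ag, al, lg) in TABLE_III_ROWS.items():
    print(f"{protein}: residual {closure_residual(ag, al, lg):+.2f}")
```

A check of this kind only needs proteins quantified in all three comparisons; rows containing a condition-unique entry (Ace, Lac, Glc) have no numeric ratio to close the loop.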

Glucose Utilization (Glycolysis and Tricarboxylic Acid Cycle)—Several glycolysis proteins (PGI, GAPA, and ENO) were down-regulated in acetate relative to either glucose or lactose (Fig. 7 and Tables III and IV). Although the media conditions were not identical among the three studies, the directionality of the relative abundance data is consistent with the microarray data reported by Oh et al. (7) and the 2DGE data reported by Peng and Shimizu (18), which helps to validate our methodology. Peng and Shimizu (18) point out that these same proteins are up-regulated in E. coli during growth on glucose when acetate is metabolized, and that the flux in the gluconeogenic direction is smaller than the glycolytic flux. In addition, a series of tricarboxylic acid cycle proteins (GLTA, ACNB, ICDA, SUCA, SUCB, SUCC, SUCD, and MDH) were found to be up-regulated in acetate relative to either glucose or lactose. However, there are a number of differences among the three studies, most of which lie within the glycolytic pathway.

TABLE IV
Comparison with 2DGE and transcriptional microarrays

AG = ln(average intensity ratio) between acetate and glucose. Column A shows protein levels as determined by the label-free quantitation method described in this study. Column B shows transcript levels as reported by Oh et al. (7). Column C shows protein levels as determined from 2DGE reported by Peng and Shimizu (18). NR, not determined in the study.

Protein | Alternative | Blattner | Description | AG (A) | AG (B) | AG (C)
Acetate degradation
ACS | ACSA | b4069 | Acetyl-CoA synthetase | 2.81 | 2.25 | NR
ACKA | ACKA | b2296 | Acetate kinase | 0.63 | −0.67 | −0.51
PTA | PTA | b2297 | Phosphate acetyltransferase | 0.47 | −0.43 | −0.22
Glycolysis
PGI | PGI | b4025 | Phosphoglucose isomerase | −0.65 | −0.12 | −0.51
PFKA | PFKA | b3916 | 6-Phosphofructokinase-1 | 0.53 | −0.53 | −0.69
PFKB | PFKB | b1723 | 6-Phosphofructokinase-2 | −0.07 | 0.18 | 0.00
FBAB | ALF | b2925 | Fructose-bisphosphate aldolase class II | 0.38 | −0.69 | −0.51
TPIA | TPI | b3919 | Triose-phosphate isomerase | 0.42 | −0.05 | −0.11
GAPA | GAPA | b1779 | Glyceraldehyde-3-phosphate dehydrogenase-A complex | −0.58 | −0.80 | −0.69
PGK | PGK | b2926 | Phosphoglycerate kinase | 0.53 | −0.53 | −0.69
PGMA | GPMA | b0755 | Phosphoglycerate mutase I | 0.56 | 0.53 | −0.51
ENO | ENO | b2779 | Enolase | −0.42 | −0.62 | NR
ACEE | ODP1 | b0114 | Pyruvate dehydrogenase multienzyme complex, E1p | 0.23 | −0.12 | −0.51
ACEF | ODP2 | b0115 | Pyruvate dehydrogenase multienzyme complex, lipoate/dihydrolipoamide acetyltransferase | 0.18 | −0.82 | −0.51
Citric acid cycle
GLTA | ICDB | b0720 | Citrate synthase | 2.32 | 1.59 | 1.50
ACNA | ACON1 | b1276 | Aconitase | 0.27 | 0.41 | 0.74
ACNB | ACON2 | b0118 | Aconitase B | 1.48 | 1.93 | 1.19
ICDA | IDH | b1136 | Isocitrate dehydrogenase | 1.08 | 0.59 | 0.41
SUCA | ODO1 | b0726 | 2-Oxoglutarate dehydrogenase complex, E1p | 1.99 | 0.47 | 1.48
SUCB | ODO2 | b0727 | 2-Oxoglutarate dehydrogenase complex, dihydrolipoamide succinyltransferase | 2.58 | 0.79 | 1.34
SUCC | SUCC | b0728 | Succinyl-CoA synthase, β | 1.88 | 1.03 | 1.48
SUCD | SUCD | b0729 | Succinyl-CoA synthase, α | 1.82 | 1.13 | 1.34
FUMC | FUMC | b1611 | Fumarate hydratase, class II | −0.52 | 0.74 | 0.74
MDH | MDH | b3236 | Malate dehydrogenase | 1.98 | 1.36 | 1.22
Glyoxylate bypass
ACEA | ICL | b4015 | Isocitrate lyase | 4.48 | 3.66 | 2.40
ACEB | MASY | b4014 | Malate synthase A | 3.35 | 2.83 | 2.33
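The comparison with Oh et al. (7) is essentially about the direction of change. One simple way to make that concrete is to count how often the sign of column A (this study) matches the sign of column B (transcripts) in Table IV, pathway by pathway. The grouping and the sign-agreement metric below are illustrative choices, not part of the original analysis.

```python
# (A, B) = ln(Acetate/Glucose) from this study and from Oh et al. (7), Table IV.
GLYCOLYSIS = {
    "PGI": (-0.65, -0.12), "PFKA": (0.53, -0.53), "PFKB": (-0.07, 0.18),
    "FBAB": (0.38, -0.69), "TPIA": (0.42, -0.05), "GAPA": (-0.58, -0.80),
    "PGK": (0.53, -0.53), "PGMA": (0.56, 0.53), "ENO": (-0.42, -0.62),
    "ACEE": (0.23, -0.12), "ACEF": (0.18, -0.82),
}
TCA_AND_GLYOXYLATE = {
    "GLTA": (2.32, 1.59), "ACNA": (0.27, 0.41), "ACNB": (1.48, 1.93),
    "ICDA": (1.08, 0.59), "SUCA": (1.99, 0.47), "SUCB": (2.58, 0.79),
    "SUCC": (1.88, 1.03), "SUCD": (1.82, 1.13), "FUMC": (-0.52, 0.74),
    "MDH": (1.98, 1.36), "ACEA": (4.48, 3.66), "ACEB": (3.35, 2.83),
}


def sign_agreement(rows):
    """Return (number of proteins whose two ln ratios share a sign, total).

    A value of exactly zero would count as negative; none occurs in these rows.
    """
    agree = sum(1 for a, b in rows.values() if (a > 0) == (b > 0))
    return agree, len(rows)


for name, rows in (("Glycolysis", GLYCOLYSIS), ("TCA + glyoxylate", TCA_AND_GLYOXYLATE)):
    agree, total = sign_agreement(rows)
    print(f"{name}: {agree}/{total} same direction")
```

The contrast mirrors the discussion: directional agreement is high in the citric acid cycle and glyoxylate bypass and markedly lower in glycolysis.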

Some of the exceptions in the microarray work of Oh et al. (7) can be explained by the common lack of correlation between mRNA levels and protein abundance that has been reported in many unrelated studies (8–10). Additionally, there were a variety of differences among the growth conditions in the three studies, ranging from the use of rich versus minimal media to the amount of carbohydrate in the growth medium. One would expect some difference in glycolysis when comparing minimal with rich media, because primary metabolites (amino acids and cofactors) are present in the rich medium; however, energy metabolism should show some similarity because there is a demand to support the cell growth rate. This is evidenced by the greater degree of correlation among the three studies in the relative abundance of the citric acid cycle and glyoxylate bypass proteins and their corresponding transcripts.

Another source of variation in the work of Peng and Shimizu (18) could be attributed to some common characteristics of the 2DGE technique. The quantitative results obtained from 2DGE can be affected by proteins that appear in many spots on the gel or by single spots that contain many proteins. The relative quantity of isoforms of a given protein can change within a given biological sample; thus relative quantitation could be inaccurate if only one gel spot was used to quantify a protein known to exist in different isoforms. If the reason for the multiple gel spots is a post-translational modification of a few residues within the larger protein, this would affect the quantitative result of only the modified peptide(s), leaving the remaining tryptic peptides to correlate well with the relative abundance of the full-length protein. If the post-translational modification is proteolytic cleavage of the protein, then the quantitative results could reveal two sets of tryptic peptides exhibiting two different relative abundances, provided that the inactive, truncated form of the protein was subsequently degraded. We have not explored examples of these specific scenarios, but because the data acquisition provides a digitized inventory of all the precursors and peptides, we intend to search the data for examples of these modifications and perhaps design other experiments to explore this phenomenon in future work.
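Because quantitation is done peptide by peptide, the scenarios above would appear as peptides whose ln ratios depart from the rest of the protein. The following is a hypothetical screening sketch, not something performed in this study: it flags peptides whose ln ratio differs from the protein's median ln ratio by more than a cutoff. The 0.5 ln-unit cutoff, the function name, and the peptide identifiers are all illustrative assumptions.

```python
from statistics import median


def flag_discordant_peptides(peptide_ln_ratios, max_deviation=0.5):
    """Return IDs of peptides whose ln ratio lies far from the protein median.

    peptide_ln_ratios maps a peptide identifier to its ln(condition A/condition B).
    max_deviation is a hypothetical cutoff in ln units, not a value from the study.
    """
    if not peptide_ln_ratios:
        return []
    center = median(peptide_ln_ratios.values())
    return sorted(
        peptide for peptide, ratio in peptide_ln_ratios.items()
        if abs(ratio - center) > max_deviation
    )


# Hypothetical protein: three concordant peptides and one possibly modified peptide.
ratios = {"pep1": 0.9, "pep2": 1.0, "pep3": 1.1, "pep4": 2.4}
print(flag_discordant_peptides(ratios))
```

The median is used as the reference so that a single modified peptide does not pull the protein-level value toward itself, as a mean would.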

Because both the lactose and glucose conditions were analyzed and compared with acetate, they provided complementary data in the same experiment and served as positive controls, which offer additional confidence in the results of our study. Although there are discrepancies in the relative quantitation of a number of glycolytic proteins among the different studies, they help define future experiments aimed at addressing the causes of these anomalies.

Conclusions—The method described in this study is a powerful tool for simultaneously gathering qualitative and quantitative information to characterize the components of a complex protein mixture. The data illustrate that the alternate scanning LCMS method (LCMSE) of the Protein Expression system is a label-free method that is ideally suited to these studies. The inherent redundancy of the tryptic peptides generated from the endogenous proteins is used both for protein identification and for subsequent relative quantitation. This strategy provides accurate mass measurements, typically below 5 ppm, of tryptic peptides and their corresponding fragment ions throughout the LCMS analysis. While determining the masses of both tryptic peptides and fragments, the processing software simultaneously preserves the chromatographic integrity of the data to enhance its qualitative and quantitative capabilities. A more thorough discussion of the qualitative capabilities of this method will be given in future work (footnote 2). The ability to collect “all the ions all the time” provides structural information for every tryptic peptide that generates fragments above the minimum detection threshold of the mass spectrometer.
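Mass accuracy in parts per million is the relative error between a measured and a theoretical m/z, scaled by 10^6. A minimal sketch of that arithmetic and of a tolerance check at the 5 ppm level quoted above; the function names and m/z values are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tolerance_ppm=5.0):
    """True if the measured m/z lies within +/- tolerance_ppm of the theoretical value."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tolerance_ppm


# Hypothetical peptide ion with a theoretical m/z of 1000.0000.
print(round(ppm_error(1000.0040, 1000.0000), 2))  # 4.0 ppm
print(within_tolerance(1000.0060, 1000.0000))     # 6 ppm, outside a 5 ppm window
```

Because the tolerance is relative, 5 ppm corresponds to 0.005 m/z units at m/z 1000 but only 0.0025 at m/z 500, which is why accurate-mass matching is usually specified in ppm rather than in absolute units.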

A major goal of this study was to demonstrate the utility of the Protein Expression system with a model biological system such as E. coli. The study illustrated that the information obtained from the LCMSE data of the E. coli tryptic digests correlated well with the known biology of carbon metabolism. This work demonstrates that the Protein Expression system can rapidly determine proteome differences among varied biological conditions. These protein profiling experiments yield important information about the response of E. coli to environmental perturbations. Similar studies could later be generalized to other biological systems. Such studies could in turn lead to more targeted strategies to combat and/or detect virulent microbes, help develop novel antibiotics, and identify important biomarkers for clinical discovery and diagnostics. In fact, this method has already been extended to other biological systems, such as mycobacteria (Mycobacterium bovis), to study proteomic profiles under different drug treatments in an effort to determine the mechanism of action of novel drugs (21). The simplicity of this label-free approach should encourage more experimentation and increase the efficiency of future biological research.

Acknowledgments—We are thankful for the valuable contributions of Timothy Riley throughout the development of this work. We also acknowledge Jeanne Li and others at Waters Corp. who provided insight throughout the editing of this manuscript. We also thank Blue Sky Biotech (Worcester, MA) for their contribution in the production and preparation of the E. coli protein extracts that were used for the purpose of this work.


* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.

§ To whom correspondence should be addressed: Waters Corp., 34 Maple St., Milford, MA 01757-3696. Tel.: 508-482-3005; Fax: 508-482-2055; E-mail: [email protected].

REFERENCES

1. Blattner, F. R., Plunkett, G., Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K., Mayhew, G. F., Gregor, J., Davis, N. W., Kirkpatrick, H. A., Goeden, M. A., Rose, D. J., Mau, B., and Shao, Y. (1997) The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474

2. Selinger, D. W., Cheung, K. J., Mei, R., Johansson, E. M., Richmond, C. S., Blattner, F. R., Lockhart, D. J., and Church, G. M. (2000) RNA expression analysis using a 30 base pair resolution Escherichia coli genome array. Nat. Biotechnol. 18, 1262–1268

3. O’Farrell, P. H. (1975) High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250, 4007–4021

4. Sundararaj, S., Guo, A., Habibi-Nazhad, B., Rouani, M., Stothard, P., Ellison, M., and Wishart, D. S. (2004) The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli. Nucleic Acids Res. 32, D293–D295

5. Misra, R. V., Horler, R. S. P., Reindl, W., Goryanin, I. I., and Thomas, G. H. (2005) EchoBASE: an integrated post-genomic database for Escherichia coli. Nucleic Acids Res. 33, D329–D333

6. Zimmer, D. P., Soupene, E., Lee, H. L., Wendisch, V. F., Khodursky, A. B., Peter, B. J., Bender, R. A., and Kustu, S. (2000) Nitrogen regulatory protein C-controlled genes of Escherichia coli: scavenging as a defense against nitrogen limitation. Proc. Natl. Acad. Sci. U. S. A. 97, 14674–14679

7. Oh, M. K., Rohlin, L., Kao, K. C., and Liao, J. C. (2002) Global expression profiling of acetate-grown Escherichia coli. J. Biol. Chem. 277, 13175–13183

8. Griffin, T. J., Gygi, S. P., Ideker, T., Rist, B., Eng, J., Hood, L., and Aebersold, R. (2002) Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol. Cell. Proteomics 1, 323–333

9. Gygi, S. P., Rochon, Y., Franza, B. R., and Aebersold, R. (1999) Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19, 1720–1730

10. Anderson, L., and Seilhamer, J. (1997) Comparison of selected mRNA and protein abundances in human liver. Electrophoresis 18, 533–537

11. Henzel, W. J., Watanabe, C., and Stults, J. T. (2003) Protein identification: the origins of peptide mass fingerprinting. J. Am. Soc. Mass Spectrom. 14, 931–942

12. Bateman, R. H., and Hoyes, J. B. (January 16, 2002) Methods and apparatus for mass spectrometry. UK Patent 2,364,168A

13. Silva, J. C., Denny, R., Dorschel, C. A., Gorenstein, M., Kass, I. J., Li, G.-Z., McKenna, T., Nold, M. J., Richardson, K., Young, P., and Geromanos, S. (2004) Quantitative proteomic analysis by accurate mass retention time pairs. Anal. Chem. 77, 2187–2200

14. Purvine, S., Eppel, J. T., Yi, E. C., and Goodlett, D. R. (2003) Shotgun collision-induced dissociation of peptides using a time of flight mass analyzer. Proteomics 3, 847–850

15. Kuhn, E., Wu, J., Karl, J., Liao, H., Zolg, W., and Guild, B. (2004) Quantification of C-reactive protein in the serum of patients with rheumatoid arthritis using multiple reaction monitoring mass spectrometry and 13C-labeled peptide standards. Proteomics 4, 1175–1186

16. Ross, P. L., Huang, Y. N., Marchese, J. N., Williamson, B., Parker, K., Hattan, S., Khainovski, N., Pillai, S., Dey, S., Daniels, S., Purkayastha, S., Juhasz, P., Martin, S., Bartlet-Jones, M., He, F., Jacobson, A., and Pappin, D. J. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3, 1154–1169

17. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., and Aebersold, R. (1999) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17, 994–999

18. Peng, L., and Shimizu, K. (2003) Global metabolic regulation analysis for Escherichia coli K12 based on protein expression by 2-dimensional electrophoresis and enzyme activity. Appl. Microbiol. Biotechnol. 61, 163–178

19. Yu, Y. Q., Gilar, M., Lee, P. J., Bouvier, E. S. P., and Gebler, J. C. (2003) Enzyme-friendly, mass spectrometry-compatible surfactant for in-solution enzymatic digestion of proteins. Anal. Chem. 75, 6023–6028

20. Skilling, J., Denny, R., Richardson, K., Young, P., McKenna, T., Campuzano, I., and Ritchie, M. (2004) Probseq—a fragmentation model for interpretation of electrospray tandem mass spectrometry data. Comp. Funct. Genomics 5, 61–68

21. Hughes, M. A., Silva, J. C., Geromanos, S. J., and Townsend, C. A. (2006) Quantitative proteomic analysis of drug-induced changes in Mycobacteria. J. Proteome Res. 5, 54–63

22. Conrads, T. P., Anderson, G. A., Veenstra, T. D., Pasa-Tolic, L., and Smith, R. D. (2000) Utility of accurate mass tags for proteome-wide protein identification. Anal. Chem. 72, 3349–3354

23. Marr, A. G. (1991) Growth rate of Escherichia coli. Microbiol. Rev. 55, 316–333

24. Vollmer, M., Nagele, E., and Horth, P. (2003) Differential proteome analysis: two-dimensional nano-LC/MS of E. coli proteome grown on different carbon sources. J. Biomol. Tech. 14, 128–135

25. Kakuda, H., Hosono, K., Shiroishi, K., and Ichihara, S. (1994) Identification and characterization of the ackA (acetate kinase A)-pta (phosphotransacetylase) operon and complementation analysis of acetate utilization by an ackA-pta deletion mutant of Escherichia coli. J. Biochem. 116, 916–922
