European Urology 47 (2005) 456–462
Using Tree Analysis Pattern and SELDI-TOF-MS to Discriminate Transitional Cell Carcinoma of the Bladder Cancer from Noncancer Patients
Weiwei Liu a, Ming Guan a, Denglong Wu c, Yuanfang Zhang b, Zhong Wu b, Ming Xu a, Yuan Lu a,*
a Center of Laboratory Medicine, Huashan Hospital, Fudan University, Shanghai 200040, P.R. China
b Department of Urology, Huashan Hospital, Fudan University, Shanghai 200040, P.R. China
c Department of Urology, No. 6 People's Hospital, Jiaotong University, Shanghai 200042, P.R. China
Accepted 7 October 2004
Available online 28 October 2004
Abstract
Objective: To determine whether SELDI protein profiling of urine coupled with a tree analysis pattern could differentiate TCC from noncancer patients.
Methods: The ProteinChip Arrays were read on a ProteinChip PBS II reader of the ProteinChip Biomarker System. The study was divided into two phases: a preliminary phase, in which the tree analysis pattern was constructed, and a testing phase with test urine samples. The tree analysis pattern was generated from a training data set of 104 samples, and its validity was then challenged with a test set of 68 samples.
Results: An average of 187 mass peaks was detected in the urine samples, and five of these peaks were used to construct the tree analysis pattern. The classification pattern correctly predicted 91.67–94.64% of the samples in each of the two groups in the training set, for an overall correct classification of about 93%. The pattern correctly predicted 72.1% (49 of 68) of the test samples: 71.4% (25 of 35) of the TCC samples and 72.7% (24 of 33) of the noncancer samples.
Conclusions: The high sensitivity and specificity obtained by the urine protein profiling approach demonstrate that SELDI-TOF-MS combined with a tree analysis pattern can both help discriminate TCC bladder cancer from noncancer and provide an innovative clinical diagnostic platform to improve the detection of TCC bladder cancer.
© 2004 Elsevier B.V. All rights reserved.
Keywords: TCC; SELDI; Bladder cancer
1. Introduction
Bladder cancer is the fourth most common cancer in men and the eighth most common in women [1]. More than 90% of cases have transitional cell carcinoma (TCC) histology [2]. The most reliable methods for the diagnosis and surveillance of TCC are cystoscopic examination and bladder biopsy for histological confirmation [3]. The invasive and labor-intensive nature of cystoscopic examination creates a need for better, noninvasive diagnostic tools [2]. Urine cytology has been the gold standard among noninvasive diagnostic approaches. It has high specificity and, unlike biopsy, screens the entire urothelium. However, its high false-negative rate has limited its use to that of an adjunct to cystoscopy [4,5]. Applying new technologies to the detection of bladder cancer could have an important effect on public health. To achieve this goal, specific and sensitive molecular markers are essential.

* Corresponding author. Tel: +86 21 62498118; Fax: +86 21 62498118.
E-mail address: [email protected] (Y. Lu).
0302-2838/$ – see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.eururo.2004.10.006
The classical approach for discovering disease-associated proteins is two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) [6]. 2D-PAGE is cumbersome and labor-intensive, suffers from reproducibility problems, and is not readily transformed into a clinical assay.
Significant technological advances in protein chemistry have established matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) as a tool for protein study [7–9]. A limitation of MALDI-TOF-MS is that buffer components, lipids, carbohydrates and other substances in the sample prevent efficient ionization of the proteins it contains [10].
The development of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) has largely overcome many of these limitations [11]. By using arrays with different surface chemistries, a complex mixture of proteins, such as one from cells or body fluids, can be resolved into subsets of proteins with common properties. After the arrays are washed to remove weakly bound proteins, a solution containing an energy-absorbing molecule is added and allowed to crystallize, embedding the retained proteins. After crystallization, features within these arrays are read in a ProteinChip Reader. Once a peak of interest has been detected, the analyte can be enriched or purified for further analysis [12].
Taking advantage of recent developments in SELDI, we were able to analyze simultaneously the protein profiles of urine samples from patients with or without bladder cancer. Comparing the spectral patterns by manual visual inspection was a laborious undertaking fraught with significant error, and it clearly suggested that bioinformatic classifier algorithms would be required to deal efficiently and effectively with the high dimensionality of SELDI data. In this study, we used Biomarker Patterns Software (Ciphergen Biosystems, Inc.), a powerful tool for handling the hundreds of proteins that can be detected in urine samples.
The objective of this study was to determine whether SELDI protein profiling of urine coupled with a tree analysis pattern could effectively differentiate TCC from noncancer patients.
Table 1
Grade/stage of bladder carcinoma

Grade   No. of samples   Stage   No. of samples
I       27               Ta      19
II      35               T1      48
III     21               T2      11
                         T3      2
                         CIS     3
2. Materials and methods
2.1. Study groups and samples
Urine samples were collected from patients seen in the Department of Urology, Huashan Hospital and No. 6 People's Hospital. Diagnoses were pathologically confirmed, and specimens were obtained before treatment. The samples were voided early-morning urine samples. They were immediately centrifuged briefly (1 min, 10,000 rpm) to remove cellular material. The urine was distributed into 200 μl aliquots and stored at −80 °C. None of the samples had been thawed more than twice.
A total of 172 specimens were included in this study. In the TCC group, the median age was 58 years (range, 31–86 years). The noncancer control group included urine samples from 89 subjects: normal individuals (Normal) and patients with other urogenital diseases (Other). The median age of the healthy subjects was 50 years (range, 23–86 years), and that of the Other group was 53 years (range, 21–76 years). Healthy controls (n = 53) included volunteers with no evidence of disease and healthy individuals. Other urogenital diseases (n = 36) included clinically or pathologically confirmed prostatitis (n = 4), urinary tract infection (n = 6), benign prostatic hyperplasia (n = 7), bladder inflammation (n = 6), prostate cancer (n = 3), kidney carcinoma (n = 2), ureteral calculus (n = 4), renal transplantation (n = 1) and hydronephrosis (n = 3). The training set included 56 noncancer samples: 20 from patients with other urogenital diseases and 36 normal controls.
The cancer group consisted of 83 urine samples from patients with TCC bladder cancer of different clinical grades and stages (Table 1). The training set included 48 TCC samples and the test set 35. The test set included 33 noncancer samples: 16 from patients with other urogenital diseases and 17 normal controls.
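The subgroup counts in this section can be cross-checked with a short tally. The sketch below copies the counts from the text into a hypothetical `cohort` dictionary and confirms that they add up to the reported totals (83 TCC, 89 noncancer, 104 training and 68 test samples, 172 overall).

```python
# Subgroup counts copied from Section 2.1 (dictionary layout is illustrative).
cohort = {
    "training": {"TCC": 48, "Normal": 36, "Other": 20},
    "test": {"TCC": 35, "Normal": 17, "Other": 16},
}


def group_totals(cohort):
    """Sum each group across the training and test sets."""
    totals = {}
    for subset in cohort.values():
        for group, n in subset.items():
            totals[group] = totals.get(group, 0) + n
    return totals


totals = group_totals(cohort)
set_sizes = {name: sum(subset.values()) for name, subset in cohort.items()}
noncancer = totals["Normal"] + totals["Other"]

print(totals)     # {'TCC': 83, 'Normal': 53, 'Other': 36}
print(set_sizes)  # {'training': 104, 'test': 68}
print(noncancer)  # 89
```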
2.2. ProteinChip SELDI analysis of urine
Weak Cation Exchange (WCX2) ProteinChip Arrays (Ciphergen) were used in this study.
The protein concentration of each urine sample was estimated using the BCA method. Before use, samples were diluted with binding buffer (0.5 mol/l NaCl buffered with 100 mmol/l sodium acetate, pH 4) to an equal protein concentration (2 mg/ml).
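Diluting every sample to the same protein concentration follows from C1·V1 = C2·V2. The sketch below computes how much urine and binding buffer to mix; the function name and the final volume are hypothetical (the text does not state a final volume), and only the 2 mg/ml target comes from the protocol.

```python
def dilution_volumes(sample_conc_mg_ml, final_volume_ul, target_conc_mg_ml=2.0):
    """Return (sample_ul, buffer_ul) so that the mixture reaches the target.

    From C1 * V1 = C2 * V2 it follows that V1 = C2 * V2 / C1; the remainder of
    the final volume is binding buffer.
    """
    if sample_conc_mg_ml < target_conc_mg_ml:
        # Dilution can only lower the concentration, never raise it.
        raise ValueError("sample concentration is below the target; "
                         "dilution alone cannot reach it")
    sample_ul = target_conc_mg_ml * final_volume_ul / sample_conc_mg_ml
    return sample_ul, final_volume_ul - sample_ul


# Hypothetical sample measured at 5 mg/ml, hypothetical 200 ul final volume.
print(dilution_volumes(5.0, 200.0))  # (80.0, 120.0)
```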
The chips were placed in a bioprocessor (Ciphergen Biosystems, Inc.), a device that allows larger volumes of urine to be applied to each chip array. The WCX2 arrays were equilibrated twice with 100 μl of binding buffer. Then 100 μl of the diluted urine mixture was applied to each well, and the bioprocessor was sealed and shaken on a platform shaker at 300 rpm for 1 hour. After incubation, each well of the bioprocessor was washed with 100 μl of binding buffer. This step was repeated twice, and each time the binding buffer was discarded by inverting the bioprocessor on a paper towel. The chips were then removed from the bioprocessor and washed with deionized water. After the arrays had air-dried, 0.5 μl of saturated matrix solution (α-cyano-4-hydroxycinnamic acid in 0.5% acetonitrile and 0.5% trifluoroacetic acid) was applied twice to the array and allowed to air-dry.
2.3. Reading chip
The ProteinChip Arrays were read on a ProteinChip PBS II reader of the ProteinChip Biomarker System. The chips were analyzed manually with the following settings: laser intensity 200, detector sensitivity 9, molecular mass range 0–50 kDa, and mass focus 2–10 kDa. Data were collected without filters and were later used for analyses. The operators were blinded to the group (TCC or noncancer) of each sample. Mass accuracy was assessed using the All-in-1 peptide molecular mass standard (Ciphergen Biosystems, Inc.).

Fig. 1. Flow diagram showing the processes of proteomic analysis.
2.4. Data analysis
The urine samples were divided into two sets: a training set (104 samples) and a test set (68 samples). The analysis was likewise divided into two phases: a preliminary phase (phase I), in which the tree analysis pattern was constructed from the training set, and a testing phase (phase II), in which the validity of the pattern was challenged with the test set (Fig. 1).
2.5. Protein peak selection
All spectra were compiled, and qualified mass peaks (signal-to-noise ratio > 5) with mass-to-charge ratios (m/z) between 2 and 50 kDa were autodetected. Peak clusters were completed using second-pass peak selection (signal-to-noise ratio > 2), and estimated peaks were added.
The mass range of 2–50 kDa was selected for analysis because it contained the majority of the resolved proteins/peptides. Masses of 0–2000 Da were excluded from analysis because this region contains adducts and artifacts of the energy-absorbing molecule (EAM) and possibly other chemical contaminants.
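The two-pass peak selection described above can be sketched as follows. This is a simplified illustration, not the Ciphergen algorithm. It assumes that each spectrum is a sorted list of (m/z, intensity) pairs and that the noise level is a constant supplied by the caller (the vendor software estimates noise itself). It also assumes a hypothetical 0.3% mass tolerance for clustering, and for an "estimated" peak it takes the intensity of the point nearest the cluster centre. Only the S/N thresholds (5 and 2) and the 2–50 kDa window come from the text.

```python
def local_maxima(spectrum, noise, min_snr, lo=2000.0, hi=50000.0):
    """Indices of local maxima inside [lo, hi] Da whose S/N exceeds min_snr."""
    found = []
    for i in range(1, len(spectrum) - 1):
        mz, y = spectrum[i]
        is_peak = y > spectrum[i - 1][1] and y >= spectrum[i + 1][1]
        if lo <= mz <= hi and is_peak and y / noise > min_snr:
            found.append(i)
    return found


def cluster_masses(masses, tol=0.003):
    """Greedily group sorted masses whose relative spacing is within tol."""
    clusters = []
    for m in sorted(masses):
        if clusters and (m - clusters[-1][-1]) / m <= tol:
            clusters[-1].append(m)
        else:
            clusters.append([m])
    return [sum(c) / len(c) for c in clusters]  # cluster centre = mean mass


def build_peak_table(spectra, noise, tol=0.003):
    """First pass (S/N > 5) defines clusters; second pass (S/N > 2) fills them."""
    first = [s[i][0] for s in spectra for i in local_maxima(s, noise, 5.0)]
    centres = cluster_masses(first, tol)
    table = []
    for s in spectra:
        weak = local_maxima(s, noise, 2.0)
        row = []
        for c in centres:
            hits = [s[i][1] for i in weak if abs(s[i][0] - c) / c <= tol]
            if hits:
                row.append(max(hits))
            else:
                # "Estimated" peak: intensity of the point nearest the centre.
                row.append(min(s, key=lambda p: abs(p[0] - c))[1])
        table.append(row)
    return centres, table
```

Each row of the resulting table holds one sample's intensity for every peak cluster, which is the kind of input a tree classifier needs. The greedy clustering lets a cluster drift when many closely spaced masses chain together, a simplification accepted here for brevity.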
2.6. Construction of tree analysis pattern
The tree analysis pattern was generated with Biomarker Patterns Software version 4.0 (Ciphergen Biosystems, Inc.) from a training data set of 104 samples (48 TCC, 36 normal, and 20 other urogenital disease samples).

The tree analysis pattern divides a data set into two nodes using one rule at a time, in the form of a question. The presence or absence and the intensity level of one peak define each splitting decision. Splitting continues until terminal nodes (leaves) are produced or further splitting yields no gain. Each terminal node is classified according to the group (“class”) of samples (i.e., TCC, Normal, or Other) that forms the majority in that node. The peaks selected to form the splitting rules are those that achieve the maximum reduction of cost in the two descendant nodes [13].
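The split-selection step can be illustrated with a CART-style search. The text says only that the chosen rule gives the "maximum reduction of cost". This sketch assumes that cost is measured by Gini impurity, a common CART criterion. The software's actual cost function and its handling of priors and misclassification costs are not specified here. Each rule has the form "intensity ≤ X", with yes sending a sample to the left node and no to the right.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(intensities, labels):
    """Best threshold X for the rule 'intensity <= X' on a single peak.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, impurity_decrease); (None, 0.0) if no split helps.
    """
    parent = gini(labels)
    n = len(labels)
    values = sorted(set(intensities))
    best = (None, 0.0)
    for lo, hi in zip(values, values[1:]):
        x = (lo + hi) / 2
        left = [lab for v, lab in zip(intensities, labels) if v <= x]   # "yes"
        right = [lab for v, lab in zip(intensities, labels) if v > x]   # "no"
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best[1]:
            best = (x, parent - child)
    return best


def best_peak(peak_table, masses, labels):
    """Choose the peak (column) whose best split reduces impurity the most."""
    scored = []
    for j, mass in enumerate(masses):
        threshold, gain = best_split([row[j] for row in peak_table], labels)
        scored.append((gain, mass, threshold))
    gain, mass, threshold = max(scored)
    return mass, threshold, gain


# Toy data (hypothetical intensities): only the second peak separates the groups.
table = [[1.0, 0.25], [1.0, 0.5], [1.0, 1.0], [1.0, 1.5]]
print(best_peak(table, [5105, 16048], ["NC", "NC", "TCC", "TCC"]))
# (16048, 0.75, 0.5)
```

Growing a full tree repeats this search recursively in each descendant node until the nodes are pure or no split gives a positive gain.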
2.7. Statistical analyses
A Bayesian approach was used to calculate the expected probabilities of each class in each terminal node. Specificity was calculated as the ratio of the number of correctly classified noncancer samples to the total number of noncancer samples. Sensitivity was calculated as the ratio of the number of correctly classified TCC bladder cancer samples to the total number of cancer samples.
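The sensitivity and specificity definitions above translate directly into code. The sketch below applies them, together with overall accuracy, to the confusion counts reported in Table 3 (TCC correctly classified and missed, noncancer correctly classified and misclassified). The function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Correctly classified TCC samples / all TCC samples."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Correctly classified noncancer samples / all noncancer samples."""
    return tn / (tn + fp)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Correctly classified samples / all samples."""
    return (tp + tn) / (tp + fn + tn + fp)


# Counts from Table 3 as (TP, FN, TN, FP).
counts = {"training": (44, 4, 53, 3), "test": (25, 10, 24, 9)}

for name, (tp, fn, tn, fp) in counts.items():
    print(f"{name}: sensitivity {sensitivity(tp, fn):.1%}, "
          f"specificity {specificity(tn, fp):.1%}, "
          f"accuracy {accuracy(tp, fn, tn, fp):.1%}")
# training: sensitivity 91.7%, specificity 94.6%, accuracy 93.3%
# test: sensitivity 71.4%, specificity 72.7%, accuracy 72.1%
```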
2.8. Reproducibility analyses
The reproducibility of SELDI spectra from spot to spot on a
single chip (intra-assay) and between chips (inter-assay) was
determined using the urine quality control sample. Five proteins
in the range of 3–20 kDa observed on spectra randomly selected
over the course of the study were used to calculate the mass and
intensity mean CV.
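The mean coefficient of variation (CV) described above can be computed as follows: first the CV (standard deviation divided by mean) for each protein over its replicate measurements, then the average over proteins. This sketch assumes the sample standard deviation (n − 1 denominator), which the text does not specify. The replicate values are hypothetical. The same function serves intra-assay replicates (spots on one chip) and inter-assay replicates (different chips).

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)


def mean_cv(replicates_by_protein):
    """Average of the per-protein CVs; each value is one protein's replicates."""
    return mean(cv_percent(v) for v in replicates_by_protein.values())


# Hypothetical normalized intensities for two QC proteins measured three times.
replicates = {"protein_1": [9.0, 10.0, 11.0], "protein_2": [18.0, 20.0, 22.0]}
print(mean_cv(replicates))  # 10.0
```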
3. Results
A total of 187 qualified mass peaks (signal-to-noise ratio > 5) were detected in the training set. SELDI was particularly effective in resolving low-molecular-weight (<10 kDa) proteins and polypeptides. Peak intensity was normalized to the total ion current (2–50 kDa).
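Normalization to total ion current (TIC) makes intensities comparable across spectra. This sketch sums each spectrum's intensities over 2–50 kDa and rescales all its points so that every spectrum shares the mean TIC. Rescaling to the mean is one common convention and is assumed here; the exact scaling used by the software is not stated in the text.

```python
def total_ion_current(spectrum, lo=2000.0, hi=50000.0):
    """Sum of intensities for points with lo <= m/z <= hi."""
    return sum(y for mz, y in spectrum if lo <= mz <= hi)


def normalize_to_tic(spectra, lo=2000.0, hi=50000.0):
    """Rescale each spectrum so that its TIC in [lo, hi] equals the mean TIC."""
    tics = [total_ion_current(s, lo, hi) for s in spectra]
    if any(t == 0 for t in tics):
        raise ValueError("a spectrum has zero ion current in the window")
    target = sum(tics) / len(tics)
    return [[(mz, y * target / tic) for mz, y in s]
            for s, tic in zip(spectra, tics)]


# The 1000 Da point lies outside the window: it is rescaled but not counted.
s1 = [(1000, 100.0), (3000, 1.0), (5000, 3.0)]   # TIC = 4
s2 = [(3000, 2.0), (5000, 6.0)]                  # TIC = 8
print(normalize_to_tic([s1, s2]))
# [[(1000, 150.0), (3000, 1.5), (5000, 4.5)], [(3000, 1.5), (5000, 4.5)]]
```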
These 187 peaks identified in the training set were then used to construct the decision tree classification pattern (Fig. 2). Analysis of urine specimens from patients with TCC, patients with other diseases of the urogenital tract, and normal individuals revealed five prominent protein peaks; spectral views of these proteins are shown in Fig. 3. No single peak could by itself completely separate the two groups (TCC versus noncancer). The average SELDI masses of the five proteins are PEAK1: 5105 Da; PEAK2: 5565 Da; PEAK3: 16048 Da; PEAK4: 28435 Da; and PEAK5: 33221 Da (Fig. 3(a)–(e)).

Fig. 2. Diagram of the decision tree analysis pattern: classification of the TCC and noncancer samples in the training set. The root node (top) and descendant nodes are shown as ovals, and the terminal nodes (Node 1–Node 7) are shown as rectangles. The numbers in each node are the class counts [T, number of TCC samples; Non, number of noncancer samples]. Under the root and descendant nodes, the first number is the mass value and the second is the peak intensity value; for example, the mass value under the root node is 33221 Da and the intensity is 1.241.

The classification pattern used the five masses to generate 7 terminal nodes. A classification tree splits a data set into two bins, or nodes, using one rule at a time in the form of a question; the presence or absence and the intensity level of one peak define each splitting decision. For example, the answer to “Does mass A have an intensity less than or equal to X?” splits the data set into two nodes: a left node for yes and a right node for no. This splitting process continues until terminal nodes (leaves) are produced or further splitting yields no gain. Each terminal node is classified according to the group (“class”) of samples (i.e., TCC or noncancer) that forms the majority in that node. For example, if the intensity of an unknown sample is greater than 0.053 at 16048 Da and greater than 1.241 at 33221 Da, the sample is placed in terminal node 7 and classified as TCC. A sample placed in node 3 is assigned to the noncancer group. Because of random variation, misclassification of a new sample cannot be ruled out, even for a pure node that contains only one sample type, such as node 1, which contains only TCC samples. The probability of incorrect assignment increases in nodes that contain few majority-class samples or to which only a few samples are assigned, for example terminal nodes 1, 2 and 6 (Table 2) [15].
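The traversal rule in the worked example ("intensity ≤ X" goes left, otherwise right) can be expressed as a small tree walk. Only the two documented splits are encoded: the root at 33221 Da with threshold 1.241 (Fig. 2) and the 16048 Da split with threshold 0.053 leading to terminal node 7. Placing the 16048 Da split directly under the root's right branch is an assumption based on the worked example. Every other branch is a hypothetical placeholder leaf, because the full tree is shown only in Fig. 2.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    mass_da: int        # peak whose normalized intensity is tested
    threshold: float    # rule: "intensity <= threshold?"
    left: "Node"        # answer yes
    right: "Node"       # answer no


Node = Union[Leaf, Split]


def classify(node: Node, intensities: Dict[int, float]) -> str:
    """Follow the yes/no answers from the root down to a terminal node."""
    while isinstance(node, Split):
        yes = intensities[node.mass_da] <= node.threshold
        node = node.left if yes else node.right
    return node.label


UNKNOWN = Leaf("subtree not reconstructed")  # placeholder, not from the paper
tree = Split(33221, 1.241,
             left=UNKNOWN,
             right=Split(16048, 0.053,
                         left=UNKNOWN,
                         right=Leaf("TCC (terminal node 7)")))

print(classify(tree, {33221: 1.5, 16048: 0.2}))  # TCC (terminal node 7)
```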
Fig. 3. (a–e) Detection of the five urine protein peaks used in the tree analysis pattern. Mass spectra of urine samples from two different TCC patients and two noncancer subjects are shown. The average molecular mass of each of the five proteins is shown in the figure.
A summary of the classification results from the 7 terminal nodes is presented for the training and test sets in Table 3. The classification pattern correctly predicted 91.67–94.64% of the samples in each of the two groups in the training set (Table 3(A)), for an overall correct classification of about 93%. The pattern correctly predicted 72.1% (49 of 68) of the test samples: 71.4% (25 of 35) of the TCC samples and 72.7% (24 of 33) of the noncancer samples (Table 3(B)). Thus, for TCC versus noncancer (Other/Normal), the sensitivity was 71.4% (25 of 35) and the specificity was 72.7% (24 of 33). Across tumor grades and stages, the pattern correctly predicted 79.0–95.2% of the TCC samples (Table 3(C)).
The reproducibility of the SELDI spectra (intra- and interassay, for both mass and intensity) was determined with the pooled normal urine quality control sample. Five proteins in the range of 3–20 kDa, observed on spectra randomly selected over the course of the study, were used to calculate the mean CV. The intra- and interassay mean CVs for mass were 0.3% and 0.8%, respectively, and the intra- and interassay mean CVs for normalized intensity were 8% and 12%, respectively.
Table 2
Expected probability of the classes assigned to the 7 terminal nodes (NC, noncancer)

Node Class Observations Probability
1 TCC 7 0.8889
NC 0 0.1111
2 TCC 1 0.3333
NC 3 0.6667
3 TCC 1 0.0952
NC 18 0.9048
4 TCC 13 0.8235
NC 2 0.1765
5 TCC 1 0.0625
NC 29 0.9375
6 TCC 1 0.3333
NC 3 0.6667
7 TCC 24 0.9259
NC 1 0.0741
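The text does not specify the "Bayesian approach" used for these node probabilities. However, every value in Table 2 is reproduced exactly by the posterior mean under a uniform Beta(1, 1) prior (Laplace's rule of succession), p = (count + 1) / (node total + 2). For node 1, for example, (7 + 1)/(7 + 2) = 0.8889. The sketch below treats this as an inference from the numbers, not as the software's documented method.

```python
from fractions import Fraction

# Terminal-node counts from Table 2: node -> (TCC, noncancer).
NODE_COUNTS = {1: (7, 0), 2: (1, 3), 3: (1, 18), 4: (13, 2),
               5: (1, 29), 6: (1, 3), 7: (24, 1)}


def node_probabilities(n_tcc, n_nc, pseudocount=1):
    """Posterior-mean class probabilities for a two-class node.

    With pseudocount=1 this is a uniform Beta(1, 1) prior, i.e. Laplace's
    rule of succession: (count + 1) / (total + 2).
    """
    total = n_tcc + n_nc + 2 * pseudocount
    return (Fraction(n_tcc + pseudocount, total),
            Fraction(n_nc + pseudocount, total))


for node, (tcc, nc) in NODE_COUNTS.items():
    p_tcc, p_nc = node_probabilities(tcc, nc)
    print(node, round(float(p_tcc), 4), round(float(p_nc), 4))
```

The printed values match the Probability column of Table 2 node by node. The node counts also sum to the 48 TCC and 56 noncancer training samples.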
4. Discussion
Because proteins are gene products, it is logical to expect that specific proteomic profiles may also be associated with tumors [14]. Complex urine proteomic patterns might reflect the underlying pathological state of TCC bladder cancer. This hypothesis is supported by the results of our study. The high sensitivity and specificity obtained in this study demonstrate that SELDI can help discriminate TCC bladder cancer from noncancer.
SELDI-TOF-MS with a protein chip that captures proteins by their ability to bind selectively to a weak cation exchange surface was capable of resolving several hundred urine proteins/peptides. This is far fewer than the number of proteins that can be separated by 2D-PAGE, but SELDI has the advantage over 2D-PAGE of effectively resolving polypeptides and peptides smaller than 20 kDa [15].
This innovative technology has numerous other advantages: it is much faster, has high-throughput capability, requires orders of magnitude less protein sample, can detect proteins in the picomole to attomole range, can effectively resolve low-mass proteins (2000 to 20,000 Da), and is directly applicable to clinical assay development [3].

Table 3
Decision tree classification of the training and test sets

                                  TCC           Noncancer     Misclassified rate
A. Training set
TCC (n = 48)                      44 (91.67%)   4 (8.33%)     4 (8.33%)
Noncancer (n = 56)                3 (5.36%)     53 (94.64%)   3 (5.36%)
Total no. of samples (n = 104)                                7 (6.73%)
B. Test set
TCC (n = 35)                      25 (71.4%)    10 (28.6%)    10 (28.6%)
Noncancer (n = 33)                9 (27.3%)     24 (72.7%)    9 (27.3%)
Total no. of samples (n = 68)                                 19 (27.9%)
C. Sensitivity by grade and stage of tumor in all bladder cancer samples
                                  Grade I, II   Grade III     Ta, CIS       T1, T2, T3
Sensitivity                       79.0%         95.2%         86.3%         81.9%

Because of the multifactorial nature of cancer, it is very likely that a combination of several markers will be necessary to detect and diagnose TCC effectively. In our study, SELDI technology generated profiles of several hundred proteins simultaneously, and Biomarker Patterns Software was capable of analyzing this volume of data to develop an efficient and reproducible classifier.

The identity of the peak masses used in the tree analysis pattern is not necessary for making a diagnosis. These proteins/peptides could be derived from the host organ or the cancer, or could be metabolic fragments [16]. The only requirement for this classification system to make an accurate diagnosis is that the biomarkers be reproducibly detected by SELDI and accurately selected by the classifier. Obtaining a name for each of the masses used in the classifier will not make the classification system better or more accurate. However, because knowing their exact identities will be essential for understanding the biological roles these peptides/proteins may have in cancer, efforts are under way to purify, identify, and characterize these protein/peptide biomarkers [15]. Furthermore, knowing their identities will be essential for producing antibodies for the development of either classical or SELDI immunoassays [17,18].

In fact, many noninvasive molecular diagnostic tests have been developed, and several potential biomarkers have been identified. Reviews of the sensitivity/specificity and reproducibility of these biomarkers indicate that some have greater sensitivity/specificity than urine cytology; however, poor reproducibility precludes them from replacing urine cytology or cystoscopy [4,5,10,19]. In this study, SELDI overcame this limitation, and its reproducibility was acceptable.
Vlahou et al. have used SELDI-TOF-MS combined with Strong Anion Exchange (SAX2) arrays to detect potential biomarkers in urine [3]. Our results further support the applicability of this technology for protein profiling of urine samples with high diagnostic sensitivity for TCC. In their study, Vlahou et al. found five potential novel TCC biomarkers that were differentially expressed in TCC and noncancer samples. One of these TCC biomarkers (3.3/3.4 kDa) was identified as defensin; its sensitivity and specificity were 47% and 86%, respectively. They used SAX2 chips, which would be expected to bind different proteins from those that bind to the WCX2 chip used in this study, and the masses they reported are distinctly different from those used in our study. This discrepancy reflects the different choice of chromatographic surface.
The ultimate clinical application is the early detection of cancer, a topic of obvious public health importance. In our study, the sensitivity for grade I–II and grade III tumors was 79.0% and 95.2%, respectively. This result suggests that the method cannot detect low-grade TCC bladder cancer as effectively as high-grade TCC bladder cancer.
In conclusion, this study suggests that TCC-specific proteomic signatures are present in the urine of patients with TCC and have potential as a clinical tool for the detection or classification of individual cancers with high sensitivity and specificity.
References
[1] Greenlee RT, Hill-Harmon MB, Murray T, Thun M. Cancer statistics, 2001. CA Cancer J Clin 2001;51(1):15–36.
[2] Stein JP, Grossfeld GD, Ginsberg DA, Esrig D, Freeman JA, Figueroa AJ, et al. Prognostic markers in bladder cancer: a contemporary review of the literature. J Urol 1998;160:645–59.
[3] Vlahou A, Schellhammer PF, Mendrinos S, Patel K, Kondylis FI, Gong L, et al. Development of a novel proteomic approach for the detection of transitional cell carcinoma of the bladder in urine. Am J Pathol 2001;158(4):1491–502.
[4] Lokeshwar VB, Soloway MS. Current bladder tumor tests: does their projected utility fulfill clinical necessity? J Urol 2001;165:1067–77.
[5] Burchardt M, Burchardt T, Shabsigh A, De La Taille A, Benson MC, Sawczuk I. Current concepts in biomarker technology for bladder cancers. Clin Chem 2000;46:595–605.
[6] Jemal A, Thomas A, Murray T, Thun M. Cancer statistics. CA Cancer J Clin 2002;52:23–47.
[7] Carr SA, Hemling ME, Bean MF, Roberts GD. Integration of mass spectrometry in analytical biotechnology. Anal Chem 1991;63:2802–24.
[8] Loo JA, Brown J, Critchley G, Mitchell C, Andrews PC, Ogorzalek Loo RR. High sensitivity mass spectrometric methods for obtaining intact molecular weights from gel-separated proteins. Electrophoresis 1999;20:743–8.
[9] Zhang W, Czernik AJ, Yungwirth T, Aebersold R, Chait BT. Matrix-assisted laser desorption mass spectrometric peptide mapping of proteins separated by two-dimensional gel electrophoresis: determination of phosphorylation in synapsin I. Protein Sci 1994;3:677–86.
[10] von Eggeling F, Davies H, Lomas L, Fiedler W, Junker K, Claussen U, et al. Tissue-specific microdissection coupled with ProteinChip array technologies: applications in cancer research. Biotechniques 2000;29:1066–70.
[11] Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem 2002;48(8):1296–304.
[12] Weinberger SR, Dalmasso EA, Fung ET. Current achievements using ProteinChip Array technology. Curr Opin Chem Biol 2002;6(1):86–91.
[13] Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH, et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 2002;62:3609–14.
[14] Poon TC, Yip TT, Chan AT, Yip C, Yip V, Mok TS, et al. Comprehensive proteomic profiling identifies serum proteomic signatures for detection of hepatocellular carcinoma and its subtypes. Clin Chem 2003;49(5):752–60.
[15] Qu Y, Adam BL, Yasui Y, Ward MD, Cazares LH, Schellhammer PF, et al. Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clin Chem 2002;48(10):1835–43.
[16] Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359:572–7.
[17] Wright Jr GL, Cazares LH, Leung SM, Nasim S, Adam BL, Yip TT, et al. ProteinChip surface enhanced laser desorption/ionization (SELDI) mass spectrometry: a novel proteomic technology for detection of prostate cancer biomarkers in complex protein mixtures. Prostate Cancer Prostatic Dis 1999;2:264–76.
[18] Xiao Z, Adam BL, Cazares LH, Clements MA, Davis JW, Schellhammer PF, et al. Quantitation of serum prostate-specific membrane antigen by a novel protein biochip immunoassay discriminates benign from malignant prostate disease. Cancer Res 2001;61:6029–33.
[19] Han M, Schoenberg MP. The use of molecular diagnostics in bladder cancer. Urol Oncol 2000;5:87–92.