SUPPLEMENTARY INFORMATION
Specific Plasma Autoantibody Reactivity in Myelodysplastic Syndromes
George I Mias (1,4), Rui Chen (1,4), Yan Zhang (2), Kunju Sridhar (3), Donald Sharon (1), Li Xiao (2), Hogune Im (1),
Michael P Snyder (1,5), and Peter L Greenberg (3,5).
(1) Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
(2) Hematology, Jiaotong University, 6th Hospital, Shanghai, China
(3) Hematology, Stanford University School of Medicine, Stanford, California 94305
(4) These authors contributed equally to this work.
(5) M.P.S. and P.L.G. are both senior authors.
Correspondence should be addressed to [email protected] (P.L.G.), [email protected] (M.P.S.)
TABLE OF CONTENTS
Supplementary Method Details
    ProtoArray Analysis
    Data and Array Information
Supplementary Figures
    Supplementary Figure S1. High Priority Proteins Functions and Cellular Location
    Supplementary Figure S2. Aggregate Signal Profiles for Two Channel Arrays
Supplementary Tables
    Supplementary Table S1. High Priority Proteins (Increased Reactivity in Patients)
    Supplementary Table S2. Bio Functions for High Priority Proteins
    Supplementary Table S3. Canonical Pathways for High Priority Proteins
    Supplementary Table S4. Additional Classification Results
    Supplementary Table S5. MDS/AML/Healthy Classification Results
    Supplementary Table S6. s-MDS/Healthy Linear Discriminant Analysis
Supplementary Method Details
ProtoArray Analysis
As discussed in the main text, analysis of the ProtoArray signals in both Stage I and Stage II of
the investigation involved multi-step processing. In both stages, aggregate raw signals
(Supplementary Fig. S2) were normalized within each array using an implementation of the
ProCAT algorithm [1]. Briefly, the algorithm was run with a sliding window of length 15 to
provide local background subtraction as well as local intensity normalization across each
array. The normalized signal for each position i,j on the protein array is calculated from the
sliding-window signals and backgrounds as
\[
\mathrm{signal}'_{ij} = \max\left\{ M_{ij} + \left(\mathrm{signal}_{ij} - b_{S,ij} - M_{ij}\right)\frac{\mathrm{MAbsDev}_{ij}}{\mathrm{MAbsDev}_{ij}},\ \min(\text{non-zero signal}) \right\},
\]
where
$M_{ij} = \mathrm{Mean}(\{\text{window foreground signals}\})$ (foreground refers to either the 635 nm
wavelength for the GST tag signal or the 532 nm wavelength for the IgG reactivity signal),
$S_{ij} \in \{\text{window signals}\}$, $\mathrm{MAbsDev}_{ij} = \mathrm{Mean}(\{|S_{ij} - M_{ij}|\})$ is the mean absolute
deviation of the foreground window signals from the mean of the local sliding window, and
$b_{S,ij} \in \{\text{local smoothed background}\}$ is the smoothed background of the local window. The
smoothed background averages the immediate local background surrounding the spot of
interest. Negative signals (arising when the signal is smaller than the background) are replaced
by the minimum detectable signal value. The subsequent quantile normalization of the
ProCAT-normalized arrays puts all distributions on the same scale based on rank, with ties
replaced by the average of the values [2]. After Mann-Whitney U-tests carried out in
Mathematica 9.0 [3] and multiple hypothesis correction, a set of 35 proteins (Supplementary
Table S1) was selected as high-priority (p < 0.01, Bonferroni corrected). Proteins displaying
statistically significant differences (p < 0.01, Bonferroni corrected) in the GST signal (635 nm
channel) were eliminated, as this suggested that protein-level variability in array printing
affected the results. The 35 proteins of interest were analyzed for significant enrichment in
bio functions (Supplementary Table S2) and canonical pathways (Supplementary Table S3)
using IPA (http://www.ingenuity.com, Ingenuity® Systems). Various associations are shown
schematically in Supplementary Fig. S1.
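The rank-based quantile normalization and the Bonferroni cutoff described above can be illustrated with a minimal Python sketch. This is not the authors' implementation (the analysis was carried out with ProCAT and Mathematica); the function names `quantile_normalize` and `bonferroni_hits` and the toy values are hypothetical, and the sketch assumes every array contains the same number of spots.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize equal-length signal lists (one list per array)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of spots")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]

    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        start = 0
        while start < n:
            # Find the run of tied values beginning at sorted position `start`.
            end = start
            while end + 1 < n and a[order[end + 1]] == a[order[start]]:
                end += 1
            # Tied spots all receive the average of the reference values they span.
            tied_value = mean(reference[start:end + 1])
            for pos in range(start, end + 1):
                out[order[pos]] = tied_value
            start = end + 1
        normalized.append(out)
    return normalized


def bonferroni_hits(p_values, alpha=0.01):
    """Return names whose raw p-value is below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [name for name, p in p_values.items() if p < threshold]


# Toy example with made-up signals for two arrays of three spots each.
toy = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization every array shares the same set of values, so only the rank of each spot within its array carries information into the subsequent tests.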
The normalized signals in the custom protein arrays (Stage II of the investigation) were
transformed to approximately normal distributions using a power transformation, applied per
protein. Namely, λ is fitted for each protein so that each adjusted protein signal, $p_i$, is
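\[
p_i(\lambda) =
\begin{cases}
\dfrac{p_i^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0,\\[6pt]
\log(p_i), & \text{if } \lambda = 0,
\end{cases}
\]
with λ being continuous, i.e. a Box-Cox transformation [4].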
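As an illustration of the per-protein power transformation, the following Python sketch applies the Box-Cox formula and picks λ by a grid search over the profile log-likelihood. The text does not state how λ was fitted, so the maximum-likelihood criterion, the grid range of -2 to 2, and the names `box_cox`, `box_cox_log_likelihood` and `fit_lambda` are assumptions made for this example only.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transformation to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive signals")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped)."""
    n = len(values)
    y = box_cox(values, lam)
    y_mean = sum(y) / n
    variance = sum((t - y_mean) ** 2 for t in y) / n  # must be non-zero
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_lambda(values, grid=None):
    """Return the grid value of lambda with the highest profile log-likelihood."""
    if grid is None:
        grid = [k / 100.0 for k in range(-200, 201)]  # assumed search range
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))


# Made-up signals whose logarithms are symmetric, so lambda near 0 is expected.
signals = [math.exp(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
best_lambda = fit_lambda(signals)
transformed = box_cox(signals, best_lambda)
```

A fitted λ near 0 corresponds to a log transformation and a value near 1 leaves the signal essentially unchanged apart from a shift.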
After the transformations, Analysis of Variance (ANOVA) was carried out for each protein signal
in the arrays, as discussed in the main text, to search for robust prognostic-specific increased
autoantibody reactivity in patient subgroups through exclusion of signals that showed
significant duplicate variability and any associated effect interactions. Additionally, the GST
signal was used to exclude signals showing significant protein-level variability, by eliminating
any protein showing statistically significant group differences in the GST (red, 635 nm)
signals. Based on the filtered ANOVA [5] results, at Bonferroni-corrected p < 0.01, the
standardized normalized levels of the AKT3, ARL8B and FCGR3A signals were then selected to
represent a three-dimensional point per subject, which was used for the different classifications
implemented in R [6], as presented in Figure 4 and Supplementary Tables S4-S6. The results
from Linear Discriminant Analysis (LDA), using package MASS [7], to classify s-MDS patients
and Healthy subjects, with 5-fold cross-validation and 1,000 random group partitioning
repetitions, are shown in Supplementary Table S6. Supplementary Kernel Discriminant Analysis
(KDA; see references [8-11] for detailed method descriptions) results for aggregate MDS,
retrospective MDS, and IPSS classification, using package ks [9] with the unconstrained
smoothed cross-validation method for bandwidth selection, are also shown. 5-fold
cross-validation was used in the assignment of KDA classes, and median classifications and
standard deviations over 1,000 random data partitions were computed.
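The repeated cross-validation scheme (5 folds, many random partitions, then a median and standard deviation per confusion-matrix cell) can be sketched in Python as follows. The original classifications used LDA and KDA in R; to keep the example self-contained, a simple nearest-centroid classifier is swapped in as a stand-in, and all function names, the seed, and the use of the population standard deviation are assumptions.

```python
import random
from statistics import median, pstdev


def nearest_centroid_fit(points, labels):
    """Stand-in classifier (not LDA/KDA): one centroid per class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}


def nearest_centroid_predict(centroids, p):
    """Assign p to the class with the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(p, centroids[y])))


def repeated_kfold_confusion(points, labels, k=5, repeats=1000, seed=0):
    """Median and standard deviation of each confusion-matrix cell over repeats."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    counts = {(true, est): [] for true in classes for est in classes}
    indices = list(range(len(points)))
    for _ in range(repeats):
        rng.shuffle(indices)  # new random partition for this repetition
        folds = [indices[i::k] for i in range(k)]
        tally = {key: 0 for key in counts}
        for held_out in folds:
            held = set(held_out)
            train = [i for i in indices if i not in held]
            model = nearest_centroid_fit([points[i] for i in train],
                                         [labels[i] for i in train])
            for i in held_out:
                tally[(labels[i], nearest_centroid_predict(model, points[i]))] += 1
        for key, value in tally.items():
            counts[key].append(value)
    return {key: (median(v), pstdev(v)) for key, v in counts.items()}


# Made-up three-dimensional points standing in for (AKT3, ARL8B, FCGR3A) levels.
points = [(0.1 * i, 0.0, 0.0) for i in range(10)] + [(10 + 0.1 * i, 10.0, 10.0) for i in range(10)]
labels = ["Healthy"] * 10 + ["s-MDS"] * 10
summary = repeated_kfold_confusion(points, labels, k=5, repeats=20)
```

Each key of `summary` is a (true class, estimated class) pair, mirroring the layout of the median and deviation tables, although the numbers produced by this stand-in classifier are not comparable to the reported results.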
Data and Array Information
Raw data and corresponding normalized matrices were deposited in the Gene Expression
Omnibus repository under the following accessions:
Super-Series:
GSE48155    Protein Array Screening of Myelodysplastic Syndromes
Sub-Series:
GSE48153    Protein Array Screening of Myelodysplastic Syndromes I
GSE48154    Protein Array Screening of Myelodysplastic Syndromes II
Stage II focused array layout:
GPL17321    ProtoArray Custom Service
Supplementary Figures
[Figure S1 image. Legend: Fx, Biological Functions; CP, Canonical Pathway; Stage II Validated; cellular locations: Cytoplasm, Plasma Membrane, Extracellular Space, Nucleus, unknown location. Labeled proteins: AKT3, FCGR3A, ARL8B.]
Supplementary Figure S1. High Priority Proteins Functions and Cellular Location
The high-priority proteins were analyzed with IPA’s knowledge database (Ingenuity® Systems,
http://www.ingenuity.com). Various functional associations were found, as shown above. The
three proteins with validated Stage II reactivity are shown in blue (AKT3, FCGR3A, ARL8B).
[Figure S2 image, panels a-d. Horizontal axes: Intensity Sum (A.U.); vertical axes: N. Panel labels: Foreground 635, Background 635, Foreground 532, Background 532, Focused Arrays Channel Intensities (532 Foreground, 532 Background, 635 Foreground, 635 Background), Focused Arrays Foreground 532. Subgroups: Healthy, s-MDS, t-MDS, L.]
Supplementary Figure S2. Aggregate Signal Profiles for Two Channel Arrays
The ProtoArrays display overall higher foreground (a) than background (b) signals for both channels
(532 nm and 635 nm), with similar distributions across the various subgroups. The same holds for the
focused arrays (c, d). Note that the GST signal is generally higher than the IgG signal (c).
Supplementary Tables
Supplementary Table S1. High Priority Proteins (Increased Reactivity in Patients)
Official Gene Symbol | p-value* | Accession | Group | Stage II Array
--- | --- | --- | --- | ---
ABAT | 2.49E-10 | BC015628 | s-MDS | Y
AKT3 | 7.36E-10 | NM_005465 | s-MDS | Y
ALDOB | 1.84E-07 | BC029399 | s-MDS | Y
ARL8B | 6.95E-11 | NM_018184 | s-MDS | Y
BARHL1 | 1.22E-07 | NM_020064 | s-MDS | N
C11orf88 | 3.58E-14 | NM_207430 | s-MDS | N
C6orf174 | 1.53E-08 | NM_014702 | s-MDS | N
CENPO | 1.08E-07 | NM_024322 | AML | Y
CKAP2 | 4.15E-08 | BC018749 | s-MDS | N
CRELD1 | 6.64E-08 | BC008720 | s-MDS | Y
CRELD1 | 1.55E-07 | BC008720 | AML | Y
DLEU1 | 5.17E-08 | BC020692 | t-MDS | Y
DLEU1 | 4.20E-07 | BC020692 | s-MDS | Y
DNAJB9 | 4.99E-08 | NM_012328 | s-MDS | Y
EEF1A1 | 4.76E-08 | BC094687 | s-MDS | Y
FCGR3A | 3.05E-07 | BC017865 | t-MDS | Y
FCGR3A | 4.02E-07 | BC036723 | s-MDS | Y
FGF16 | 6.74E-10 | NM_003868 | t-MDS | Y
FGF16 | 4.74E-08 | NM_003868 | AML | Y
FKBP14 | 1.14E-07 | NM_017946 | s-MDS | Y
GNAZ | 1.28E-08 | BC037333 | s-MDS | Y
LGALS1 | 1.80E-11 | NM_002305 | s-MDS | Y
LGALS1 | 6.49E-10 | NM_002305 | t-MDS | Y
LGALS1 | 5.37E-08 | NM_002305 | AML | Y
LRAT | 1.21E-09 | BC031053 | s-MDS | Y
MECR | 3.14E-11 | NM_016011 | s-MDS | Y
NEK6 | 3.00E-07 | NM_014397 | t-MDS | N
NUAK2 | 7.44E-08 | NM_030952 | s-MDS | N
PANK3 | 1.94E-07 | NM_024594 | s-MDS | Y
PARP11 | 1.68E-07 | BC031073 | s-MDS | Y
PLK1 | 2.75E-09 | NM_005030 | s-MDS | Y
PPIG | 1.21E-08 | BC001555 | s-MDS | Y
PTAFR | 5.45E-12 | NM_014280 | s-MDS | N
PTCD2 | 1.56E-17 | NM_024754 | s-MDS | N
PTCD2 | 2.86E-13 | NM_024754 | t-MDS | N
PTCD2 | 6.85E-13 | NM_024754 | AML | N
SERAC1 | 4.34E-07 | NM_032861 | s-MDS | Y
SSX5 | 1.72E-07 | BC016640 | s-MDS | N
TMEM106A | 2.58E-07 | NM_145041 | s-MDS | Y
TOMM20 | 6.87E-11 | NM_014765 | t-MDS | Y
TRH | 2.22E-08 | NM_007117 | t-MDS | N
VRK3 | 2.67E-08 | NM_016440 | s-MDS | Y
ZNF684 | 2.67E-07 | NM_152373 | s-MDS | Y
*Proteins selected based on p-value from Stage I as highly reactive in patient groups compared to healthy individuals. 25 of the high-priority proteins were successfully spotted on Stage II arrays.
Supplementary Table S2. Bio Functions for High Priority Proteins
Bio Functions | Functions Annotation | p-Value | Molecules* | Number of Molecules
--- | --- | --- | --- | ---
Cell Death and Survival | apoptosis | 7.30E-03 | AKT3, BARHL1, CKAP2, DNAJB9, EEF1A1, FCGR3A, LGALS1, NEK6, NUAK2, PLK1, PTAFR, TRH | 12
Cell Death and Survival | necrosis | 3.38E-02 | AKT3, BARHL1, CKAP2, EEF1A1, FCGR3A, LGALS1, NEK6, PLK1, PTAFR, TRH | 10
Infectious Disease | Viral Infection | 2.45E-02 | EEF1A1, FCGR3A, LGALS1, LRAT, PANK3, PARP11, PLK1, PTAFR | 8
Cell Death and Survival | apoptosis of tumor cell lines | 3.10E-02 | AKT3, CKAP2, LGALS1, NEK6, PLK1, PTAFR | 6
Cancer | genital tumor | 4.71E-02 | BARHL1, FCGR3A, LGALS1, PLK1, PTAFR, TMEM106A | 6
Reproductive System Disease | genital tumor | 4.71E-02 | BARHL1, FCGR3A, LGALS1, PLK1, PTAFR, TMEM106A | 6
Carbohydrate Metabolism | metabolism of carbohydrate | 7.98E-03 | ALDOB, EEF1A1, LRAT, PTAFR, TRH | 5
Cancer | head and neck cancer | 1.30E-02 | AKT3, ALDOB, BARHL1, LGALS1, NEK6 | 5
Post-Translational Modification | phosphorylation of protein | 1.47E-02 | AKT3, LGALS1, NEK6, NUAK2, PLK1 | 5
Molecular Transport | concentration of lipid | 2.41E-02 | AKT3, EEF1A1, LRAT, NUAK2, TRH | 5
Small Molecule Biochemistry | concentration of lipid | 2.41E-02 | AKT3, EEF1A1, LRAT, NUAK2, TRH | 5
Lipid Metabolism | concentration of lipid | 2.41E-02 | AKT3, EEF1A1, LRAT, NUAK2, TRH | 5
Cancer | hematological neoplasia | 3.89E-02 | AKT3, DLEU1, EEF1A1, FCGR3A, NEK6 | 5
Hematological Disease | hematological neoplasia | 3.89E-02 | AKT3, DLEU1, EEF1A1, FCGR3A, NEK6 | 5
Cancer | lymphohematopoietic cancer | 4.34E-02 | AKT3, DLEU1, EEF1A1, FCGR3A, NEK6 | 5
Cancer | uterine tumor | 4.63E-02 | AKT3, PLK1, PPIG, PTAFR, TMEM106A | 5
Small Molecule Biochemistry | metabolism of phospholipid | 1.01E-03 | EEF1A1, LRAT, PTAFR, TRH | 4
Lipid Metabolism | metabolism of phospholipid | 1.01E-03 | EEF1A1, LRAT, PTAFR, TRH | 4
Cancer | growth of tumor | 8.25E-03 | AKT3, FCGR3A, LGALS1, PLK1 | 4
Cell Cycle | mitosis | 1.34E-02 | CKAP2, FGF16, NEK6, PLK1 | 4
Carbohydrate Metabolism | quantity of carbohydrate | 1.79E-02 | EEF1A1, GNAZ, NUAK2, TRH | 4
Cancer | melanoma | 1.97E-02 | ABAT, AKT3, NEK6, PLK1 | 4
Cancer | endometrial carcinoma | 4.02E-02 | AKT3, PLK1, PTAFR, TMEM106A | 4
Reproductive System Disease | endometrial carcinoma | 4.02E-02 | AKT3, PLK1, PTAFR, TMEM106A | 4
*Table truncated for number of molecules ≥ 4. The high-priority proteins were analyzed with IPA’s knowledge database (Ingenuity® Systems, http://www.ingenuity.com), to identify enrichment in Bio-Functions.
Supplementary Table S3. Canonical Pathways for High Priority Proteins
Ingenuity Canonical Pathways | -log(p-value) | Ratio | Molecules*
--- | --- | --- | ---
Role of NFAT in Regulation of the Immune Response | 2.49 | 1.51E-02 | AKT3, GNAZ, FCGR3A
TR/RXR Activation | 2.02 | 2.08E-02 | TRH, AKT3
FGF Signaling | 2.01 | 2.17E-02 | FGF16, AKT3
G Beta Gamma Signaling | 1.98 | 1.69E-02 | AKT3, GNAZ
Fcγ Receptor-mediated Phagocytosis in Macrophages and Monocytes | 1.94 | 1.96E-02 | AKT3, FCGR3A
Natural Killer Cell Signaling | 1.80 | 1.71E-02 | AKT3, FCGR3A
Relaxin Signaling | 1.62 | 1.23E-02 | AKT3, GNAZ
CXCR4 Signaling | 1.53 | 1.18E-02 | AKT3, GNAZ
RAR Activation | 1.43 | 1.05E-02 | LRAT, AKT3
CREB Signaling in Neurons | 1.43 | 9.71E-03 | AKT3, GNAZ
Ephrin Receptor Signaling | 1.43 | 9.85E-03 | AKT3, GNAZ
Dendritic Cell Maturation | 1.41 | 9.57E-03 | AKT3, FCGR3A
Regulation of the Epithelial-Mesenchymal Transition Pathway | 1.40 | 1.04E-02 | FGF16, AKT3
Thrombin Signaling | 1.36 | 9.62E-03 | AKT3, GNAZ
Systemic Lupus Erythematosus Signaling | 1.24 | 8.00E-03 | AKT3, FCGR3A
Role of Macrophages, Fibroblasts and Endothelial Cells in Rheumatoid Arthritis | 1.00 | 5.95E-03 | AKT3, FCGR3A
Molecular Mechanisms of Cancer | 0.92 | 5.25E-03 | AKT3, GNAZ
Axonal Guidance Signaling | 0.76 | 4.25E-03 | AKT3, GNAZ
*Table truncated for number of molecules ≥ 3. The high-priority proteins were analyzed with IPA’s knowledge database (Ingenuity® Systems, http://www.ingenuity.com) to identify various Canonical Pathway associations.
Supplementary Table S4. Additional Classification Results
(a) s-MDS/Healthy Classification Deviations*
σ | Healthy Est | s-MDS Est
--- | --- | ---
Healthy | 2.0 | 2.0
s-MDS | 1.9 | 1.9
Total | 2.7 | 2.7

(b) Full Retrospective MDS Classification Deviations*
σ | Healthy Est | s-MDS Est | t-MDS Est | L Est
--- | --- | --- | --- | ---
Healthy | 2.80 | 2.75 | 0.63 | 0.09
s-MDS | 3.05 | 3.13 | 0.83 | 0.14
t-MDS | 2.52 | 2.63 | 2.96 | 0.23
L | 2.29 | 2.89 | 1.36 | 2.36
Total | 6.27 | 6.68 | 4.20 | 2.44

(c) IPSS MDS Classification Deviations*
σ | none Est | Low Est | Int1 Est | Int2 Est | High Est | L Est
--- | --- | --- | --- | --- | --- | ---
Healthy | 0.62 | 0.15 | 0.51 | 0.27 | 0.14 | 0.13
Low | 2.27 | 2.45 | 1.86 | 0.96 | 0.44 | 0.80
Int1 | 2.93 | 0.77 | 3.24 | 1.04 | 0.44 | 0.99
Int2 | 2.11 | 0.56 | 1.46 | 2.28 | 0.29 | 0.60
High | 1.41 | 0.67 | 1.22 | 0.67 | 1.31 | 0.65
L | 2.22 | 0.95 | 2.16 | 1.13 | 0.47 | 2.66
Total | 5.66 | 3.74 | 6.07 | 4.02 | 2.10 | 3.72
* Various classification results were based on using the three protein reactivities validated in Stage II, AKT3, FCGR3A and
ARL8B, as defining coordinates per sample. All classifications used 5-fold cross-validation, with 1,000 random group
partitioning repetitions. Based on these, the standard deviations for the Kernel Discriminant Analysis (KDA) classification
medians shown in Fig. 4a-c (for s-MDS/Healthy, retrospective MDS/Healthy and IPSS/Healthy) are shown in (a)-(c)
respectively.
Supplementary Table S5. MDS/AML/Healthy Classification Results
(a) Classification Medians*
 | Healthy Est (%) | MDS Est (%) | L Est (%) | Total
--- | --- | --- | --- | ---
Healthy | 86 (77) | 26 (23) | 0 (0) | 112
MDS | 10 (6) | 151 (93) | 0 (0) | 161
L | 8 (19) | 24 (56) | 11 (26) | 43
Total | 104 | 202 | 11 | 316

(b) Classification Deviations*
σ | Healthy Est | MDS Est | L Est
--- | --- | --- | ---
Healthy | 2.71 | 2.71 | 0.03
MDS | 3.65 | 3.65 | 0.11
L | 2.08 | 2.60 | 2.13
Total | 5.71 | 5.82 | 2.16
*Results for classification of MDS set considered as a whole, AML (L) and Healthy, with median classifications shown in (a)
and corresponding standard deviations shown in (b). Kernel Discriminant Analysis classification results were based on using
the three protein reactivities validated in Stage II, AKT3, FCGR3A and ARL8B, as defining coordinates per sample.
Classifications used 5-fold cross-validation, with medians/deviations based on 1,000 random group partitioning repetitions.
Supplementary Table S6. s-MDS/Healthy Linear Discriminant Analysis
(a) Classification Medians*
 | Healthy Est (%) | s-MDS Est (%) | Total
--- | --- | --- | ---
Healthy | 70 (62.5) | 42 (37.5) | 112
s-MDS | 41 (34) | 78 (66) | 119
Total | 111 | 130 | 231

(b) Classification Deviations*
σ | Healthy Est | s-MDS Est
--- | --- | ---
Healthy | 2.0 | 2.0
s-MDS | 1.9 | 1.9
Total | 2.7 | 2.7
* Linear Discriminant Analysis (LDA) classification results for s-MDS and Healthy, with median classifications shown in (a) and
corresponding standard deviations shown in (b). LDA results were based on using the three protein reactivities validated in
Stage II, AKT3, FCGR3A and ARL8B, as defining coordinates per sample. Classification used 5-fold cross-validation, with
medians/deviations based on 1,000 random group partitioning repetitions.
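
The percentages in part (a) are consistent with dividing each median count by its row total (for example, 70 of 112 Healthy samples gives 62.5%). The short Python sketch below reproduces that arithmetic from the counts in the table; the function name `row_percentages` and the dictionary layout are hypothetical and serve only to illustrate the calculation.

```python
def row_percentages(matrix):
    """Convert each row of a confusion matrix from counts to percentages."""
    result = {}
    for actual, row in matrix.items():
        total = sum(row.values())
        result[actual] = {est: 100.0 * n / total for est, n in row.items()}
    return result


# Median counts copied from Supplementary Table S6 (a).
table_s6a = {
    "Healthy": {"Healthy Est": 70, "s-MDS Est": 42},
    "s-MDS": {"Healthy Est": 41, "s-MDS Est": 78},
}
pct = row_percentages(table_s6a)
for actual, row in pct.items():
    print(actual, {est: round(value, 1) for est, value in row.items()})
```

Read this way, the diagonal entries give the fraction of each true group that the LDA classifier assigned correctly.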
Bibliography
1. Zhu, X., Gerstein, M. & Snyder, M. ProCAT: a data analysis approach for protein microarrays.
Genome Biol. 7, R110, doi:10.1186/gb-2006-7-11-r110 (2006).
2. Bolstad, B. M., Irizarry, R. A., Astrand, M. & Speed, T. P. A comparison of normalization methods
for high density oligonucleotide array data based on variance and bias. Bioinformatics 19,
185-193 (2003).
3. Wolfram Research Inc. Mathematica, Version 9.0. (Wolfram Research, Inc., 2013).
4. Box, G. E. & Cox, D. R. An analysis of transformations. J. R. Stat. Soc. Ser. B Stat. Methodol.
26, 211-252 (1964).
5. Pavlidis, P. Using ANOVA for gene selection from microarray studies of the nervous system.
Methods 31, 282-289 (2003).
6. R: A Language and Environment for Statistical Computing (R Foundation for Statistical
Computing, Vienna, Austria, 2013).
7. Venables, W. N., Ripley, B. D. & Venables, W. Modern applied statistics with S-PLUS. Vol. 250
(Springer-Verlag New York, 1994).
8. Baudat, G. & Anouar, F. E. Generalized discriminant analysis using a kernel approach. Neural
Comput. 12, 2385-2404, doi:10.1162/089976600300014980 (2000).
9. Duong, T. ks: Kernel density estimation and kernel discriminant analysis for multivariate data in
R. J. Stat. Softw. 21, 1-16 (2007).
10. Li, Y. M., Gong, S. G. & Liddell, H. Recognising trajectories of facial identities using kernel
discriminant analysis. Image Vision Comput. 21, 1077-1086, doi:10.1016/J.Imavis.2003.08.01 (2003).
11. Mika, S., Ratsch, G., Weston, J., Scholkopf, B. & Mullers, K. Fisher discriminant analysis with
kernels. Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE Signal
Processing Society Workshop, 41-48, doi:10.1109/NNSP.1999.788121 (1999).