

SUPPLEMENTARY INFORMATION

Specific Plasma Autoantibody Reactivity in Myelodysplastic Syndromes

George I Mias¹,⁴, Rui Chen¹,⁴, Yan Zhang², Kunju Sridhar³, Donald Sharon¹, Li Xiao², Hogune Im¹, Michael P Snyder¹,⁵, and Peter L Greenberg³,⁵.

¹ Department of Genetics, Stanford University School of Medicine, Stanford, California, USA

² Hematology, Jiaotong University, 6th Hospital, Shanghai, China

³ Hematology, Stanford University School of Medicine, Stanford, California 94305

⁴ These authors contributed equally to this work.

⁵ M.P.S. and P.L.G. are both senior authors.

Correspondence should be addressed to [email protected] (P.L.G.), [email protected] (M.P.S.)

TABLE OF CONTENTS

Supplementary Method Details
    ProtoArray Analysis
    Data and Array Information

Supplementary Figures
    Supplementary Figure S1. High Priority Proteins Functions and Cellular Location
    Supplementary Figure S2. Aggregate Signal Profiles for Two Channel Arrays

Supplementary Tables
    Supplementary Table S1. High Priority Proteins (Increased Reactivity in Patients)
    Supplementary Table S2. Bio Functions for High Priority Proteins
    Supplementary Table S3. Canonical Pathways for High Priority Proteins
    Supplementary Table S4. Additional Classification Results
    Supplementary Table S5. MDS/AML/Healthy Classification Results
    Supplementary Table S6. s-MDS/Healthy Linear Discriminant Analysis

Supplementary Method Details

ProtoArray Analysis

As discussed in the main text, the ProtoArrays' signal analysis in both Stage I and Stage II of the investigation involved multi-step processing. In both stages, aggregate raw signals (Supplementary Fig. S2) were normalized within each array, using an implementation of the ProCAT algorithm¹. Briefly, the algorithm was run with a sliding window of length 15, to apply local background subtraction as well as local intensity normalization across each array. The normalized signal for each position i,j on the protein array is calculated from the sliding window signals and backgrounds as,

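$$\widetilde{\mathrm{signal}}_{ij} = \max\left\{ M_{ij} + \left(\mathrm{signal}_{ij} - b_{S,ij} - M_{ij}\right)\frac{\overline{\mathrm{MAbsDev}}_{ij}}{\mathrm{MAbsDev}_{ij}},\ \min(\text{non-zero signal}) \right\},$$

where,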

$M_{ij} = \mathrm{Mean}(\{\text{window foreground signals}\})$ (foreground refers to either the 635 nm wavelength for the GST tag signal, or the 532 nm wavelength for the IgG reactivity signal), $S_{ij} \in \{\text{window signals}\}$, $\mathrm{MAbsDev}_{ij} = \mathrm{Mean}(\{|S_{ij} - M_{ij}|\})$ is the mean absolute deviation of the foreground window signals from the mean of the local sliding window, and $b_{S,ij} \in \{\text{local smoothed background}\}$ is the smoothed background of the local window. The smoothed background averages the immediate local background surrounding the spot of interest. Negative, low signals (arising when the signal is smaller than the background) are replaced by the minimum detectable signal value. The subsequent quantile normalization of the ProCAT-normalized arrays puts all distributions on the same scale based on rank, with ties replaced by the average of the values². After Mann-Whitney U-tests, carried out in Mathematica 9.0³, and multiple hypothesis correction, a set of 35 proteins (Supplementary Table S1) was selected as high-priority (p < 0.01, Bonferroni corrected). Proteins displaying statistically significant differences (p < 0.01, Bonferroni corrected) in the GST signal (635 nm channel) were eliminated, as this suggested protein-level variability in array printing that affected the results. The 35 proteins of interest were analyzed for significant enrichment in bio functions (Supplementary Table S2) and canonical pathways (Supplementary Table S3), using IPA (http://www.ingenuity.com, Ingenuity® Systems). Various associations are shown schematically in Supplementary Fig. S1.
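The quantile normalization step described above can be illustrated with a minimal Python sketch. This is not the authors' implementation (the analysis was carried out in Mathematica); the function name `quantile_normalize` and the list-of-lists input layout are assumptions made for illustration. Each array is mapped onto a shared reference distribution by rank, and tied values within an array receive the average of the reference values they span.

```python
from statistics import fmean


def quantile_normalize(arrays):
    """Quantile-normalize equal-length signal lists (one list per array).

    Hypothetical helper for illustration only.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of spots")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [fmean(s[k] for s in sorted_arrays) for k in range(n)]

    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # spot indices by rank
        out = [0.0] * n
        start = 0
        while start < n:
            # Extend the run while the next ranked value ties with this one.
            end = start
            while end + 1 < n and a[order[end + 1]] == a[order[start]]:
                end += 1
            # Tied spots share the average of the reference values they span.
            shared = fmean(reference[start:end + 1])
            for pos in range(start, end + 1):
                out[order[pos]] = shared
            start = end + 1
        normalized.append(out)
    return normalized
```

For example, `quantile_normalize([[5, 2, 3], [4, 1, 6]])` gives both arrays the same set of values (1.5, 3.5, 5.5), assigned according to each spot's rank within its own array.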

The normalized signals in the custom protein arrays (Stage II of the investigation) were transformed to approximately normal distributions using a power transformation, implemented per protein. Namely, each signal is scaled by fitting λ so that each adjusted protein signal, $p_i^{(\lambda)}$, is

$$p_i^{(\lambda)} = \begin{cases} \dfrac{p_i^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0, \\ \log(p_i), & \text{if } \lambda = 0, \end{cases}$$

with λ being continuous, i.e. a Box-Cox transformation⁴.
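As a minimal sketch of the transformation above, the following Python code applies the Box-Cox formula for a given λ and picks λ by maximizing the standard Box-Cox profile log-likelihood over a grid. The grid search, its range, and the function names are assumptions for illustration; the source does not state how λ was fitted.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lambda under a normal model.

    Assumes the transformed values are not all identical (variance > 0).
    """
    n = len(values)
    t = box_cox(values, lam)
    mean_t = sum(t) / n
    var_t = sum((x - mean_t) ** 2 for x in t) / n
    log_jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var_t) + log_jacobian


def fit_lambda(values, grid=None):
    """Pick lambda by grid search (hypothetical grid from -3 to 3, step 0.01)."""
    if grid is None:
        grid = [k / 100.0 for k in range(-300, 301)]
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))
```

In a per-protein analysis, `fit_lambda` would be called once on each protein's signals across samples, followed by `box_cox` with the fitted value.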

After the transformations, Analysis of Variance (ANOVA) was carried out for each protein signal in the arrays as discussed in the main text, to search for robust prognostic-specific increased autoantibody reactivity in patient subgroups through exclusion of signals that showed significant duplicate variability and any associated effect interactions. Additionally, the GST signal was used to exclude signals showing significant protein-level variability, by eliminating any protein showing statistically significant group differences in the GST (red, 635 nm) signals. Based on the filtered ANOVA⁵ results, at Bonferroni corrected p < 0.01, the standardized normalized levels of the AKT3, ARL8B and FCGR3A signals were then selected to represent a three-dimensional point per subject, which was used for the different classifications implemented in R⁶, as presented in Figure 4 and Supplementary Tables S4-S6. The results from Linear Discriminant Analysis (LDA), from package MASS⁷, to classify s-MDS patients and Healthy subjects, with 5-fold cross-validation and 1,000 random group partitioning repetitions, are shown in Supplementary Table S6. Supplementary Kernel Discriminant Analysis (KDA; see references for detailed method description⁸⁻¹¹) results for aggregate MDS, retrospective MDS, and IPSS classification, using package ks⁹ with the unconstrained smoothed cross-validation method for bandwidth selection, are also shown. 5-fold cross-validation was used in the assignment of KDA classes, and median classifications and standard deviations over 1,000 random data partitions were computed.
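The repeated cross-validation bookkeeping described above (5 folds, many random partitions, then medians and standard deviations of the confusion counts) can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' R code: a nearest-centroid rule stands in for LDA and KDA, the folds are plain random partitions, the sample standard deviation is used, and all function names are hypothetical.

```python
import random
from statistics import median, stdev


def random_folds(n, k, rng):
    """Randomly partition sample indices 0..n-1 into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT LDA or KDA): pick the class with the closest mean."""
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for d, v in enumerate(xi):
            acc[d] += v
        counts[yi] = counts.get(yi, 0) + 1

    def dist2(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sorted(sums), key=dist2)


def repeated_cv(points, labels, k=5, repeats=1000, seed=0):
    """Return {(true, predicted): (median count, std of count)} over repeats."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    history = {(t, p): [] for t in classes for p in classes}
    n = len(points)
    for _ in range(repeats):
        counts = {key: 0 for key in history}
        for fold in random_folds(n, k, rng):
            held_out = set(fold)
            train_x = [points[i] for i in range(n) if i not in held_out]
            train_y = [labels[i] for i in range(n) if i not in held_out]
            for i in fold:
                pred = nearest_centroid_predict(train_x, train_y, points[i])
                counts[(labels[i], pred)] += 1
        for key, c in counts.items():
            history[key].append(c)
    return {key: (median(v), stdev(v)) for key, v in history.items()}
```

Each sample would be passed as a three-dimensional point (the standardized AKT3, ARL8B and FCGR3A levels); swapping `nearest_centroid_predict` for an LDA or KDA routine leaves the partitioning and summary logic unchanged.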

Data and Array Information

Raw data and corresponding normalized matrices were deposited in the Gene Expression Omnibus repository under the following accessions:

Super-Series:
GSE48155    Protein Array Screening of Myelodysplastic Syndromes

Sub-Series:
GSE48153    Protein Array Screening of Myelodysplastic Syndromes I
GSE48154    Protein Array Screening of Myelodysplastic Syndromes II

Stage II focused array layout:
GPL17321    ProtoArray Custom Service

Supplementary Figures

[Figure. Legend: Fx, Biological Functions; CP, Canonical Pathway; Stage II Validated. Cellular locations: Cytoplasm, Plasma Membrane, Extracellular Space, Nucleus, unknown location. Labeled proteins: AKT3, FCGR3A, ARL8B.]

Supplementary Figure S1. High Priority Proteins Functions and Cellular Location

The high-priority proteins were analyzed with IPA's knowledge database (Ingenuity® Systems, http://www.ingenuity.com). Various functional associations were found, as shown above. The three proteins with validated Stage II reactivity are shown in blue (AKT3, FCGR3A, ARL8B).

[Figure. Histogram panels a-d. Axes: Intensity Sum (A.U.) versus N. Panel titles: Foreground 635, Background 635, Foreground 532, Background 532, Focused Arrays Channel Intensities, Focused Arrays Foreground 532. Legends: Healthy, s-MDS, t-MDS, L; 532 Foreground, 532 Background, 635 Foreground, 635 Background.]

Supplementary Figure S2. Aggregate Signal Profiles for Two Channel Arrays

The ProtoArrays display overall higher foreground (a) than background signals (b) for both channels

(532 nm and 635 nm), with similar distributions across various subgroups. Similarly for the focused

arrays (c) and (d). Note that GST signal is generally higher than IgG signals (c).


Supplementary Tables

Supplementary Table S1. High Priority Proteins (Increased Reactivity in Patients)

Official Gene Symbol  p-value*  Accession  Group  Stage II Array
ABAT                  2.49E-10  BC015628   s-MDS  Y
AKT3                  7.36E-10  NM_005465  s-MDS  Y
ALDOB                 1.84E-07  BC029399   s-MDS  Y
ARL8B                 6.95E-11  NM_018184  s-MDS  Y
BARHL1                1.22E-07  NM_020064  s-MDS  N
C11orf88              3.58E-14  NM_207430  s-MDS  N
C6orf174              1.53E-08  NM_014702  s-MDS  N
CENPO                 1.08E-07  NM_024322  AML    Y
CKAP2                 4.15E-08  BC018749   s-MDS  N
CRELD1                6.64E-08  BC008720   s-MDS  Y
CRELD1                1.55E-07  BC008720   AML    Y
DLEU1                 5.17E-08  BC020692   t-MDS  Y
DLEU1                 4.20E-07  BC020692   s-MDS  Y
DNAJB9                4.99E-08  NM_012328  s-MDS  Y
EEF1A1                4.76E-08  BC094687   s-MDS  Y
FCGR3A                3.05E-07  BC017865   t-MDS  Y
FCGR3A                4.02E-07  BC036723   s-MDS  Y
FGF16                 6.74E-10  NM_003868  t-MDS  Y
FGF16                 4.74E-08  NM_003868  AML    Y
FKBP14                1.14E-07  NM_017946  s-MDS  Y
GNAZ                  1.28E-08  BC037333   s-MDS  Y
LGALS1                1.80E-11  NM_002305  s-MDS  Y
LGALS1                6.49E-10  NM_002305  t-MDS  Y
LGALS1                5.37E-08  NM_002305  AML    Y
LRAT                  1.21E-09  BC031053   s-MDS  Y
MECR                  3.14E-11  NM_016011  s-MDS  Y
NEK6                  3.00E-07  NM_014397  t-MDS  N
NUAK2                 7.44E-08  NM_030952  s-MDS  N
PANK3                 1.94E-07  NM_024594  s-MDS  Y
PARP11                1.68E-07  BC031073   s-MDS  Y
PLK1                  2.75E-09  NM_005030  s-MDS  Y
PPIG                  1.21E-08  BC001555   s-MDS  Y
PTAFR                 5.45E-12  NM_014280  s-MDS  N
PTCD2                 1.56E-17  NM_024754  s-MDS  N
PTCD2                 2.86E-13  NM_024754  t-MDS  N
PTCD2                 6.85E-13  NM_024754  AML    N
SERAC1                4.34E-07  NM_032861  s-MDS  Y
SSX5                  1.72E-07  BC016640   s-MDS  N
TMEM106A              2.58E-07  NM_145041  s-MDS  Y
TOMM20                6.87E-11  NM_014765  t-MDS  Y
TRH                   2.22E-08  NM_007117  t-MDS  N
VRK3                  2.67E-08  NM_016440  s-MDS  Y
ZNF684                2.67E-07  NM_152373  s-MDS  Y

*Proteins selected based on p-value from Stage I as highly reactive in patient groups compared to healthy individuals. 25 of the high-priority proteins were successfully spotted on Stage II arrays.
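The Stage I selection behind this table (Mann-Whitney U-tests followed by Bonferroni correction at p < 0.01) can be sketched in Python as below. This is an illustrative stand-in, not the authors' Mathematica code: the test uses the two-sided normal approximation with tie correction and no continuity correction, which may differ from the exact procedure used, and the function names and example inputs are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tied run
        run = j - i + 1
        tie_term += run ** 3 - run
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mu) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_select(p_values, alpha=0.01):
    """Keep names whose Bonferroni-adjusted p-value (p * number of tests) is below alpha."""
    m = len(p_values)
    return [name for name, p in p_values.items() if min(1.0, p * m) < alpha]
```

In use, `mann_whitney_u` would be run once per protein (patient-group signals against healthy signals), the resulting p-values collected in a dictionary keyed by protein, and `bonferroni_select` applied with the total number of tested proteins as the correction factor.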


Supplementary Table S2. Bio Functions for High Priority Proteins

Bio Functions                    Functions Annotation           p-Value   Number of Molecules  Molecules*
Cell Death and Survival          apoptosis                      7.30E-03  12                   AKT3,BARHL1,CKAP2,DNAJB9,EEF1A1,FCGR3A,LGALS1,NEK6,NUAK2,PLK1,PTAFR,TRH
Cell Death and Survival          necrosis                       3.38E-02  10                   AKT3,BARHL1,CKAP2,EEF1A1,FCGR3A,LGALS1,NEK6,PLK1,PTAFR,TRH
Infectious Disease               Viral Infection                2.45E-02  8                    EEF1A1,FCGR3A,LGALS1,LRAT,PANK3,PARP11,PLK1,PTAFR
Cell Death and Survival          apoptosis of tumor cell lines  3.10E-02  6                    AKT3,CKAP2,LGALS1,NEK6,PLK1,PTAFR
Cancer                           genital tumor                  4.71E-02  6                    BARHL1,FCGR3A,LGALS1,PLK1,PTAFR,TMEM106A
Reproductive System Disease      genital tumor                  4.71E-02  6                    BARHL1,FCGR3A,LGALS1,PLK1,PTAFR,TMEM106A
Carbohydrate Metabolism          metabolism of carbohydrate     7.98E-03  5                    ALDOB,EEF1A1,LRAT,PTAFR,TRH
Cancer                           head and neck cancer           1.30E-02  5                    AKT3,ALDOB,BARHL1,LGALS1,NEK6
Post-Translational Modification  phosphorylation of protein     1.47E-02  5                    AKT3,LGALS1,NEK6,NUAK2,PLK1
Molecular Transport              concentration of lipid         2.41E-02  5                    AKT3,EEF1A1,LRAT,NUAK2,TRH
Small Molecule Biochemistry      concentration of lipid         2.41E-02  5                    AKT3,EEF1A1,LRAT,NUAK2,TRH
Lipid Metabolism                 concentration of lipid         2.41E-02  5                    AKT3,EEF1A1,LRAT,NUAK2,TRH
Cancer                           hematological neoplasia        3.89E-02  5                    AKT3,DLEU1,EEF1A1,FCGR3A,NEK6
Hematological Disease            hematological neoplasia        3.89E-02  5                    AKT3,DLEU1,EEF1A1,FCGR3A,NEK6
Cancer                           lymphohematopoietic cancer     4.34E-02  5                    AKT3,DLEU1,EEF1A1,FCGR3A,NEK6
Cancer                           uterine tumor                  4.63E-02  5                    AKT3,PLK1,PPIG,PTAFR,TMEM106A
Small Molecule Biochemistry      metabolism of phospholipid     1.01E-03  4                    EEF1A1,LRAT,PTAFR,TRH
Lipid Metabolism                 metabolism of phospholipid     1.01E-03  4                    EEF1A1,LRAT,PTAFR,TRH
Cancer                           growth of tumor                8.25E-03  4                    AKT3,FCGR3A,LGALS1,PLK1
Cell Cycle                       mitosis                        1.34E-02  4                    CKAP2,FGF16,NEK6,PLK1
Carbohydrate Metabolism          quantity of carbohydrate       1.79E-02  4                    EEF1A1,GNAZ,NUAK2,TRH
Cancer                           melanoma                       1.97E-02  4                    ABAT,AKT3,NEK6,PLK1
Cancer                           endometrial carcinoma          4.02E-02  4                    AKT3,PLK1,PTAFR,TMEM106A
Reproductive System Disease      endometrial carcinoma          4.02E-02  4                    AKT3,PLK1,PTAFR,TMEM106A

*Table truncated for number of molecules ≥ 4. The high-priority proteins were analyzed with IPA's knowledge database (Ingenuity® Systems, http://www.ingenuity.com), to identify enrichment in Bio-Functions.

Supplementary Table S3. Canonical Pathways for High Priority Proteins

Ingenuity Canonical Pathways                                                    -log(p-value)  Ratio     Molecules*
Role of NFAT in Regulation of the Immune Response                               2.49           1.51E-02  AKT3,GNAZ,FCGR3A
TR/RXR Activation                                                               2.02           2.08E-02  TRH,AKT3
FGF Signaling                                                                   2.01           2.17E-02  FGF16,AKT3
G Beta Gamma Signaling                                                          1.98           1.69E-02  AKT3,GNAZ
Fcγ Receptor-mediated Phagocytosis in Macrophages and Monocytes                 1.94           1.96E-02  AKT3,FCGR3A
Natural Killer Cell Signaling                                                   1.80           1.71E-02  AKT3,FCGR3A
Relaxin Signaling                                                               1.62           1.23E-02  AKT3,GNAZ
CXCR4 Signaling                                                                 1.53           1.18E-02  AKT3,GNAZ
RAR Activation                                                                  1.43           1.05E-02  LRAT,AKT3
CREB Signaling in Neurons                                                       1.43           9.71E-03  AKT3,GNAZ
Ephrin Receptor Signaling                                                       1.43           9.85E-03  AKT3,GNAZ
Dendritic Cell Maturation                                                       1.41           9.57E-03  AKT3,FCGR3A
Regulation of the Epithelial-Mesenchymal Transition Pathway                     1.40           1.04E-02  FGF16,AKT3
Thrombin Signaling                                                              1.36           9.62E-03  AKT3,GNAZ
Systemic Lupus Erythematosus Signaling                                          1.24           8.00E-03  AKT3,FCGR3A
Role of Macrophages, Fibroblasts and Endothelial Cells in Rheumatoid Arthritis  1.00           5.95E-03  AKT3,FCGR3A
Molecular Mechanisms of Cancer                                                  0.92           5.25E-03  AKT3,GNAZ
Axonal Guidance Signaling                                                       0.76           4.25E-03  AKT3,GNAZ

*Table truncated for number of molecules ≥ 3. The high-priority proteins were analyzed with IPA's knowledge database (Ingenuity® Systems, http://www.ingenuity.com) to identify various Canonical Pathway associations.

Supplementary Table S4. Additional Classification Results

(a) s-MDS/Healthy Classification Deviations*

σ        Healthy Est  s-MDS Est
Healthy  2.0          2.0
s-MDS    1.9          1.9
Total    2.7          2.7

(b) Full Retrospective MDS Classification Deviations*

σ        Healthy Est  s-MDS Est  t-MDS Est  L Est
Healthy  2.80         2.75       0.63       0.09
s-MDS    3.05         3.13       0.83       0.14
t-MDS    2.52         2.63       2.96       0.23
L        2.29         2.89       1.36       2.36
Total    6.27         6.68       4.20       2.44

(c) IPSS MDS Classification Deviations*

σ        none Est  Low Est  Int1 Est  Int2 Est  High Est  L Est
Healthy  0.62      0.15     0.51      0.27      0.14      0.13
Low      2.27      2.45     1.86      0.96      0.44      0.80
Int1     2.93      0.77     3.24      1.04      0.44      0.99
Int2     2.11      0.56     1.46      2.28      0.29      0.60
High     1.41      0.67     1.22      0.67      1.31      0.65
L        2.22      0.95     2.16      1.13      0.47      2.66
Total    5.66      3.74     6.07      4.02      2.10      3.72

* Various classification results were based on using the three protein reactivities validated in Stage II, AKT3, FCGR3A and ARL8B, as defining coordinates per sample. All classifications used 5-fold cross-validation, with 1,000 random group partitioning repetitions. Based on these, the standard deviations for the Kernel Discriminant Analysis (KDA) classification medians shown in Fig. 4a-c (for s-MDS/Healthy, retrospective MDS/Healthy and IPSS/Healthy) are shown in (a)-(c) respectively.

Supplementary Table S5. MDS/AML/Healthy Classification Results

(a) Classification Medians*

         Healthy Est (%)  MDS Est (%)  L Est (%)  Total
Healthy  86 (77)          26 (23)      0 (0)      112
MDS      10 (6)           151 (93)     0 (0)      161
L        8 (19)           24 (56)      11 (26)    43
Total    104              202          11         316

(b) Classification Deviations*

σ        Healthy Est  MDS Est  L Est
Healthy  2.71         2.71     0.03
MDS      3.65         3.65     0.11
L        2.08         2.60     2.13
Total    5.71         5.82     2.16

*Results for classification of the MDS set considered as a whole, AML (L) and Healthy, with median classifications shown in (a) and corresponding standard deviations shown in (b). Kernel Discriminant Analysis classification results were based on using the three protein reactivities validated in Stage II, AKT3, FCGR3A and ARL8B, as defining coordinates per sample. Classifications used 5-fold cross-validation, with medians/deviations based on 1,000 random group partitioning repetitions.

Supplementary Table S6. s-MDS/Healthy Linear Discriminant Analysis

(a) Classification Medians*

         Healthy Est (%)  s-MDS Est (%)  Total
Healthy  70 (62.5)        42 (37.5)      112
s-MDS    41 (34)          78 (66)        119
Total    111              130            231

(b) Classification Deviations*

σ        Healthy Est  s-MDS Est
Healthy  2.0          2.0
s-MDS    1.9          1.9
Total    2.7          2.7

* Linear Discriminant Analysis (LDA) classification results for s-MDS and Healthy, with median classifications shown in (a) and corresponding standard deviations shown in (b). LDA results were based on using the three protein reactivities validated in Stage II, AKT3, FCGR3A and ARL8B, as defining coordinates per sample. Classification used 5-fold cross-validation, with medians/deviations based on 1,000 random group partitioning repetitions.
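The percentages in panel (a) appear to be row-wise fractions of the median counts, i.e. each count divided by the total for its true class. The short Python sketch below reproduces that arithmetic from the Healthy and s-MDS rows of panel (a); the dictionary layout and the function name `row_percentages` are assumptions made for illustration.

```python
def row_percentages(counts):
    """Convert each row of a confusion matrix to percentages of that row's total."""
    result = {}
    for true_label, row in counts.items():
        total = sum(row.values())
        result[true_label] = {pred: 100.0 * c / total for pred, c in row.items()}
    return result


# Median counts taken from the Healthy and s-MDS rows of panel (a).
table_s6a = {
    "Healthy": {"Healthy Est": 70, "s-MDS Est": 42},
    "s-MDS": {"Healthy Est": 41, "s-MDS Est": 78},
}
pct = row_percentages(table_s6a)
```

Under this reading, the diagonal entries correspond to the fraction of Healthy subjects classified as Healthy (70 of 112) and the fraction of s-MDS patients classified as s-MDS (78 of 119).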


Bibliography

1. Zhu, X., Gerstein, M. & Snyder, M. ProCAT: a data analysis approach for protein microarrays. Genome Biol. 7, R110, doi:10.1186/gb-2006-7-11-r110 (2006).

2. Bolstad, B. M., Irizarry, R. A., Astrand, M. & Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185-193 (2003).

3. Wolfram Research Inc. Mathematica, Version 9.0. (Wolfram Research, Inc., 2013).

4. Box, G. E. & Cox, D. R. An analysis of transformations. J. R. Stat. Soc. Ser. B Stat. Methodol. 26, 211-252 (1964).

5. Pavlidis, P. Using ANOVA for gene selection from microarray studies of the nervous system. Methods 31, 282-289 (2003).

6. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2013).

7. Venables, W. N., Ripley, B. D. & Venables, W. Modern applied statistics with S-PLUS. Vol. 250 (Springer-Verlag New York, 1994).

8. Baudat, G. & Anouar, F. E. Generalized discriminant analysis using a kernel approach. Neural Comput. 12, 2385-2404, doi:10.1162/089976600300014980 (2000).

9. Duong, T. ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. J. Stat. Softw. 21, 1-16 (2007).

10. Li, Y. M., Gong, S. G. & Liddell, H. Recognising trajectories of facial identities using kernel discriminant analysis. Image Vision Comput. 21, 1077-1086, doi:10.1016/J.Imavis.2003.08.01 (2003).

11. Mika, S., Ratsch, G., Weston, J., Scholkopf, B. & Mullers, K. Fisher discriminant analysis with kernels. Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE Signal Processing Society Workshop, 41-48, doi:10.1109/NNSP.1999.788121 (1999).