mmg991 session 9 · 2001-11-09 · mmg991 session 9 • classical multidimensional scaling –...
MMG991 Session 9
• Classical multidimensional scaling
  – Concepts
  – S-Plus implementation
  – Microarrays
• Looking at Khan’s cancer data
  – Unanswered questions
• Some thoughts on data filtration
  – Are there alternative solutions?
• Compare the output
  – Classification of cancers
  – Selection of genes
• Binary recursive partitioning
  – Chapter 10 in MASS
  – Zhang et al., PNAS 98: 6730-6735
• Projects
  – Updates from each group
Multidimensional scaling
• An ordination technique
  – Represent data in lower dimensional space
    • Seeks to reduce spatial distortion
  – Requires a distance matrix as input
    • Size limitations on input
  – Output
    • 2-D or 3-D plots
  – Visual assessment of relationships
  – No classification produced
• S-Plus implementation
  – Classical multidimensional scaling
    • Equivalent to PCA when Euclidean distances are used
  – cmdscale(d, k=2, eig=F, add=F)
    • d – distance matrix
    • k – number of output dimensions
    • eig – if TRUE, the eigenvalues are also returned
    • add – if TRUE, an additive constant is estimated and added to the distances
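The equivalence with PCA noted above can be checked on a small random matrix. This is a minimal sketch in R (the open-source dialect of S), not code from the session: the matrix x and the names mds, pca and max.diff are made up for illustration, and prcomp() is R's PCA function, assumed here rather than taken from the slides.

```r
# Toy data: 10 observations (rows) on 4 variables (columns); hypothetical values
set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)

# Classical MDS on Euclidean distances, keeping 2 dimensions
mds <- cmdscale(dist(x), k = 2, eig = TRUE)

# PCA on the same data (centered, not rescaled)
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# The two sets of coordinates should agree up to the sign of each axis,
# so compare absolute values
max.diff <- max(abs(abs(mds$points) - abs(pca$x[, 1:2])))
print(max.diff)   # expected to be at rounding-error level
```

With any other distance (for example Manhattan), the two sets of coordinates would generally differ.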
The data set
• Tumor classification/diagnostic prediction
  – http://nhgri.nih.gov
  – Khan, et al. Nature Medicine 7: 673
• The dataset
  – 63 training samples/25 test samples
  – Four tumor/cell types
    • EWS – 13 tumors/10 cell lines
    • BL – 8 cell lines
    • NB – 12 cell lines
    • RMS – 10 tumors/10 cell lines
  – Filtering the data
    • Minimum red intensity of 20*
  – Relative red index
    • rri = mean spot intensity/mean intensity of filtered genes
    • Expression measured as ln(rri)
  – Clustering and MDS
    • As defined in Khan et al, Cancer Research 58:5009
  – “…highly expressed compared to reference probe.”
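The filtering and relative red index steps can be sketched on a toy matrix. This is an illustration in R under stated assumptions, not Khan's pipeline: the intensities are invented, the filter is assumed to require a minimum red intensity of at least 20 on every array, and the denominator is assumed to be the per-array mean over the genes that pass the filter.

```r
# Hypothetical red-channel intensities: 4 genes (rows) by 3 arrays (columns).
# matrix() fills column by column, so each line below is one array.
red <- matrix(c(10, 50, 200, 400,
                30, 60, 100, 800,
                25, 15, 300, 500),
              nrow = 4,
              dimnames = list(paste("gene", 1:4, sep = ""),
                              paste("array", 1:3, sep = "")))

# Assumed filter: keep genes whose smallest intensity is at least 20
keep <- apply(red, 1, min) >= 20
filtered <- red[keep, , drop = FALSE]

# Relative red index: intensity divided by the array mean of the filtered genes
rri <- sweep(filtered, 2, colMeans(filtered), "/")

# Expression measured as ln(rri)
expr <- log(rri)
print(expr)
```

By construction each column of rri averages to 1, so ln(rri) is positive for spots above the array average and negative for spots below it.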
Khan’s solution
• Setting up the model
    # Drop the two identifier columns, keep the genes listed in ann.genes, take logs
    nhgri <- supplemental.data[, -c(1:2)]
    gene.list <- match(ann.genes[,2], supplemental.data[,1])
    nhgri.small <- log(nhgri[gene.list,])
• Estimating the distances
    # 1 - correlation, between experiments (columns) and between genes (rows)
    nhgri.small.cor <- 1 - cor(nhgri.small)
    nhgri.small.tcor <- 1 - cor(t(nhgri.small))
• Clustering
    # Average linkage; clorder() reorders the leaves by mean expression
    nhgri.small.clust <- hclust(nhgri.small.cor, met="ave")
    nhgri.small.clust <- clorder(nhgri.small.clust,
        apply(nhgri.small, 2, mean))
    nhgri.small.tclust <- hclust(nhgri.small.tcor, met="ave")
    nhgri.small.tclust <- clorder(nhgri.small.tclust,
        apply(t(nhgri.small), 2, mean))
    plclust(nhgri.small.clust,
        labels=dimnames(nhgri.small)[[2]], cex=0.6)
    plclust(nhgri.small.tclust, cex=0.6)
Khan’s solution (continued)
• The heat map
    # Reorder genes (rows) and experiments (columns) by the two dendrograms
    temp <- nhgri.small[nhgri.small.tclust$order, nhgri.small.clust$order]
    image(list(x=1:dim(temp)[1], y=1:dim(temp)[2],
        z=as.matrix(temp)))
    image.legend(as.matrix(temp), x=nrow(temp)*1.075,
        y=ncol(temp)*1.05, size=c(.125, 6.1), hor=F,
        cex=0.66, tck=-0.01, mgp=c(0,0.5,0))
• Multidimensional scaling
    # Classical MDS on Manhattan distances between experiments
    temp.mds <- cmdscale(dist(t(temp), met="man"), add=T)
    par(pty="s")
    plot(temp.mds$points[,1], temp.mds$points[,2])
    # Colour each tumor/cell type; ews, bl, nb and rms index the experiments
    points(temp.mds$points[ews,1], temp.mds$points[ews,2], col=2)
    points(temp.mds$points[bl,1], temp.mds$points[bl,2], col=3)
    points(temp.mds$points[nb,1], temp.mds$points[nb,2], col=4)
    points(temp.mds$points[rms,1], temp.mds$points[rms,2], col=5)
    # Stress: squared mismatch between the Euclidean distances in the data
    # and the distances in the MDS plot, relative to the former
    temp.dist <- dist(t(temp))
    mds.dist <- dist(temp.mds$points)
    stress <- sum((temp.dist - mds.dist)^2)/sum(temp.dist^2)
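The stress formula can be tried on toy data to see how it behaves as the number of output dimensions grows. This is a minimal sketch in R, assuming Euclidean distances both for the MDS input and for the comparison; the matrix x and the helper stress.for() are hypothetical names, not part of the session's code.

```r
# Toy data: 12 observations in 5 dimensions; hypothetical values
set.seed(42)
x <- matrix(rnorm(60), nrow = 12, ncol = 5)
orig.dist <- as.vector(dist(x))

# Stress of a k-dimensional classical MDS solution, using the slide's formula
stress.for <- function(k) {
  fit <- cmdscale(dist(x), k = k)      # n x k matrix of coordinates
  mds.dist <- as.vector(dist(fit))
  sum((orig.dist - mds.dist)^2) / sum(orig.dist^2)
}

stress.2 <- stress.for(2)   # 2-D plot: some distortion remains
stress.5 <- stress.for(5)   # full dimension: distances are reproduced
print(c(stress.2, stress.5))
```

When the same Euclidean distance is used on both sides, this ratio lies between 0 and 1. The slide code fits the MDS to Manhattan distances with an additive constant but compares against Euclidean distances, which is one plausible reading (not confirmed by the source) of why the reported stress values exceed 1.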
[Figure: Clustering of experiments. Dendrogram of the samples (EWS, BL, NB, RMS and TEST labels); height axis 0.0 to 1.0.]
[Figure: Clustering of genes. Dendrogram of genes 1 to 96; height axis 0.0 to 1.0.]
[Figure: The heat map.]
[Figure: Multidimensional scaling. temp.mds$points[, 1] (x-axis) against temp.mds$points[, 2] (y-axis).]
Stress = 21.06991
Observations and comments
• “Solution” doesn’t exactly agree with Khan’s
  – Comparing the output
    • Heat maps similar
    • Plots of experiments similar
  – Source of differences
    • Experimental noise
    • Scaling of “experiments”
• Setting up the model
    nhgri <- supplemental.data[, -c(1:2)]
    gene.list <- match(ann.genes[,2], supplemental.data[,1])
    nhgri.small <- log(nhgri[gene.list,])
    # Standardize each experiment (column) to mean 0 and standard deviation 1
    nhgri.small <- apply(nhgri.small, 2, scale)
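What the column scaling does can be seen on two made-up experiments that differ only in overall intensity. This is a small sketch in R; the matrix x and its values are hypothetical.

```r
# Two hypothetical experiments (columns); the second is 10 times brighter
x <- cbind(array1 = c(1, 2, 3, 4),
           array2 = c(10, 20, 30, 40))

# Standardize each column: subtract its mean, divide by its standard deviation
scaled <- apply(x, 2, scale)

print(scaled)
print(colMeans(scaled))        # both columns now have mean 0
print(apply(scaled, 2, sd))    # and standard deviation 1
```

After scaling the two columns are identical, so a difference in overall level or spread between experiments no longer contributes to the distances.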
[Figure: Clustering of experiments, scaled data. Dendrogram of the samples (EWS, BL, NB, RMS and TEST labels); height axis 0.0 to 1.0.]
[Figure: Clustering of genes, scaled data. Dendrogram of genes 1 to 96; height axis 0.0 to 1.0.]
[Figure: The heat map, scaled data.]
[Figure: Multidimensional scaling, scaled data. temp.mds$points[, 1] (x-axis) against temp.mds$points[, 2] (y-axis).]
Stress = 16.82979
Some thoughts on filtering data
• Khan’s objective
  – “…highly expressed genes”
  – Need to identify subsets within the data
    • Tumor/cell type
    • Search data frame for differential expression
  – Criteria
    • Arbitrary
      – Difference in expression exceeds threshold value
        » Means, medians, trimmed means
    • Population based
      – Subsets within the data groupings
      – Scaled vs. unscaled
Experiment 1
• Setup
    nhgri <- supplemental.data[, -c(1:2)]
    nhgri <- log(nhgri)
• Candidate genes
    # Flag genes whose median log expression in one tumor/cell type exceeds
    # that in another by more than 2 (six pairs, first minus second)
    gene.list <- NULL
    gene.list <- c(gene.list, (apply(nhgri[,ews], 1, median) -
        apply(nhgri[,bl], 1, median)) > 2)
    gene.list <- c(gene.list, (apply(nhgri[,ews], 1, median) -
        apply(nhgri[,nb], 1, median)) > 2)
    gene.list <- c(gene.list, (apply(nhgri[,ews], 1, median) -
        apply(nhgri[,rms], 1, median)) > 2)
    gene.list <- c(gene.list, (apply(nhgri[,bl], 1, median) -
        apply(nhgri[,nb], 1, median)) > 2)
    gene.list <- c(gene.list, (apply(nhgri[,bl], 1, median) -
        apply(nhgri[,rms], 1, median)) > 2)
    gene.list <- c(gene.list, (apply(nhgri[,nb], 1, median) -
        apply(nhgri[,rms], 1, median)) > 2)
    # Names of the genes flagged in at least one comparison
    gene.list <- unique(names(gene.list)[gene.list==T])
    nhgri.small <- nhgri[gene.list,]
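The same median-difference filter can be written as a loop over pairs of groups. This is a minimal sketch in R on invented data, not the session's code: the matrix expr, the groups list and the threshold placement are illustrative, and combn() is an R function assumed here. As on the slide, only one direction of each pair (first minus second) is tested.

```r
# Hypothetical log expression: 3 genes (rows) by 6 samples (columns)
expr <- matrix(c(5, 5, 5, 1, 1, 1,
                 2, 2, 2, 2, 2, 2,
                 1, 1, 1, 4, 4, 4),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               paste("s", 1:6, sep = "")))

# Column indices of each (made-up) tumor/cell type
groups <- list(g1 = 1:3, g2 = 4:6)

# Per-gene median within each group: a genes x groups matrix
med <- sapply(groups, function(idx) apply(expr[, idx, drop = FALSE], 1, median))

# Collect genes whose median difference exceeds 2 for any ordered pair
selected <- character(0)
pairs <- combn(names(groups), 2)
for (j in seq_len(ncol(pairs))) {
  delta <- med[, pairs[1, j]] - med[, pairs[2, j]]
  selected <- c(selected, names(delta)[delta > 2])
}
selected <- unique(selected)
print(selected)
```

Here geneA is picked up (higher in g1) while geneC, which is higher in g2, is not, because the test is one-directional.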
[Figure: Clustering of experiments, Experiment 1. Dendrogram of the samples (EWS, BL, NB, RMS and TEST labels); height axis 0.0 to 1.0.]
[Figure: Clustering of genes, Experiment 1. Dendrogram of genes 1 to 51; height axis 0.0 to 1.0.]
[Figure: The heat map, Experiment 1.]
[Figure: Multidimensional scaling, Experiment 1. temp.mds$points[, 1] (x-axis) against temp.mds$points[, 2] (y-axis).]
Stress = 20.26463
Experiment 2
• Setup
    # For every gene, keep the 8 largest values within each tumor/cell type
    ews.mat <- matrix(0, nrow(nhgri), 8)
    bl.mat <- matrix(0, nrow(nhgri), 8)
    nb.mat <- matrix(0, nrow(nhgri), 8)
    rms.mat <- matrix(0, nrow(nhgri), 8)
    for(i in 1:nrow(nhgri))
        ews.mat[i,] <- rev(sort(as.matrix(nhgri[i,ews])))[1:8]
    for(i in 1:nrow(nhgri))
        bl.mat[i,] <- rev(sort(as.matrix(nhgri[i,bl])))[1:8]
    for(i in 1:nrow(nhgri))
        nb.mat[i,] <- rev(sort(as.matrix(nhgri[i,nb])))[1:8]
    for(i in 1:nrow(nhgri))
        rms.mat[i,] <- rev(sort(as.matrix(nhgri[i,rms])))[1:8]
Experiment 2
• Candidate genes
    # Same pairwise test as Experiment 1, on the top-8 medians, threshold 1.75
    gene.list <- NULL
    gene.list <- c(gene.list, apply(ews.mat, 1, median) -
        apply(bl.mat, 1, median) > 1.75)
    gene.list <- c(gene.list, apply(ews.mat, 1, median) -
        apply(nb.mat, 1, median) > 1.75)
    gene.list <- c(gene.list, apply(ews.mat, 1, median) -
        apply(rms.mat, 1, median) > 1.75)
    gene.list <- c(gene.list, apply(bl.mat, 1, median) -
        apply(nb.mat, 1, median) > 1.75)
    gene.list <- c(gene.list, apply(bl.mat, 1, median) -
        apply(rms.mat, 1, median) > 1.75)
    gene.list <- c(gene.list, apply(nb.mat, 1, median) -
        apply(rms.mat, 1, median) > 1.75)
    # Row indices of the genes flagged in at least one comparison
    gene.list <- unique(rep(1:nrow(nhgri), 6)[gene.list])
    nhgri.small <- nhgri[gene.list,]
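The top-8 step can be condensed into one helper that returns, per gene, the median of the k largest values in a group. This is a sketch in R on invented numbers with k = 3; top.k.median(), grp.a and grp.b are hypothetical names, not objects from the session.

```r
# Median of the k largest values in each row of m
top.k.median <- function(m, k) {
  apply(m, 1, function(v) median(sort(v, decreasing = TRUE)[seq_len(k)]))
}

# Hypothetical log expression for two groups, 2 genes by 5 samples each
grp.a <- rbind(gene1 = c(1, 9, 8, 2, 7), gene2 = c(3, 3, 3, 3, 3))
grp.b <- rbind(gene1 = c(5, 5, 6, 1, 1), gene2 = c(3, 2, 3, 3, 1))

med.a <- top.k.median(grp.a, 3)   # gene1: median of 9, 8, 7; gene2: 3
med.b <- top.k.median(grp.b, 3)   # gene1: median of 6, 5, 5; gene2: 3

# Row indices of genes where group a exceeds group b by more than 1.75
selected <- which(med.a - med.b > 1.75)
print(selected)
```

Restricting each group to its highest values makes the comparison less sensitive to a few low-expressing samples within a group, at the cost of ignoring part of the data.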
[Figure: Clustering of experiments, Experiment 2. Dendrogram of the samples (EWS, BL, NB, RMS and TEST labels); height axis 0.0 to 0.8.]
[Figure: Clustering of genes, Experiment 2. Dendrogram of the selected genes, labelled by row index; height axis 0.0 to 1.0.]
[Figure: The heat map, Experiment 2.]
[Figure: Multidimensional scaling, Experiment 2. temp.mds$points[, 1] (x-axis) against temp.mds$points[, 2] (y-axis).]
Stress = 68.52684
Experiment 3
• Data are scaled
    nhgri <- supplemental.data[, -c(1:2)]
    nhgri <- apply(log(nhgri), 2, scale)
[Figure: Clustering of experiments, Experiment 3. Dendrogram of the samples (EWS, BL, NB, RMS and TEST labels); height axis 0.0 to 0.8.]
[Figure: Clustering of genes, Experiment 3. Dendrogram of genes 1 to 109; height axis 0.0 to 1.0.]
[Figure: The heat map, Experiment 3.]
[Figure: Multidimensional scaling, Experiment 3. temp.mds$points[, 1] (x-axis) against temp.mds$points[, 2] (y-axis).]
Stress = 100.181
Evaluating the output
    common.genes1 <- intersect(gene.list1, gene.list)
    unique.genes1 <- setdiff(gene.list1, gene.list)
    unique.genes.1 <- setdiff(gene.list, gene.list1)
    common.genes2 <- intersect(gene.list2, gene.list)
    unique.genes2 <- setdiff(gene.list2, gene.list)
    unique.genes.2 <- setdiff(gene.list, gene.list2)
    common.genes3 <- intersect(gene.list3, gene.list)
    unique.genes3 <- setdiff(gene.list3, gene.list)
    unique.genes.3 <- setdiff(gene.list, gene.list3)

    supplemental.data[common.genes1, 1:2]
    supplemental.data[unique.genes1, 1:2]
    supplemental.data[unique.genes.1, 1:2]
    supplemental.data[common.genes2, 1:2]
    supplemental.data[unique.genes2, 1:2]
    supplemental.data[unique.genes.2, 1:2]
    supplemental.data[common.genes3, 1:2]
    supplemental.data[unique.genes3, 1:2]
    supplemental.data[unique.genes.3, 1:2]
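The set operations above reduce to a few lines on toy index vectors. This is an illustrative sketch in R; the vectors khan and mine and the overlap measure (shared genes as a fraction of the reference list) are assumptions, not values or definitions from the session.

```r
# Hypothetical gene indices: a reference list and one alternative selection
khan <- c(3, 8, 15, 21, 40)
mine <- c(8, 21, 40, 77)

common    <- intersect(mine, khan)   # selected by both
only.mine <- setdiff(mine, khan)     # new genes, not in the reference list
only.khan <- setdiff(khan, mine)     # reference genes that were missed

# Fraction of the reference list recovered by the alternative selection
overlap <- length(common) / length(khan)
print(list(common = common, only.mine = only.mine,
           only.khan = only.khan, overlap = overlap))
```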
Clustering the tumor/cell types
• Cutting the trees
    # Each pair of lines is presumably run after the corresponding analysis
    # has rebuilt nhgri.small.clust
    model.1 <- cutree(nhgri.small.clust, h=0.6)
    model.1.tree <- nhgri.small.clust
    model.2 <- cutree(nhgri.small.clust, h=0.6)
    model.2.tree <- nhgri.small.clust
    model.3 <- cutree(nhgri.small.clust, h=0.6)
    model.3.tree <- nhgri.small.clust
    model.k <- cutree(nhgri.small.clust, h=0.6)
    model.k.tree <- nhgri.small.clust
    model.ks <- cutree(nhgri.small.clust, h=0.6)
    model.ks.tree <- nhgri.small.clust
• Identifying the groups
    nhgri.groups <- cbind(dimnames(nhgri)[[2]], model.1, model.2,
        model.3, model.k, model.ks)
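Cutting a tree at a fixed height and cross-tabulating the result can be sketched on synthetic data with two obvious groups. This is a minimal illustration in R, not the session's analysis: the six "experiments" are built from sine and cosine curves so that columns within a group correlate strongly, and the names x, tree, groups and truth are hypothetical. R's hclust() expects a dist object, hence the as.dist() call.

```r
# Six hypothetical experiments (columns) measured on 50 "genes" (rows):
# A1-A3 follow one profile, B1-B3 another, each with a small perturbation
tt <- 1:50
x <- cbind(A1 = sin(tt),
           A2 = sin(tt) + 0.1 * sin(2 * tt),
           A3 = sin(tt) + 0.1 * cos(3 * tt),
           B1 = cos(tt),
           B2 = cos(tt) + 0.1 * sin(2 * tt),
           B3 = cos(tt) + 0.1 * cos(3 * tt))

# 1 - correlation as the distance, average linkage, cut at height 0.6
d <- as.dist(1 - cor(x))
tree <- hclust(d, method = "average")
groups <- cutree(tree, h = 0.6)

# Cross-tabulate cluster membership against the known grouping
truth <- rep(c("A", "B"), each = 3)
tab <- table(groups, truth)
print(tab)
```

Comparing two clusterings of the same samples works the same way: pass the two membership vectors to table() and read off how the groups of one split across the groups of the other.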
Comparison of classifications
> table(nhgri.groups[, 4], nhgri.groups[, 1])
     1  2  3  4  5  6
 1  19  5  0  0  0  0
10   1  0  0  0  0  0
 2   0  1 15  0  0  0
 3   1  1  0 16  7  0
 4   1  0  0  3  0  0
 5   2  0  0  0  0  0
 6   0  0  0  0  0 11
 7   0  0  0  2  0  0
 8   0  2  0  0  0  0
 9   0  1  0  0  0  0
Comparison of classifications
> table(nhgri.groups[, 4], nhgri.groups[, 2])
     1 10 11 12  2  3  4  5  6  7  8  9
 1  16  0  1  0  0  1  0  6  0  0  0  0
10   0  0  0  0  0  0  0  1  0  0  0  0
 2   0  0  0  0  0  0  1  0 15  0  0  0
 3   0  0  0  0 14  3  0  1  0  7  0  0
 4   0  1  0  0  3  0  0  0  0  0  0  0
 5   1  0  0  0  0  1  0  0  0  0  0  0
 6   0  0  0  0  0  0  0  0  0  0 11  0
 7   0  0  0  0  0  0  0  0  0  0  0  2
 8   0  0  0  0  0  0  2  0  0  0  0  0
 9   0  0  0  1  0  0  0  0  0  0  0  0
Comparison of classifications
> table(nhgri.groups[, 4], nhgri.groups[, 3])
     1 10  2  3  4  5  6  7  8  9
 1   0  0  0 16  0  7  0  0  0  1
10   0  0  0  0  0  1  0  0  0  0
 2  16  0  0  0  0  0  0  0  0  0
 3   0  0 15  0  9  0  1  0  0  0
 4   0  0  3  0  0  0  0  0  1  0
 5   0  0  0  1  0  0  1  0  0  0
 6   0  0  0  0  0  0  0 11  0  0
 7   0  0  2  0  0  0  0  0  0  0
 8   2  0  0  0  0  0  0  0  0  0
 9   0  1  0  0  0  0  0  0  0  0
Comparison of classifications
> table(nhgri.groups[, 4], nhgri.groups[, 5])
     1  2  3  4  5  6  7
 1   0  0  0  1 23  0  0
10   0  0  0  0  0  0  1
 2   0  0 16  0  0  0  0
 3   1 24  0  0  0  0  0
 4   3  0  0  0  1  0  0
 5   0  0  0  0  2  0  0
 6   0  0  0  0  0 11  0
 7   2  0  0  0  0  0  0
 8   0  0  2  0  0  0  0
 9   0  0  0  1  0  0  0
[Figure: heat map, Model 1.]
[Figure: heat map, Model 2.]
[Figure: heat map, Model 3.]
[Figure: heat map, Khan’s solution.]
[Figure: heat map, Khan’s solution, scaled data.]
Some closing thoughts…
• There remain some unexplained differences
  – Not related to a simple transformation
  – Impact of noise in data on ANN warrants further investigation
• Multiple “solutions” yield a smaller set of “diagnostic” genes
  – 40-60% overlap with Khan’s solution
  – Additional genes that were not reported
  – Need a cancer biologist to review the significance
• All models could be easily refined
  – Adjustment of clustering thresholds
    • Iterative model