MMG991 Session 9 · 2001-11-09 · Classical multidimensional scaling

Page 1

MMG991 Session 9

• Classical multidimensional scaling
  – Concepts
  – S-Plus implementation
  – Microarrays
• Looking at Khan’s cancer data
  – Unanswered questions
• Some thoughts on data filtration
  – Are there alternative solutions?
• Compare the output
  – Classification of cancers
  – Selection of genes
• Binary recursive partitioning
  – Chapter 10 in MASS
  – Zhang et al., PNAS 98: 6730-6735
Projects
  – Updates from each group

Page 2

Multidimensional scaling

• An ordination technique
  – Represents data in a lower-dimensional space
    • Seeks to reduce spatial distortion
  – Requires a distance matrix as input
    • Size limitations on input
  – Output
    • 2-D or 3-D plots
  – Visual assessment of relationships
  – No classification produced
• S-Plus implementation
  – Classical multidimensional scaling
    • Equivalent to PCA when Euclidean distances are used
  – cmdscale(d, k=2, eig=F, add=F)
    • d – distance matrix
    • k – number of output dimensions
    • eig – if T, also return the eigenvalues
    • add – if T, estimate an additive constant for the distances
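The PCA equivalence noted on this slide can be checked numerically. The following is a minimal sketch in R rather than S-Plus (an assumption: R's `cmdscale`, `dist` and `prcomp` stand in for the S-Plus functions, and the data matrix is made up for illustration).

```r
# Toy data (made up): 6 samples in rows, 3 variables in columns.
x <- matrix(c(1.0, 2.0, 0.5,
              2.0, 1.5, 1.0,
              3.0, 3.5, 0.0,
              4.0, 3.0, 2.5,
              5.0, 5.5, 2.0,
              6.0, 5.0, 4.0), ncol = 3, byrow = TRUE)

d   <- dist(x)               # Euclidean distances between samples
mds <- cmdscale(d, k = 2)    # classical MDS: 6 x 2 matrix of coordinates
pca <- prcomp(x)$x[, 1:2]    # scores on the first two principal components

# Each axis may be flipped in sign, so compare absolute values.
max_diff <- max(abs(abs(mds) - abs(pca)))
print(max_diff)              # numerically zero
```

With a non-Euclidean metric (for example Manhattan distances) the two configurations generally no longer coincide.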

Page 3

The data set

• Tumor classification/diagnostic prediction
  – http://nhgri.nih.gov
  – Khan et al., Nature Medicine 7: 673
• The dataset
  – 63 training samples/25 test samples
  – Four tumor/cell types
    • EWS – 13 tumors/10 cell lines
    • BL – 8 cell lines
    • NB – 12 cell lines
    • RMS – 10 tumors/10 cell lines
  – Filtering the data
    • Minimum red intensity of 20*
  – Relative red index
    • rri = mean spot intensity/mean intensity of filtered genes
    • Expression measured as ln(rri)
  – Clustering and MDS
    • As defined in Khan et al., Cancer Research 58: 5009
  – “…highly expressed compared to reference probe.”
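The relative red index can be worked through on a handful of numbers. This is only a sketch in R with invented intensities; it assumes that "minimum red intensity of 20" means spots below 20 are dropped, and that the denominator is the mean over the genes that survive the filter on the same array.

```r
# Hypothetical mean spot intensities for five genes on one array.
raw <- c(geneA = 50, geneB = 100, geneC = 150, geneD = 10, geneE = 100)

kept <- raw[raw >= 20]        # assumed reading of "minimum red intensity of 20"
rri  <- kept / mean(kept)     # relative red index
expr <- log(rri)              # natural log, as on the slide

print(round(expr, 3))
```

A gene at exactly the array mean has rri = 1 and therefore an expression value of 0; brighter spots are positive and dimmer spots negative.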

Page 5

Khan’s solution

• Setting up the model
    # drop the first two (non-expression) columns, then keep the genes listed in ann.genes
    nhgri<-supplemental.data[, -c(1:2)]
    gene.list<-match(ann.genes[,2], supplemental.data[,1])
    nhgri.small<-log(nhgri[gene.list,])
• Estimating the distances
    # 1 - correlation between experiments (columns), then between genes (rows)
    nhgri.small.cor<-1-cor(nhgri.small)
    nhgri.small.tcor<-1-cor(t(nhgri.small))
• Clustering
    nhgri.small.clust<-hclust(nhgri.small.cor, met="ave")
    nhgri.small.clust<-clorder(nhgri.small.clust,
        apply(nhgri.small, 2, mean))
    nhgri.small.tclust<-hclust(nhgri.small.tcor, met="ave")
    nhgri.small.tclust<-clorder(nhgri.small.tclust,
        apply(t(nhgri.small), 2, mean))
    plclust(nhgri.small.clust,
        labels=dimnames(nhgri.small)[[2]], cex=0.6)
    plclust(nhgri.small.tclust, cex=0.6)
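The distance and clustering steps on this slide can be illustrated on a small matrix. The sketch below is in R, not S-Plus, and the data are invented; one difference to be aware of is that R's `hclust` expects a `dist` object, so the 1 - correlation matrix is wrapped in `as.dist` here (that wrapping is part of this sketch, not of the slide).

```r
# Toy expression matrix (made up): 5 genes in rows, 4 samples in columns.
x <- cbind(s1 = c(1.0, 2.0, 3.0, 4.0, 5.0),
           s2 = c(1.1, 2.0, 3.2, 3.9, 5.1),
           s3 = c(5.0, 4.0, 3.0, 2.0, 1.0),
           s4 = c(5.2, 3.9, 3.1, 2.0, 0.9))

d  <- as.dist(1 - cor(x))            # correlation distance between samples
hc <- hclust(d, method = "average")  # average-linkage clustering
groups <- cutree(hc, k = 2)          # split the tree into two groups

print(groups)
```

Samples s1 and s2 rise together while s3 and s4 fall, so the correlation distance puts each pair in its own group.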

Page 6

Khan’s solution (continued)

• The heat map
    # reorder genes (rows) and experiments (columns) by the two dendrograms
    temp<-nhgri.small[nhgri.small.tclust$order,nhgri.small.clust$order]
    image(list(x=1:dim(temp)[1], y=1:dim(temp)[2],
        z=as.matrix((temp))))
    image.legend(as.matrix((temp)), x=nrow(temp)*1.075,
        y=ncol(temp)*1.05, size=c(.125, 6.1), hor=F,
        cex=0.66, tck=-0.01, mgp=c(0,0.5,0))
• Multidimensional scaling
    temp.mds<-cmdscale(dist(t(temp), met="man"), add=T)
    par(pty="s")
    plot(temp.mds$points[,1], temp.mds$points[,2])
    points(temp.mds$points[ews,1],temp.mds$points[ews,2], col=2)
    points(temp.mds$points[bl,1],temp.mds$points[bl,2], col=3)
    points(temp.mds$points[nb,1],temp.mds$points[nb,2], col=4)
    points(temp.mds$points[rms,1],temp.mds$points[rms,2], col=5)
    # input distances (dist() default metric) vs. distances in the MDS configuration
    temp.dist<-dist(t(temp))
    mds.dist<-dist(temp.mds$points)
    stress = sum((temp.dist - mds.dist)^2)/sum(temp.dist^2)
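The stress ratio at the end of this slide can be tried on a small configuration. The following is a sketch in R with made-up points; it deliberately uses the same (Euclidean) metric for the input distances and for the MDS fit, which is a choice of this sketch, and `stress_for` is a hypothetical helper name.

```r
# Five made-up points in three dimensions (not coplanar).
x <- matrix(c(0, 0,  0.0,
              1, 0,  0.2,
              0, 1, -0.2,
              1, 1,  0.1,
              2, 1,  0.0), ncol = 3, byrow = TRUE)

stress_for <- function(x, k) {
  d_in  <- dist(x)                       # distances in the original space
  d_out <- dist(cmdscale(d_in, k = k))   # distances in the k-dimensional configuration
  sum((d_in - d_out)^2) / sum(d_in^2)    # same ratio as on the slide
}

stress_2d <- stress_for(x, 2)   # some distortion: one dimension is dropped
stress_3d <- stress_for(x, 3)   # all dimensions kept: essentially no distortion
print(c(stress_2d, stress_3d))
```

When both sets of distances are Euclidean the ratio lies between 0 and 1. If the configuration is fitted to a different metric than the one used for the input distances, the ratio is no longer bounded that way.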

Page 7

Clustering of experiments

[Figure: dendrogram of the experiments; leaves labelled EWS.T*, EWS.C*, BL.C*, NB.C*, RMS.C*, RMS.T* and TEST.*; height axis 0.0 to 1.0]

Page 8

Clustering of genes

[Figure: dendrogram of the genes; leaves numbered 1 to 96; height axis 0.0 to 1.0]

Page 9

The heat map

[Figure: heat map; x axis 0 to 100, y axis 0 to 80; legend -4 to 2]

Page 10

Multidimensional scaling

[Figure: plot of temp.mds$points[, 2] against temp.mds$points[, 1]; x axis -100 to 50, y axis -80 to 60]

Stress = 21.06991

Page 11

Observations and comments

• “Solution” doesn’t exactly agree with Khan’s
  – Comparing the output
    • Heat maps similar
    • Plots of experiments similar
  – Source of differences
    • Experimental noise
    • Scaling of “experiments”
• Setting up the model
    nhgri<-supplemental.data[, -c(1:2)]
    gene.list<-match(ann.genes[,2], supplemental.data[,1])
    nhgri.small<-log(nhgri[gene.list,])
    # centre and scale each experiment (column)
    nhgri.small<-apply(nhgri.small, 2, scale)
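What scaling each experiment does to log intensities can be seen on a two-column example. This is a sketch in R with invented values; it calls `scale` on the whole matrix, which standardises column by column and keeps the row names (in R, `apply(x, 2, scale)` gives the same numbers but drops the row names).

```r
# Made-up intensities: exp2 is exactly 10 times brighter than exp1.
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40), ncol = 2,
            dimnames = list(paste("gene", 1:4, sep = ""), c("exp1", "exp2")))

z <- scale(log(x))            # centre and scale each column of the log data

col_means <- colMeans(z)      # 0 for every column
col_sds   <- apply(z, 2, sd)  # 1 for every column
col_gap   <- max(abs(z[, "exp1"] - z[, "exp2"]))
print(col_gap)                # numerically zero
```

A constant multiplicative difference between two experiments becomes an additive shift on the log scale, and centring removes it, so the two scaled columns coincide.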

Page 13

Clustering of experiments

[Figure: dendrogram of the experiments; leaves labelled EWS.T*, EWS.C*, BL.C*, NB.C*, RMS.C*, RMS.T* and TEST.*; height axis 0.0 to 1.0]

Page 14

Clustering of genes

[Figure: dendrogram of the genes; leaves numbered 1 to 96; height axis 0.0 to 1.0]

Page 15

The heat map

[Figure: heat map; x axis 0 to 100, y axis 0 to 80; legend -2 to 2]

Page 16

Multidimensional scaling

[Figure: plot of temp.mds$points[, 2] against temp.mds$points[, 1]; x axis -40 to 60, y axis -60 to 60]

Stress = 16.82979

Page 17

Some thoughts on filtering data

• Khan’s objective
  – “..highly expressed genes”
  – Need to identify subsets within the data
    • Tumor/cell type
    • Search data frame for differential expression
  – Criteria
    • Arbitrary
      – Difference in expression exceeds threshold value
        » Means, medians, trimmed means
    • Population based
      – Subsets within the data groupings
      – Scaled vs. unscaled

Page 18

Experiment 1

• Setup
    nhgri<-supplemental.data[, -c(1:2)]
    nhgri<-log(nhgri)
• Candidate genes
    # flag genes whose group medians differ by more than 2 (first group minus second)
    gene.list<-NULL
    gene.list<-c(gene.list, (apply(nhgri[,ews], 1, median) -
        apply(nhgri[,bl], 1, median))>2)
    gene.list<-c(gene.list, (apply(nhgri[,ews], 1, median) -
        apply(nhgri[,nb], 1, median))>2)
    gene.list<-c(gene.list, (apply(nhgri[,ews], 1, median) -
        apply(nhgri[,rms], 1, median))>2)
    gene.list<-c(gene.list, (apply(nhgri[,bl], 1, median) -
        apply(nhgri[,nb], 1, median))>2)
    gene.list<-c(gene.list, (apply(nhgri[,bl], 1, median) -
        apply(nhgri[,rms], 1, median))>2)
    gene.list<-c(gene.list, (apply(nhgri[,nb], 1, median) -
        apply(nhgri[,rms], 1, median))>2)
    # keep the names of all genes flagged in at least one comparison
    gene.list<-unique(names(gene.list)[gene.list==T])
    nhgri.small<-nhgri[gene.list,]
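The six pairwise comparisons can be written once over all pairs of groups. The sketch below is in R with an invented matrix and invented group labels (`a`, `b`, `c`); the one-sided rule mirrors the code on this slide, while the `abs()` variant is an added assumption about what one might also want to try.

```r
# Toy log-expression matrix (made up): 4 genes x 6 samples, two samples per group.
expr <- matrix(c(5, 5.2, 1, 1.1, 1, 0.9,
                 2, 2.1, 2, 1.9, 2, 2.2,
                 1, 0.8, 1, 1.2, 4, 4.1,
                 3, 3.1, 3, 2.9, 3, 3.0), nrow = 4, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3", "g4"), NULL))
groups <- list(a = 1:2, b = 3:4, c = 5:6)   # hypothetical column indices per group

# Per-gene median within each group: a genes x groups matrix.
med <- sapply(groups, function(cols) apply(expr[, cols, drop = FALSE], 1, median))

pairs <- combn(names(groups), 2)            # every pair of groups, in order

# One-sided rule (first group minus second), as in the slide code.
hit_one <- apply(pairs, 2, function(p) med[, p[1]] - med[, p[2]] > 2)
# Two-sided variant: difference in either direction.
hit_two <- apply(pairs, 2, function(p) abs(med[, p[1]] - med[, p[2]]) > 2)

one_sided <- rownames(expr)[apply(hit_one, 1, any)]
two_sided <- rownames(expr)[apply(hit_two, 1, any)]
print(list(one_sided = one_sided, two_sided = two_sided))
```

In this toy example g1 is high in the first group and is caught by both rules, whereas g3 is high only in the last group and is caught only by the two-sided rule.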

Page 19

Clustering of experiments

[Figure: dendrogram of the experiments; leaves labelled EWS.T*, EWS.C*, BL.C*, NB.C*, RMS.C*, RMS.T* and TEST.*; height axis 0.0 to 1.0]

Page 20

Clustering of genes

[Figure: dendrogram of the genes; leaves numbered 1 to 51; height axis 0.0 to 1.0]

Page 21

The heat map

[Figure: heat map; x axis 0 to 50, y axis 0 to 80; legend -4 to 2]

Page 22

Multidimensional scaling

[Figure: plot of temp.mds$points[, 2] against temp.mds$points[, 1]; x axis -80 to 40, y axis -40 to 40]

Stress = 20.26463

Page 23

Experiment 2

• Setup
    # for each gene, keep the 8 largest values within each tumor/cell type
    ews.mat<-matrix(0, nrow(nhgri), 8)
    bl.mat<-matrix(0, nrow(nhgri), 8)
    nb.mat<-matrix(0, nrow(nhgri), 8)
    rms.mat<-matrix(0, nrow(nhgri), 8)
    for(i in 1:nrow(nhgri))
        ews.mat[i,]<-rev(sort(as.matrix(nhgri[i,ews])))[1:8]
    for(i in 1:nrow(nhgri))
        bl.mat[i,]<-rev(sort(as.matrix(nhgri[i,bl])))[1:8]
    for(i in 1:nrow(nhgri))
        nb.mat[i,]<-rev(sort(as.matrix(nhgri[i,nb])))[1:8]
    for(i in 1:nrow(nhgri))
        rms.mat[i,]<-rev(sort(as.matrix(nhgri[i,rms])))[1:8]

Page 24

Experiment 2

• Candidate genes
    gene.list<-NULL
    gene.list<-c(gene.list, apply(ews.mat, 1, median) -
        apply(bl.mat, 1, median)>1.75)
    gene.list<-c(gene.list, apply(ews.mat, 1, median) -
        apply(nb.mat, 1, median)>1.75)
    gene.list<-c(gene.list, apply(ews.mat, 1, median) -
        apply(rms.mat, 1, median)>1.75)
    gene.list<-c(gene.list, apply(bl.mat, 1, median) -
        apply(nb.mat, 1, median)>1.75)
    gene.list<-c(gene.list, apply(bl.mat, 1, median) -
        apply(rms.mat, 1, median)>1.75)
    gene.list<-c(gene.list, apply(nb.mat, 1, median) -
        apply(rms.mat, 1, median)>1.75)
    # row index where flagged, 0 otherwise; [-1] drops the leading 0
    # (this assumes the first gene fails the first comparison)
    gene.list<-(unique(rep(1:nrow(nhgri),6)*gene.list))[-1]
    nhgri.small<-nhgri[gene.list,]
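The "largest values per gene" step of Experiment 2 can also be written without explicit loops. This is a sketch in R on an invented matrix; `top_k` is a hypothetical helper name, and k = 3 is used only to keep the example small.

```r
# Keep the k largest values in each row of a matrix.
top_k <- function(m, k) {
  # apply() returns one column per row of m, hence the transpose
  t(apply(m, 1, function(v) sort(v, decreasing = TRUE)[1:k]))
}

# Made-up values: 2 genes (rows) x 5 samples of one tumor type (columns).
m <- matrix(c(3, 1, 4, 1, 5,
              9, 2, 6, 5, 3), nrow = 2, byrow = TRUE)

top3        <- top_k(m, 3)             # rows: 5 4 3 and 9 6 5
top3_median <- apply(top3, 1, median)  # 4 and 6
print(top3)
print(top3_median)
```

Taking the median of only the top values makes the comparison less sensitive to low-expressing samples within a group, at the price of ignoring part of the data.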

Page 25

Clustering of experiments

[Figure: dendrogram of the experiments; leaves labelled EWS.T*, EWS.C*, BL.C*, NB.C*, RMS.C*, RMS.T* and TEST.*; height axis 0.0 to 0.8]

Page 26

Clustering of genes

[Figure: dendrogram of the genes; leaves labelled by gene row number; height axis 0.0 to 1.0]

Page 27

The heat map

[Figure: heat map; x axis 0 to 150, y axis 0 to 80; legend -4 to 2]

Page 28

Multidimensional scaling

[Figure: plot of temp.mds$points[, 2] against temp.mds$points[, 1]; x axis -200 to 100, y axis -50 to 100]

Stress = 68.52684

Page 29

Experiment 3

• Data are scaled
    nhgri<-supplemental.data[, -c(1:2)]
    nhgri<-apply(log(nhgri),2,scale)

Page 30

Clustering of experiments

[Figure: dendrogram of the experiments; leaves labelled EWS.T*, EWS.C*, BL.C*, NB.C*, RMS.C*, RMS.T* and TEST.*; height axis 0.0 to 0.8]

Page 31

Clustering of genes

[Figure: dendrogram of the genes; leaves numbered 1 to 109; height axis 0.0 to 1.0]

Page 32

The heat map

[Figure: heat map; x axis 0 to 100, y axis 0 to 80; legend -4 to 4]

Page 33

Multidimensional scaling

[Figure: plot of temp.mds$points[, 2] against temp.mds$points[, 1]; x axis -200 to 100, y axis -50 to 100]

Stress = 100.181

Page 34

Evaluating the output

    common.genes1<-intersect(gene.list1, gene.list)
    unique.genes1<-setdiff(gene.list1, gene.list)
    unique.genes.1<-setdiff(gene.list, gene.list1)
    common.genes2<-intersect(gene.list2, gene.list)
    unique.genes2<-setdiff(gene.list2, gene.list)
    unique.genes.2<-setdiff(gene.list, gene.list2)
    common.genes3<-intersect(gene.list3, gene.list)
    unique.genes3<-setdiff(gene.list3, gene.list)
    unique.genes.3<-setdiff(gene.list, gene.list3)

    supplemental.data[common.genes1, 1:2]
    supplemental.data[unique.genes1, 1:2]
    supplemental.data[unique.genes.1, 1:2]
    supplemental.data[common.genes2, 1:2]
    supplemental.data[unique.genes2, 1:2]
    supplemental.data[unique.genes.2, 1:2]
    supplemental.data[common.genes3, 1:2]
    supplemental.data[unique.genes3, 1:2]
    supplemental.data[unique.genes.3, 1:2]
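The set comparisons on this slide reduce to three calls per gene list. The following R sketch uses invented row indices; `reference` and `candidate` are hypothetical names, and expressing the overlap as a fraction of the reference list is an assumption of this sketch, not a definition taken from the slides.

```r
# Made-up gene row indices for a reference list and one alternative list.
reference <- c(10, 25, 31, 47, 52)
candidate <- c(25, 31, 60, 75)

common         <- intersect(candidate, reference)  # in both lists
only_candidate <- setdiff(candidate, reference)    # new genes
only_reference <- setdiff(reference, candidate)    # genes not recovered

overlap <- length(common) / length(reference)      # share of the reference recovered
print(overlap)
```

Note that `setdiff` is not symmetric, which is why the slide computes it in both directions for every experiment.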

Page 35

Clustering the tumor/cell types

• Cutting the trees
    # (each pair presumably follows a re-fit of nhgri.small.clust for that model)
    model.1 <- cutree(nhgri.small.clust, h=0.6)
    model.1.tree <- nhgri.small.clust
    model.2 <- cutree(nhgri.small.clust, h=0.6)
    model.2.tree <- nhgri.small.clust
    model.3 <- cutree(nhgri.small.clust, h=0.6)
    model.3.tree <- nhgri.small.clust
    model.k <- cutree(nhgri.small.clust, h=0.6)
    model.k.tree <- nhgri.small.clust
    model.ks <- cutree(nhgri.small.clust, h=0.6)
    model.ks.tree <- nhgri.small.clust
• Identifying the groups
    nhgri.groups <- cbind(dimnames(nhgri)[[2]], model.1, model.2,
        model.3, model.k, model.ks)
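How the cut height controls the number of groups can be seen on a one-dimensional example. This is a sketch in R with made-up values; the heights 1 and 5 are chosen for this toy tree and have no connection to the h=0.6 used on the slide.

```r
# Six made-up observations forming three visible clumps.
x  <- c(0, 0.1, 0.2, 5, 5.1, 9)
hc <- hclust(dist(x), method = "average")

low_cut  <- cutree(hc, h = 1)   # 1 1 1 2 2 3: three groups
high_cut <- cutree(hc, h = 5)   # 1 1 1 2 2 2: two groups
print(rbind(low_cut, high_cut))
```

Lowering the threshold splits the tree into more, smaller groups; raising it merges them.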

Page 36

Comparison of classifications

> table(nhgri.groups[, 4], nhgri.groups[, 1])
         1   2   3   4   5   6
     1  19   5   0   0   0   0
    10   1   0   0   0   0   0
     2   0   1  15   0   0   0
     3   1   1   0  16   7   0
     4   1   0   0   3   0   0
     5   2   0   0   0   0   0
     6   0   0   0   0   0  11
     7   0   0   0   2   0   0
     8   0   2   0   0   0   0
     9   0   1   0   0   0   0

Page 37

Comparison of classifications

> table(nhgri.groups[, 4], nhgri.groups[, 2])
         1  10  11  12   2   3   4   5   6   7   8   9
     1  16   0   1   0   0   1   0   6   0   0   0   0
    10   0   0   0   0   0   0   0   1   0   0   0   0
     2   0   0   0   0   0   0   1   0  15   0   0   0
     3   0   0   0   0  14   3   0   1   0   7   0   0
     4   0   1   0   0   3   0   0   0   0   0   0   0
     5   1   0   0   0   0   1   0   0   0   0   0   0
     6   0   0   0   0   0   0   0   0   0   0  11   0
     7   0   0   0   0   0   0   0   0   0   0   0   2
     8   0   0   0   0   0   0   2   0   0   0   0   0
     9   0   0   0   1   0   0   0   0   0   0   0   0

Page 38

Comparison of classifications

> table(nhgri.groups[, 4], nhgri.groups[, 3])
         1  10   2   3   4   5   6   7   8   9
     1   0   0   0  16   0   7   0   0   0   1
    10   0   0   0   0   0   1   0   0   0   0
     2  16   0   0   0   0   0   0   0   0   0
     3   0   0  15   0   9   0   1   0   0   0
     4   0   0   3   0   0   0   0   0   1   0
     5   0   0   0   1   0   0   1   0   0   0
     6   0   0   0   0   0   0   0  11   0   0
     7   0   0   2   0   0   0   0   0   0   0
     8   2   0   0   0   0   0   0   0   0   0
     9   0   1   0   0   0   0   0   0   0   0

Page 39

Comparison of classifications

> table(nhgri.groups[, 4], nhgri.groups[, 5])
         1   2   3   4   5   6   7
     1   0   0   0   1  23   0   0
    10   0   0   0   0   0   0   1
     2   0   0  16   0   0   0   0
     3   1  24   0   0   0   0   0
     4   3   0   0   0   1   0   0
     5   0   0   0   0   2   0   0
     6   0   0   0   0   0  11   0
     7   2   0   0   0   0   0   0
     8   0   0   2   0   0   0   0
     9   0   0   0   1   0   0   0
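The row order 1, 10, 2, 3, ... in these tables is what `table` produces when the group labels are character strings, as they are after being bound into one matrix with the sample names. The R sketch below, with invented labels, shows the effect and one way to keep numeric order.

```r
# Made-up group labels for six samples from two clusterings.
a <- c(1, 1, 2, 10, 10, 2)
b <- c(1, 1, 2, 3, 3, 1)

# cbind() with a character column turns every column into character.
m <- cbind(sample = paste("s", 1:6, sep = ""), a, b)

tab_chr <- table(m[, "a"], m[, "b"])  # labels sorted as text: "1" "10" "2"
tab_num <- table(a, b)                # labels sorted as numbers: 1 2 10

print(rownames(tab_chr))
print(rownames(tab_num))
```

The counts are the same either way; only the order of rows and columns changes.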

Page 40

Model 1

[Figure: heat map; x axis 0 to 50, y axis 0 to 80; legend -4 to 2]

Page 41

Model 2

[Figure: heat map; x axis 0 to 150, y axis 0 to 80; legend -4 to 2]

Page 42

Model 3

[Figure: heat map; x axis 0 to 100, y axis 0 to 80; legend -4 to 4]

Page 43

Khan’s solution

[Figure: heat map; x axis 0 to 100, y axis 0 to 80; legend -4 to 2]

Page 44

Khan’s solution, scaled data

[Figure: heat map; x axis 0 to 100, y axis 0 to 80; legend -2 to 2]

Page 45

Some closing thoughts…

• There remain some unexplained differences
  – Not related to a simple transformation
  – Impact of noise in data on ANN warrants further investigation
• Multiple “solutions” yield a smaller set of “diagnostic” genes
  – 40-60% overlap with Khan’s solution
  – Additional genes that were not reported
  – Need a cancer biologist to review the significance
• All models could be easily refined
  – Adjustment of clustering thresholds
    • Iterative model