brad windle, ph.d. 628-1956 bwindle@hsc.vcu.edu unsupervised learning and microarrays web site:...

Post on 18-Jan-2016


Brad Windle, Ph.D.
628-1956
bwindle@hsc.vcu.edu

Unsupervised Learning and Microarrays

Web Site: http://www.people.vcu.edu/~bwindle
Link to Courses and then lecture for this class

Gene Expression Profiling

Unsupervised Learning

Cluster Analysis and Applications

A good review of microarray data analysis is:
Computational analysis of microarray data. Quackenbush J. Nat Rev Genet 2001 Jun;2(6):418-427

Reductionism versus Systems Approach

Why generate global analyses?

as opposed to picking a gene/protein and hoping you get lucky and it has great significance to the big picture or to mankind’s health.

Normalizing Data

Northern blot

For normalizing samples, you would divide experimental values by the mean of the values thought to be constant across the samples

Sample values are typically normalized by dividing by the mean of the reference values or the mean of all values
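The division described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular package; the function name `normalize_sample` and the intensity values are hypothetical.

```python
from statistics import mean

def normalize_sample(values, reference_values=None):
    """Divide every value in one sample by a baseline mean.

    The baseline is the mean of the reference values (values assumed
    constant across samples) or, if none are given, the mean of all
    values in the sample.
    """
    baseline = mean(reference_values if reference_values is not None else values)
    if baseline == 0:
        raise ValueError("baseline mean is zero; cannot normalize")
    return [v / baseline for v in values]

# Hypothetical intensities for one sample (not taken from the slides).
sample = [200.0, 50.0, 100.0, 50.0]
reference = [50.0, 50.0]  # hypothetical values assumed constant across samples

normalized_by_reference = normalize_sample(sample, reference)
normalized_by_all = normalize_sample(sample)
print(normalized_by_reference)
print(normalized_by_all)
```

The two calls differ only in the baseline, which is the choice the slide describes: reference values when some are trusted to be constant, otherwise the mean of everything in the sample.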

What about normalizing gene values across all the samples?


Rationale for normalizing samples does not apply to genes

One strategy is to subtract the mean (mean centering).
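As a small sketch of mean centering, the example below subtracts each gene's own mean across samples. The two genes and their values are hypothetical; they were chosen so that one gene sits near 100 and the other near 10 while both vary by the same amount.

```python
from statistics import mean

def mean_center(gene_values):
    """Subtract the gene's mean across samples from each of its values."""
    m = mean(gene_values)
    return [v - m for v in gene_values]

# Hypothetical expression values for two genes across four samples.
high_gene = [100.0, 110.0, 90.0, 100.0]
low_gene = [10.0, 20.0, 0.0, 10.0]

centered_high = mean_center(high_gene)
centered_low = mean_center(low_gene)
print(centered_high)
print(centered_low)
```

After centering, both genes show the same pattern around zero, so a comparison between them reflects how they change across samples rather than their absolute levels.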

Log transformation

[Figure: axis showing raw values .01, 1, 10, 100 and the corresponding log scale ticks -2, 0, 2]
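A short sketch of the transformation, assuming base 10 (which is what the pairing of .01 with -2 and 100 with 2 implies). The function name is hypothetical.

```python
import math

def log10_transform(values):
    """Return log10 of each value; all values must be positive."""
    return [math.log10(v) for v in values]

ratios = [0.01, 1, 10, 100]
logged = log10_transform(ratios)
print(logged)  # approximately -2, 0, 1, 2
```

On the raw scale a 100-fold decrease (0.01) is squeezed between 0 and 1 while a 100-fold increase (100) is far from 1. After the log transform the two are symmetric around zero, at -2 and +2.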

Gene to Gene Variability

Cluster Analysis

Goal: put items (genes) together in clusters based on similarity of expression across various conditions, either similarity of absolute expression levels or overall similarity in pattern


item   X     Y     Z
1      1     1.5   1
2      1.2   1.3   1.5
3      1.4   3.2   4.0
4      5.1   3.5   2.1

$d = \sqrt{(X_1-X_2)^2 + (Y_1-Y_2)^2 + (Z_1-Z_2)^2}$


       1      2      3      4
1      0      .28    1.75   4.56
2      .28    0      1.91   4.48
3      1.75   1.91   0      3.71
4      4.56   4.48   3.71   0
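The distance matrix can be recomputed from the item table with a few lines of Python. One observation from doing so: the values in the matrix match Euclidean distances over the X and Y columns only (items 1 and 2 give 0.28 from X and Y, and about 0.57 when Z is included). The sketch therefore takes an optional `dims` argument, a hypothetical name, to choose how many columns to use.

```python
import math

# Items from the slide table: (X, Y, Z) per item.
items = {
    1: (1.0, 1.5, 1.0),
    2: (1.2, 1.3, 1.5),
    3: (1.4, 3.2, 4.0),
    4: (5.1, 3.5, 2.1),
}

def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def distance_matrix(points, dims=None):
    """Pairwise Euclidean distances, using the first `dims` coordinates
    of each point (all coordinates when dims is None)."""
    keys = sorted(points)
    def pick(v):
        return v if dims is None else v[:dims]
    return [[euclidean(pick(points[i]), pick(points[j])) for j in keys]
            for i in keys]

xy_matrix = distance_matrix(items, dims=2)   # X and Y only
xyz_matrix = distance_matrix(items)          # X, Y and Z

for row in xy_matrix:
    print(" ".join(f"{d:.2f}" for d in row))
```

The printed rows agree with the matrix above to two decimals.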

Pearson

$r = \dfrac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[\,n\sum X^2 - (\sum X)^2\,][\,n\sum Y^2 - (\sum Y)^2\,]}}$

       1       2       3       4
1      1.00    -0.19   0.22    -0.04
2      -0.19   1.00    0.92    -0.97
3      0.22    0.92    1.00    -0.98
4      -0.04   -0.97   -0.98   1.00
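The correlation matrix can be reproduced by applying the Pearson formula to each pair of items, treating the X, Y and Z values of one item as one series and those of the other item as the second series. This is a minimal sketch; the function name `pearson` is illustrative.

```python
import math

# Items from the slide table: (X, Y, Z) per item.
items = {
    1: (1.0, 1.5, 1.0),
    2: (1.2, 1.3, 1.5),
    3: (1.4, 3.2, 4.0),
    4: (5.1, 3.5, 2.1),
}

def pearson(x, y):
    """Pearson r via the computational formula:
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

keys = sorted(items)
corr = [[pearson(items[i], items[j]) for j in keys] for i in keys]

for row in corr:
    print(" ".join(f"{r:5.2f}" for r in row))
```

Unlike the Euclidean distances, correlation ignores absolute level: items 2 and 3 are strongly correlated (0.92) because both rise from X to Z, even though their values differ in magnitude.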

       1      2      3      4
1      0.00   1.19   0.78   1.04
2      1.19   0.00   0.08   1.97
3      0.78   0.08   0.00   1.98
4      1.04   1.97   1.98   0.00

$r$ ranges from -1 to +1

$d = 1 - r$, range 0 to 2
$d = 1 - |r|$, range 0 to 1
$d = 1 - r^2$, range 0 to 1
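A small sketch makes the difference between the three conversions concrete. The function name and dictionary keys are hypothetical; the value -0.97 is the correlation between items 2 and 4 in the matrix above.

```python
def correlation_distances(r):
    """Convert a correlation r in [-1, +1] into the three distances listed above."""
    return {
        "1 - r": 1 - r,        # anticorrelated items are far apart (up to 2)
        "1 - |r|": 1 - abs(r), # anticorrelated items are treated as close
        "1 - r^2": 1 - r * r,  # same idea, with less weight on weak correlations
    }

for r in (1.0, 0.0, -1.0, -0.97):
    print(r, correlation_distances(r))
```

With $d = 1 - r$, a perfectly anticorrelated pair gets the maximum distance of 2. With $1 - |r|$ or $1 - r^2$ the same pair gets distance 0, so genes with mirror-image patterns end up in the same cluster.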

[Figure: Items 1 through 7, with labels 1 through 4]

Hierarchical Clustering

Divisive
Agglomerative (Aggregative)

Clustering Methods

[Figure: two diagrams of items A, B, C, D, labeled with the values .1, .12, .15, .15, .6, .6 (first diagram) and .1, .12, .2, .3, .2, .6 (second diagram)]

Cluster Linkage Methods

Nearest Neighbor or Single Linkage

Furthest Neighbor or Complete Linkage

Average Neighbors or Average Linkage
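The three linkage rules differ only in how the distance between two clusters is defined. The sketch below is a plain agglomerative loop written for illustration, not the implementation used by any clustering program; it uses the X and Y values of the four items from the earlier table, and the function names are hypothetical.

```python
import math
from itertools import combinations

# X and Y values of the four items in the slide table.
points = {1: (1.0, 1.5), 2: (1.2, 1.3), 3: (1.4, 3.2), 4: (5.1, 3.5)}

def cluster_distance(c1, c2, linkage):
    """Distance between two clusters (tuples of item ids) under a linkage rule."""
    dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # nearest neighbor
        return min(dists)
    if linkage == "complete":    # furthest neighbor
        return max(dists)
    if linkage == "average":     # mean over all between-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(linkage):
    """Repeatedly merge the two closest clusters; return the merge log."""
    clusters = [(k,) for k in sorted(points)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], linkage))
        height = cluster_distance(a, b, linkage)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, round(height, 2)))
    return merges

for name in ("single", "complete", "average"):
    print(name, agglomerate(name))
```

For this small data set all three rules merge items 1 and 2 first, then add item 3, then item 4. Only the merge heights change: 0.28, 1.75, 3.71 for single linkage; 0.28, 1.91, 4.56 for complete linkage; 0.28, 1.83, 4.25 for average linkage. On larger data sets the choice of linkage can also change which clusters are joined.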

2N-1


K-Means Clustering and its relative, Self-Organizing Maps (SOM)
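The slides do not spell out the algorithm, so the following is a sketch of the standard K-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points) under stated assumptions: the data points and starting centroids are hypothetical, and the starting centroids are passed in explicitly to keep the example deterministic. Self-organizing maps are not shown.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Standard K-means iteration with caller-supplied starting centroids.

    Returns (assignment, centroids), where assignment[i] is the index of
    the centroid nearest to points[i].
    """
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids

# Hypothetical 2-D data with two obvious groups.
data = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
initial = [(0.0, 0.0), (10.0, 10.0)]

labels, centers = kmeans(data, initial)
print(labels)
print(centers)
```

Unlike hierarchical clustering, the number of clusters K has to be chosen in advance, and the result can depend on where the centroids start.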


Ranking Order Clustering

Cluster Playground 3

Applications of Gene Expression Profiling and Cluster Analysis

Tissue or Tumor Classification

Gene Classification

Drug Classification

Drug Target Identification

B-Cell Lymphoma
Nature 403, 503-511, 2000

Indistinguishable by histology

Yet half responded well to therapy and half did not

Were there differences in gene expression that correlated with drug response?

Gene expression profiles showed half the lymphomas were of GC B-Cell lineage and the other half of Activated B-Cell lineage

A subset of genes predicts therapeutic outcome

M1  M2  M3  M4  M5  M6
M7  M8  M9  M10 M11 M12
M13 M14 M15 M16 M17 M18

D1  D2  D3  D4  D5  D6
D7  D8  D9  D10 D11 D12
D13 D14 D15 D16 D17 D18

Gene Expression Profiling of Yeast Mutants and Drugs
Cell 102, 109–126, 2000

Mutants Drugs

M4 D17

Erg2 Dyclonine

Human sigma receptor

Validation of cdc28 Kinase Target Inhibition
Science 281, 533-538, 1998

cdc28-

D1 D2

} Cdc28-regulated genes

} Phosphate metabolism genes

Nucleotide analogs that block cdc28p: D1 and D2

Pho85

Cells   Drug 1   Drug 2   Drug 3   Drug 4   Drug 5
A       -2       -1       0        -1       .01
B       1        -1.5     2        0        -.5
C       .4       0        1        1        .2
D       0        .7       2        1        .9
E       1        0        -.5      .5       -.8

COMPARE
Clustering Drugs Based on Cell Line Sensitivities

Nature Genetics 24: 236-244, 2000

T1 T1 T1 T1 T1 T2 T2 A7 A7 T2 A7 A7 A7 A7 A7 A7 A7 T1 T1 T1 T1 T1

Profiling

Gene Expression
Protein Expression
Misc Data
SNPs
Methylation
Drug Structure
Protein Structure

Cell State

Disease Drug Response

Metabolitics, Structural, Genomic

Clustering NCI 60 Cancer Cell Lines
Nature Genetics 24: 227-238

6165 Genes

9 Types of Tissues/Tumors

Breast, CNS, Colon, Leukemia, Lung, Melanoma, Ovarian, Prostate, Renal

Filtering Data

Filter out data with the program Cluster, based on standard deviation (SD) cutoffs
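The idea behind an SD cutoff can be sketched independently of the Cluster program, whose exact options are not reproduced here. The sketch assumes the sample standard deviation and a hypothetical threshold of 1.0; the gene names and values are made up for illustration.

```python
from statistics import stdev

def filter_by_sd(expression, min_sd):
    """Keep only genes whose values vary across samples by at least min_sd.

    Uses the sample standard deviation; whether a given program uses the
    sample or population form is an assumption here.
    """
    return {gene: values for gene, values in expression.items()
            if stdev(values) >= min_sd}

# Hypothetical log-transformed expression values across four samples.
expression = {
    "geneA": [0.1, -0.1, 0.0, 0.1],   # nearly flat
    "geneB": [2.0, -1.5, 0.5, -2.0],  # varies strongly
    "geneC": [1.0, 1.0, 1.1, 0.9],    # high but nearly constant
}

kept = filter_by_sd(expression, min_sd=1.0)
print(sorted(kept))
```

Genes that barely change across samples carry little information for clustering, so removing them reduces noise and the size of the data set before distances are computed.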
