Making Sense of Complicated Microarray Data Part II Gene Clustering and Data Analysis Gabriel Eichler Boston University Some slides adapted from: MeV documentation slides


Post on 19-Mar-2016

Page 1: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Making Sense of Complicated Microarray Data

Part II: Gene Clustering and Data Analysis
Gabriel Eichler
Boston University
Some slides adapted from: MeV documentation slides

Page 2: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Why Cluster?

Clustering is a process by which you can explore your data in an efficient manner.

Visualization of data can help you review the data quality.

Assumption: Guilt by association – similar gene expression patterns may indicate a biological relationship.

Page 3: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Expression Vectors

Gene expression vectors encapsulate the expression of a gene over a set of experimental conditions or sample types.

Numeric Vector: -0.8 0.8 1.5 1.8 0.5 -1.3 -0.4 1.5

[Figure: the same vector drawn as a Line Graph (y-axis from -2 to 2, x-axis conditions 1 to 8) and as a Heatmap (color scale from -2 to 2)]

Page 4: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Expression Vectors As Points in ‘Expression Space’

[Figure: a table of expression values for genes G1 to G5 at t1, t2, and t3, and the same genes plotted as points in a space with axes Experiment 1, Experiment 2, and Experiment 3; points that lie close together are labeled Similar Expression]

Page 5: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Distance and Similarity

- The ability to calculate a distance (or similarity, its inverse) between two expression vectors is fundamental to clustering algorithms.

- The distance between vectors is the basis upon which decisions are made when grouping similar patterns of expression.

- The selection of a distance metric defines the concept of distance.

Page 6: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Distance: a measure of how similar or dissimilar two gene expression vectors are.

         Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6
Gene A   x1A    x2A    x3A    x4A    x5A    x6A
Gene B   x1B    x2B    x3B    x4B    x5B    x6B

Some distances (MeV provides 11 metrics):

1. Euclidean: $\sqrt{\sum_{i=1}^{6} (x_{iA} - x_{iB})^2}$

2. Manhattan: $\sum_{i=1}^{6} |x_{iA} - x_{iB}|$

3. Pearson correlation

[Figure: two points labeled p0 and p1]
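The three metrics named on this slide can be sketched in a few lines of Python. This is an illustration only, not MeV's implementation: the two vectors are made-up values, and turning the Pearson correlation r into a distance as 1 - r is one common convention that is assumed here.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 - r, where r is the Pearson correlation (assumed convention)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

# Hypothetical vectors over six experiments: gene_b has the same
# shape as gene_a but twice the magnitude.
gene_a = [1, 2, 4, 5, 7, 9]
gene_b = [2, 4, 8, 10, 14, 18]

print("Euclidean:", euclidean(gene_a, gene_b))
print("Manhattan:", manhattan(gene_a, gene_b))
print("Pearson distance:", pearson_distance(gene_a, gene_b))
```

The two magnitude-based metrics report these genes as far apart, while the correlation-based distance is essentially zero because the profiles have the same shape. That difference is what "selection of a distance metric defines the concept of distance" means in practice.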

Page 7: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Clustering Algorithms

Page 8: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Clustering Algorithms

Be wary - confounding computational artifacts are associated with all clustering algorithms.

- You should always understand the basic concepts behind an algorithm before using it.

Anything will cluster! Garbage In means Garbage Out.

Page 9: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Hierarchical Clustering

(HCL-1)

• IDEA: Iteratively combines genes into groups based on similar patterns of observed expression

• By combining genes with genes OR genes with groups, the algorithm produces a dendrogram of the hierarchy of relationships.

• Display the data as a heatmap and dendrogram

• Cluster genes, samples or both
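The iterative merging described above can be sketched as follows. This is a minimal pure-Python illustration, not MeV's HCL code; Euclidean distance and average linkage are assumptions, since the slide does not name a metric or linkage rule. The four rows are taken from the 16-gene table that appears later in this deck.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def hierarchical_cluster(vectors):
    """Merge the closest pair of clusters until one cluster remains.

    Returns the merges in order as (left_members, right_members, distance);
    read bottom-up, this list is the dendrogram.
    """
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        merges.append((left, right, average_linkage(left, right, vectors)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges

genes = {
    "Gene 1": [1, 2, 4, 5, 7, 9],
    "Gene 16": [1, 2, 5, 7, 8, 9],
    "Gene 14": [9, 7, 5, 3, 2, 1],
    "Gene 6": [8, 7, 7, 6, 5, 3],
}
names = list(genes)
vectors = [genes[name] for name in names]
merges = hierarchical_cluster(vectors)

for left, right, dist in merges:
    left_names = [names[i] for i in left]
    right_names = [names[i] for i in right]
    print(f"merge {left_names} + {right_names} at distance {dist:.2f}")
```

Here the two rising genes merge first, the two falling genes merge second, and the final merge joins the two groups. Each step combines either genes with genes or genes with groups.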

Page 10: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Hierarchical Clustering

[Figure sequence, pages 10 to 17: the same eight genes (Gene 1 through Gene 8) repeated on each page]

Page 18: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Hierarchical Clustering

[Figure: color scale labeled H to L]

Page 19: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Hierarchical Clustering

The Leaf Ordering Problem:
• Find ‘optimal’ layout of branches for a given dendrogram architecture
• $2^{N-1}$ possible orderings of the branches
• For a small microarray dataset of 500 genes there are $1.6 \times 10^{150}$ branch configurations

[Figure: axes labeled Samples and Genes]
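The count quoted above can be checked directly. A binary dendrogram over N leaves has N - 1 internal nodes, and flipping the two branches at any one of them changes the leaf order without changing the tree, which gives 2^(N-1) layouts. The function name below is only illustrative.

```python
def leaf_orderings(n_genes):
    """Number of branch layouts for a binary dendrogram with n_genes leaves."""
    return 2 ** (n_genes - 1)

print(leaf_orderings(8))             # the eight-gene example: 128 layouts
print(f"{leaf_orderings(500):.1e}")  # 500 genes: 1.6e+150
```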

Page 20: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Hierarchical Clustering

The Leaf Ordering Problem:

Page 21: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Hierarchical Clustering

Pros:
- Commonly used algorithm
- Simple and quick to calculate

Cons:
- Real genes probably do not have a hierarchical organization

Page 22: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Self-Organizing Maps (SOMs)

Idea: Place genes onto a grid so that genes with similar patterns of expression are placed on nearby squares.

[Figure: items labeled a, b, c, d and grid squares labeled A, B, C, D]

Page 23: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Self-Organizing Maps (SOMs)

Idea: Place genes onto a grid so that genes with similar patterns of expression are placed on nearby squares.

[Figure: items labeled a, b, c, d and grid squares labeled A, B, C, D]

Page 24: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Self-Organizing Maps (SOMs)

          a_1hr  a_2hr  a_3hr  b_1hr  b_2hr  b_3hr
Gene 1    1      2      4      5      7      9
Gene 2    2      3      7      7      6      3
Gene 3    4      4      5      5      4      4
Gene 4    3      4      3      4      3      3
Gene 5    1      2      3      4      5      6
Gene 6    8      7      7      6      5      3
Gene 7    4      4      4      4      5      4
Gene 8    5      6      5      4      3      2
Gene 9    3      3      1      3      6      8
Gene 10   2      4      8      5      4      2
Gene 11   1      5      6      9      8      7
Gene 12   1      3      5      8      8      6
Gene 13   4      3      3      4      5      6
Gene 14   9      7      5      3      2      1
Gene 15   1      2      2      3      4      4
Gene 16   1      2      5      7      8      9

[Figure: SOM grid with nodes A to I, shown in five panels]

Page 25: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Self-Organizing Maps (SOMs)

[Figure: SOM grid with nodes A to I; each node holds one of the following groups of genes]

Genes 1, 16, and 5
Genes 6 and 14
Genes 9 and 13
Genes 4, 7, and 2
Gene 3
Gene 15
Gene 8
Gene 10
Genes 11 and 12
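The grid placement shown on these two slides can be sketched with a small self-organizing map trained on the 16-gene table. This is a minimal pure-Python sketch, not the code behind the slides: the 3 x 3 grid matches the nine nodes A to I, but the learning rate, neighborhood radius, epoch count, initialization, and random seed are all assumptions, so the resulting groups need not match the slide exactly.

```python
import math
import random

# Expression table from the slide: 16 genes x 6 samples
# (a_1hr, a_2hr, a_3hr, b_1hr, b_2hr, b_3hr).
GENES = {
    "Gene 1": [1, 2, 4, 5, 7, 9],   "Gene 2": [2, 3, 7, 7, 6, 3],
    "Gene 3": [4, 4, 5, 5, 4, 4],   "Gene 4": [3, 4, 3, 4, 3, 3],
    "Gene 5": [1, 2, 3, 4, 5, 6],   "Gene 6": [8, 7, 7, 6, 5, 3],
    "Gene 7": [4, 4, 4, 4, 5, 4],   "Gene 8": [5, 6, 5, 4, 3, 2],
    "Gene 9": [3, 3, 1, 3, 6, 8],   "Gene 10": [2, 4, 8, 5, 4, 2],
    "Gene 11": [1, 5, 6, 9, 8, 7],  "Gene 12": [1, 3, 5, 8, 8, 6],
    "Gene 13": [4, 3, 3, 4, 5, 6],  "Gene 14": [9, 7, 5, 3, 2, 1],
    "Gene 15": [1, 2, 2, 3, 4, 4],  "Gene 16": [1, 2, 5, 7, 8, 9],
}

def nearest_node(nodes, vec):
    """Grid position of the node whose weights are closest to vec."""
    return min(
        nodes,
        key=lambda pos: sum((w - x) ** 2 for w, x in zip(nodes[pos], vec)),
    )

def train_som(data, rows=3, cols=3, epochs=200, seed=0):
    rng = random.Random(seed)
    # Start every node at a randomly chosen data vector (assumed scheme).
    nodes = {(r, c): [float(x) for x in rng.choice(data)]
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs
        rate = 0.5 * remaining            # learning rate decays toward 0
        radius = 0.5 + 1.5 * remaining    # neighborhood shrinks over time
        for vec in rng.sample(data, len(data)):
            win = nearest_node(nodes, vec)
            for pos, weights in nodes.items():
                grid_d2 = (pos[0] - win[0]) ** 2 + (pos[1] - win[1]) ** 2
                pull = rate * math.exp(-grid_d2 / (2 * radius ** 2))
                # Move the node toward the gene; nodes near the winner move more.
                for k, x in enumerate(vec):
                    weights[k] += pull * (x - weights[k])
    return nodes

def assign_genes(nodes, genes):
    """Map each grid position to the genes whose nearest node it is."""
    groups = {}
    for name, vec in genes.items():
        groups.setdefault(nearest_node(nodes, vec), []).append(name)
    return groups

data = list(GENES.values())
som = train_som(data)
assignments = assign_genes(som, GENES)
for pos in sorted(assignments):
    print(pos, assignments[pos])
```

Because neighboring nodes are pulled toward the same genes during training, genes with similar profiles tend to end up on the same or adjacent grid squares, which is the idea stated on the earlier SOM slides.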

Page 26: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

The Gene Expression Dynamics Inspector (GEDI)

         Group A                Group B                Group C
         A1   A2   A3   A4     B1   B2   B3   B4     C1   C2   C3   C4
Gene 1   1.5  1.4  1.7  1.2    .85  .65  .50  .55    2.5  2.8  2.7  2.1
Gene 2   .78  .95  .75  .45    1.1  1.2  1.0  1.3    .56  .62  .78  .89
Gene 3   .45  .23  .15  .05    .82  .71  .62  .49    .11  .16  .11  .95
Gene 4   2.2  4.5  6.7  6.2    2.2  2.5  2.8  2.9    .48  .90  1.5  1.8
Gene 5   2.1  2.0  1.9  1.6    4.2  4.8  5.2  5.5    2.5  2.6  2.0  1.9
Gene 6   1.2  1.1  1.6  2.9    1.1  1.8  1.9  1.4    1.7  1.2  1.1  1.6

[Figure: the table (rows are Genes, columns are Samples) rendered as mosaics for Group A, Group B, and Group C in columns 1 to 4; color scale labeled H to L]

GEDI’s Features:
• Allows for simultaneous analysis of several time courses or datasets

• Displays the data in an intuitive, comparable, mathematically driven visualization

• The same genes map to the same tiles

Page 27: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Software Demonstrations

MeV available at http://www.tigr.org/software/tm4/mev.html

GEDI available at http://www.chip.org/~ge/gedihome.htm

Page 28: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Comparison of GEDI vs. Hierarchical Clustering

Hierarchical clustering of random data (GIGO)

From: CreateGEP_Journal.wpd, random_A

GEDI allows the direct visual assessment of the quality of conventional cluster analysis.

Page 29: Making Sense of Complicated Microarray Data Part II  Gene Clustering and Data Analysis

Questions