Analysis of microarray data

Post on 19-Dec-2015


Page 1: Analysis of microarray data. Gene expression database – a conceptual view Samples Genes Gene expression levels Sample annotations Gene annotations Gene

Analysis of microarray data


Gene expression database – a conceptual view

Figure: a gene expression matrix of gene expression levels, with samples along one axis and genes along the other, linked to sample annotations and gene annotations.


An Example

Two points x and y differ by 4 along one coordinate and by 3 along the other:

1. Euclidean distance: $\sqrt{4^2 + 3^2} = 5$
2. Manhattan distance: $4 + 3 = 7$
3. "sup" distance: $\max\{4, 3\} = 4$
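The three distances in the example can be checked numerically. The sketch below places x at the origin and y at (4, 3). Those coordinates are an assumption, chosen to reproduce the differences of 4 and 3 in the example. The "sup" distance is computed as the largest absolute coordinate difference, which is also known as the Chebyshev distance.

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def sup_distance(x, y):
    """Largest absolute coordinate difference (sup / Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(x, y))

# Hypothetical coordinates: y is offset from x by 4 along one axis and 3 along the other.
x = (0.0, 0.0)
y = (4.0, 3.0)

print(euclidean(x, y))     # 5.0
print(manhattan(x, y))     # 7.0
print(sup_distance(x, y))  # 4.0
```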


Distance-based Clustering

• Assign a distance measure between data points
• Find a partition such that:
– Distance between objects within a partition (i.e. the same cluster) is minimized
– Distance between objects from different clusters is maximized

• Issues:
– Requires defining a distance (similarity) measure in situations where it is unclear how to assign one
– What relative weighting should one attribute be given versus another?
– The number of possible partitions is super-exponential
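"Super-exponential" can be made concrete. The number of ways to partition n objects into any number of non-empty clusters is the Bell number B_n. The sketch below computes Bell numbers with the Bell triangle. The function name bell_numbers is hypothetical and chosen for illustration.

```python
def bell_numbers(n_max):
    """Return [B_0, B_1, ..., B_n_max] using the Bell triangle.

    B_n is the number of ways to partition n objects into non-empty clusters.
    """
    bells = [1]          # B_0 = 1
    row = [1]            # first row of the triangle
    for _ in range(n_max):
        # Each row starts with the last entry of the previous row; every
        # further entry is its left neighbour plus the entry diagonally above-left.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

bells = bell_numbers(10)
for n in (2, 5, 10):
    print(n, bells[n], 2 ** n)   # partitions of n objects vs. 2^n
# 2 2 4
# 5 52 32
# 10 115975 1024
```

With only 10 objects there are already 115,975 partitions, against 2^10 = 1,024. Exhaustive search over partitions of thousands of genes is therefore out of the question.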


Hierarchical Clustering Techniques

At the beginning, each object (gene) is its own cluster. In each subsequent step, the two closest clusters are merged into one, until only one cluster is left.


Hierarchical Clustering

Given a set of N items to be clustered and an N×N distance (or similarity) matrix, the basic process of hierarchical clustering is:

1. Start by assigning each item to its own cluster, so that N items give N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.

2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that there is now one cluster fewer.

3. Compute distances (similarities) between the new cluster and each of the old clusters.

4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
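Here is a minimal sketch of the four steps, assuming the items are given as a full symmetric distance matrix. Step 3 does not update a shrinking matrix. Instead, each cluster-to-cluster distance is recomputed from the item-level distances, which is simpler but slow (O(N^3) overall). The function name agglomerate and its linkage parameter are hypothetical.

The usage runs the sketch on a four-item matrix with d(a,b)=2, d(a,c)=5, d(a,d)=6, d(b,c)=3, d(b,d)=5 and d(c,d)=4. This is our reading of the matrix in the single-link and complete-link examples.

```python
from itertools import combinations
from statistics import mean

def agglomerate(labels, dist, linkage="single"):
    """Naive agglomerative clustering on a full N x N distance matrix.

    labels  -- item names, e.g. ["a", "b", "c", "d"]
    dist    -- symmetric list of lists of pairwise distances
    linkage -- "single" (min), "complete" (max) or "average" (mean)
    Returns the merges in order as (cluster1, cluster2, distance).
    """
    combine = {"single": min, "complete": max, "average": mean}[linkage]

    def cluster_distance(c1, c2):
        # Combine all cross-cluster item distances with the linkage rule.
        return combine([dist[i][j] for i in c1 for j in c2])

    def name(cluster):
        return "".join(sorted(labels[i] for i in cluster))

    # Step 1: every item starts in its own cluster.
    clusters = [(i,) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters and merge them.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((name(c1), name(c2), cluster_distance(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        # Step 3 is implicit: distances to the new cluster are recomputed
        # from item-level distances on the next pass (step 4: repeat).
    return merges

D = [[0, 2, 5, 6],
     [2, 0, 3, 5],
     [5, 3, 0, 4],
     [6, 5, 4, 0]]
print(agglomerate(["a", "b", "c", "d"], D, "single"))
# [('a', 'b', 2), ('c', 'ab', 3), ('d', 'abc', 4)]
print(agglomerate(["a", "b", "c", "d"], D, "complete"))
# [('a', 'b', 2), ('c', 'd', 4), ('ab', 'cd', 6)]
```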


The distance between two clusters can be defined in several ways:

• Single-Link Method / Nearest Neighbor (NN): minimum of the pairwise dissimilarities

• Complete-Link / Furthest Neighbor (FN): maximum of the pairwise dissimilarities

• Unweighted Pair Group Method with Arithmetic Mean (UPGMA): average of the pairwise dissimilarities, i.e. the average over all cross-cluster pairs

• Centroid: distance between the cluster centroids
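The four cluster-to-cluster distances listed above can be written directly in terms of the member points. This sketch assumes Euclidean distance between points and uses Python's math.dist, which requires Python 3.8 or later. The helper names are hypothetical.

```python
import math
from statistics import mean

def pairwise(A, B):
    """All Euclidean distances between a point in A and a point in B."""
    return [math.dist(p, q) for p in A for q in B]

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(mean(coords) for coords in zip(*points))

def single_link(A, B):       # nearest neighbor: minimum pairwise distance
    return min(pairwise(A, B))

def complete_link(A, B):     # furthest neighbor: maximum pairwise distance
    return max(pairwise(A, B))

def average_link(A, B):      # UPGMA: mean over all cross-cluster pairs
    return mean(pairwise(A, B))

def centroid_link(A, B):     # distance between the two centroids
    return math.dist(centroid(A), centroid(B))

# Two small hypothetical clusters on a line.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(4.0, 0.0), (6.0, 0.0)]
print(single_link(A, B))     # 3.0
print(complete_link(A, B))   # 6.0
print(average_link(A, B))    # 4.5
print(centroid_link(A, B))   # 4.5
```

In this one-dimensional example the average and centroid rules happen to agree. In general they give different values.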


Computing Distances

• Single-link clustering (also called the connectedness or minimum method): the distance between two clusters is the shortest distance from any member of one cluster to any member of the other. If the data consist of similarities, the similarity between two clusters is the greatest similarity from any member of one cluster to any member of the other.

• Complete-link clustering (also called the diameter or maximum method): the distance between two clusters is the longest distance from any member of one cluster to any member of the other.

• Average-link clustering: the distance between two clusters is the average distance from any member of one cluster to any member of the other.


Single-Link Method

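Distance matrix (Euclidean distance):

     a   b   c
b    2
c    5   3
d    6   5   4

(1) Merge a and b at distance 2. Updated distances: d({a,b}, c) = min(5, 3) = 3, d({a,b}, d) = min(6, 5) = 5, d(c, d) = 4.

(2) Merge {a,b} and c at distance 3. Updated distance: d({a,b,c}, d) = min(5, 4) = 4.

(3) Merge {a,b,c} and d at distance 4, giving the single cluster {a,b,c,d}.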


Complete-Link Method

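Distance matrix (Euclidean distance):

     a   b   c
b    2
c    5   3
d    6   5   4

(1) Merge a and b at distance 2. Updated distances: d({a,b}, c) = max(5, 3) = 5, d({a,b}, d) = max(6, 5) = 6, d(c, d) = 4.

(2) Merge c and d at distance 4. Updated distance: d({a,b}, {c,d}) = max(5, 6) = 6.

(3) Merge {a,b} and {c,d} at distance 6, giving the single cluster {a,b,c,d}.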


Compare Dendrograms

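Figure: dendrograms over the leaves a, b, c, d for single-link (left; merges at heights 2, 3, 4) and complete-link (right; merges at heights 2, 4, 6), on a height axis from 0 to 6.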


Ordered dendrograms

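There are $2^{n-1}$ linear orderings of the n leaves consistent with a dendrogram (n = number of genes or conditions).

Choosing the ordering that maximizes the similarity of adjacent elements is impractical, so order instead by:
• Average expression level
• Time of maximal induction, or
• Chromosome position

(Eisen et al. 1998)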
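The count of $2^{n-1}$ orderings comes from the n-1 internal nodes of a binary dendrogram: the two subtrees under each node can be swapped independently. The sketch below enumerates those orderings for a small tree. The tree is written as nested pairs, a representation assumed here for illustration.

```python
from itertools import product

def orderings(tree):
    """All leaf orders obtainable by swapping the two children of any
    internal node of a binary tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return [[tree]]                      # a single leaf
    left, right = tree
    result = []
    for l, r in product(orderings(left), orderings(right)):
        result.append(l + r)                 # children in original order
        result.append(r + l)                 # children swapped
    return result

tree = (("a", "b"), ("c", "d"))              # hypothetical dendrogram, 4 leaves
print(len(orderings(tree)))                  # 8, i.e. 2 ** (4 - 1)
```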


Self organizing maps

Tamayo et al. 1999 PNAS 96:2907-2912

Centroid 1   Centroid 2   Centroid 3

Centroid 4   Centroid 5   Centroid 6

k = 6


Partitioning vs. Hierarchical

• Partitioning
– Advantage: provides clusters that (approximately) satisfy some optimality criterion
– Disadvantages: needs an initial K; long computation time

• Hierarchical
– Advantage: fast computation (agglomerative)
– Disadvantages: rigid; cannot later correct erroneous decisions made earlier
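The partitioning side can be illustrated with a minimal k-means (Lloyd's algorithm) sketch. k-means is our choice of a representative partitioning method; the slide does not name one. The sketch shows why an initial K is needed: the caller must supply K starting centroids. Every iteration also reassigns every point, which is where the computation time goes. The function and parameter names are hypothetical.

```python
import math
from statistics import mean

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means; K is the number of initial centroids supplied."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            updated.append(tuple(map(mean, zip(*members))) if members else c)
        if updated == centroids:             # converged: no centroid moved
            break
        centroids = updated
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids. In practice, k-means is often run several times from different starts.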


Generic Clustering Tasks

• Estimating number of clusters

• Assigning each object to a cluster

• Assessing strength/confidence of cluster assignments for individual objects

• Assessing cluster homogeneity


Clustering and promoter elements

Harmer et al. 2000 Science 290:2110-2113


An Example Cluster

(DeRisi et al., 1997)


Cluster of co-expressed genes, pattern discovery in regulatory regions

Figure: for a cluster of co-expressed genes (expression profiles), retrieve the upstream regions (600 base pairs) and find patterns over-represented in the cluster.
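One common way to score how over-represented a pattern is in a cluster is the hypergeometric upper tail. Given N genes in total, K of which carry the pattern in their upstream region, it asks how likely a random set of n genes is to contain at least k carriers. This statistic is an assumption for illustration; the slides do not say which statistic produced the probabilities in the table of discovered patterns. The numbers in the usage line, including the genome size of 6000 genes, are hypothetical. math.comb requires Python 3.8 or later.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k): chance that n genes drawn at random (without replacement)
    from N genes, K of which carry the pattern, include at least k carriers."""
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)

# Hypothetical numbers: a 30-gene cluster in which 10 genes carry a pattern
# found upstream of 50 genes in an assumed genome of 6000 genes.
print(hypergeom_tail(k=10, n=30, K=50, N=6000))
```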


Some Discovered Patterns

Pattern              Probability  Cluster  No.  Total
ACGCG                6.41E-39     96       75   1088
ACGCGT               5.23E-38     94       52   387
CCTCGACTAA           5.43E-38     27       18   23
GACGCG               7.89E-31     86       40   284
TTTCGAAACTTACAAAAAT  2.08E-29     26       14   18
TTCTTGTCAAAAAGC      2.08E-29     26       14   18
ACATACTATTGTTAAT     3.81E-28     22       13   18
GATGAGATG            5.60E-28     68       24   83
TGTTTATATTGATGGA     1.90E-27     24       13   18
GATGGATTTCTTGTCAAAA  5.04E-27     18       12   18
TATAAATAGAGC         1.51E-26     27       13   18
GATTTCTTGTCAAA       3.40E-26     20       12   18
GATGGATTTCTTG        3.40E-26     20       12   18
GGTGGCAA             4.18E-26     40       20   96
TTCTTGTCAAAAAGCA     5.10E-26     29       13   18

Vilo et al. 2001


Jaak Vilo

The "GGTGGCAA" Cluster

ORF      Gene    Description
YBL041W  PRE7    20S proteasome subunit (beta6)
YBR170C  NPL4    nuclear protein localization factor and ER translocation component
YDL126C  CDC48   microsomal protein of CDC48/PAS1/SEC18 family of ATPases
YDL100C  -       similarity to E. coli arsenical pump-driving ATPase
YDL097C  RPN6    subunit of the regulatory particle of the proteasome
YDR313C  PIB     phosphatidylinositol(3)-phosphate binding protein
YDR330W  -       similarity to hypothetical S. pombe protein
YDR394W  RPT3    26S proteasome regulatory subunit
YDR427W  RPN9    subunit of the regulatory particle of the proteasome
YDR510W  SMT3    ubiquitin-like protein
YER012W  PRE1    20S proteasome subunit C11 (beta4)
YFR004W  RPN11   26S proteasome regulatory subunit
YFR033C  QCR6    ubiquinol--cytochrome-c reductase 17K protein
YFR050C  PRE4    20S proteasome subunit (beta7)
YFR052W  RPN12   26S proteasome regulatory subunit
YGL048C  RPT6    26S proteasome regulatory subunit
YGL036W  MTC2    Mtf1 Two hybrid Clone 2
YGL011C  SCL1    20S proteasome subunit YC7ALPHA/Y8 (alpha1)
YGR048W  UFD1    ubiquitin fusion degradation protein
YGR135W  PRE9    20S proteasome subunit Y13 (alpha3)
YGR253C  PUP2    20S proteasome subunit (alpha5)
YIL075C  RPN2    26S proteasome regulatory subunit
YJL102W  MEF2    translation elongation factor, mitochondrial
YJL053W  PEP8    vacuolar protein sorting/targeting protein
YJL036W  -       weak similarity to Mvp1p
YJL001W  PRE3    20S proteasome subunit (beta1)
YJR117W  STE24   zinc metallo-protease
YKL145W  RPT1    26S proteasome regulatory subunit
YKL117W  SBA1    Hsp90 (Ninety) Associated Co-chaperone
YLR387C  -       similarity to YBR267w
YMR314W  PRE5    20S proteasome subunit (alpha6)
YOL038W  PRE6    20S proteasome subunit (alpha4)
YOR117W  RPT5    26S proteasome regulatory subunit
YOR157C  PUP1    20S proteasome subunit (beta2)
YOR176W  HEM15   ferrochelatase precursor
YOR259C  RPT4    26S proteasome regulatory subunit
YOR317W  FAA1    long-chain-fatty-acid--CoA ligase
YOR362C  PRE10   20S proteasome subunit C1 (alpha7)
YPR103W  PRE2    20S proteasome subunit (beta5)
YPR108W  RPN7    subunit of the regulatory particle of the proteasome


Two sided clustering

Alizadeh et al. 2000 Nature 403:503-511


Diffuse large B-cell lymphoma


Neighborhood analysis

Golub et al. 1999


Acute Leukemias

• Acute lymphoblastic leukemia (ALL)
• Acute myeloid leukemia (AML)

– Not easily distinguishable, but with different clinical outcomes


Neighborhood analysis

Class predictor
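Below is a sketch of a two-class predictor in the spirit of Golub et al.'s neighborhood analysis. Each gene is scored with a signal-to-noise statistic, (mean1 - mean2) / (sd1 + sd2). Each informative gene then casts a weighted vote for a new sample. The exact statistic and voting rule here are our assumptions; they are not taken from these slides and should be checked against the paper. The gene names and values are hypothetical. The sketch uses the sample standard deviation and fails if a class has zero spread.

```python
from statistics import mean, stdev

def signal_to_noise(class1, class2):
    """(mu1 - mu2) / (sigma1 + sigma2) for one gene's values in two classes."""
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))

def weighted_vote(sample, training):
    """Sum of per-gene votes a_g * (x_g - b_g).

    training maps gene -> (class-1 values, class-2 values);
    sample maps gene -> expression value of the new sample.
    A positive total favours class 1, a negative total class 2.
    """
    total = 0.0
    for gene, (c1, c2) in training.items():
        a = signal_to_noise(c1, c2)          # weight: how well the gene separates the classes
        b = (mean(c1) + mean(c2)) / 2        # midpoint between the class means
        total += a * (sample[gene] - b)
    return total

# Hypothetical training data for two marker genes.
training = {
    "gene1": ([5.0, 6.0, 5.5], [1.0, 1.5, 2.0]),   # high in class 1
    "gene2": ([0.5, 1.0, 0.8], [3.0, 3.5, 4.0]),   # high in class 2
}
vote = weighted_vote({"gene1": 5.2, "gene2": 0.9}, training)
print("class 1" if vote > 0 else "class 2")  # class 1
```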


Regulatory pathway reconstruction

Ideker et al., Science 2001


Chromatin IP Chip (ChIP-chip)

Iyer et al. 2000


Protein Function Prediction

Jensen et al. 2002


NetOGlyc, NetPhos, PEST regions, PSIPRED, SEG filter, SignalP, PSORT, TMHMM.


Protein Function Prediction II

Marcotte & Eisenberg 1999


Biochemical pathways

Dandekar et al. 1999


Figure 1 Pathway alignment for glycolysis, the Entner–Doudoroff pathway and pyruvate processing. Enzymes for each pathway part (top; EC numbers and enzyme subunits are given below these) are compared in 17 organisms and represented as small rectangles. Filled and empty rectangles indicate the presence and absence, respectively, of enzyme-encoding genes in the different species listed at the left. Further details are given in the text; different isoenzymes and enzyme families are listed in Table 2.


Flux balance analysis

Edwards et al. 2000


Comparative genome analysis
