Post on 08-Jan-2016


Page 1: Ming-Chih Kao, PhD University of Michigan Medical School mckao@med.umich

Integrating Cross-Platform Microarray Data by Second-order Analysis: Functional Annotation and Network Reconstruction

Ming-Chih Kao, PhD

University of Michigan Medical School


Page 2:

Xianghong Jasmine Zhou

Assistant Professor of Biological Sciences

USC

Wing Hung Wong

Professor of Statistics and of Health Research and Policy

Stanford University

Page 3:

2nd-Order Analysis

Current Challenges in Microarray Data Analysis

1. How can expression data sets generated with different technology or laboratory platforms be combined effectively?

2. How can functionally related genes be identified when they show no co-expression pattern?

3. How can transcription cascades be identified?

Page 4:

2nd-Order Analysis

Multiple Microarray Technology Platforms

Microarray Platforms

Page 5:

2nd-Order Analysis

Public Microarray Data Sources

Organism        Experiments   Datasets
S. cerevisiae   788           61
C. elegans      348           15
A. thaliana     736           44
M. musculus     1,553         20
H. sapiens      4,135         90

Page 6:

[Figure: diagram with nodes Transcription Factor 1, Transcription Factor 2, Transcription Factor 3 and gene1 through gene7; labels: "Amplification of signal" and two "?" marks]

Page 7:
Page 8:

[Figure: expression of genes G1 to G4 across experiments, and expression correlation across experimental groups (Cell Cycle, Stress, Osmotic, Starvation, Copper, Zinc); panel labels: "First-order correlation" and "Second-order correlation"]
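The two labels on this slide can be illustrated with a minimal sketch. It assumes that a first-order correlation is the Pearson correlation of two genes' expression within one data set, and that a second-order correlation is the Pearson correlation between two gene pairs' first-order correlations taken across data sets. The function names and the toy values are hypothetical, not taken from the slides.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)  # raises ZeroDivisionError for constant input

def first_order_profile(gene_a, gene_b, datasets):
    """One first-order (expression) correlation per data set."""
    return [pearson(ds[gene_a], ds[gene_b]) for ds in datasets]

def second_order_correlation(pair1, pair2, datasets):
    """Correlation between two gene pairs' first-order profiles."""
    return pearson(first_order_profile(*pair1, datasets),
                   first_order_profile(*pair2, datasets))

# Hypothetical toy data: three data sets, four genes, four experiments each.
datasets = [
    {"G1": [1, 2, 3, 4], "G2": [1, 2, 3, 4], "G3": [4, 1, 3, 2], "G4": [4, 1, 3, 2]},
    {"G1": [1, 2, 3, 4], "G2": [4, 3, 2, 1], "G3": [2, 4, 1, 3], "G4": [3, 1, 4, 2]},
    {"G1": [1, 2, 3, 4], "G2": [1, 2, 3, 4], "G3": [1, 3, 2, 4], "G4": [1, 3, 2, 4]},
]

so = second_order_correlation(("G1", "G2"), ("G3", "G4"), datasets)
print(round(so, 3))
```

In the toy data, G1 and G3 are only weakly correlated within a data set, yet the pairs (G1, G2) and (G3, G4) switch their co-expression on and off in the same data sets, so their second-order correlation is high. Because each data set contributes only a correlation value, the raw expression scales of different platforms never need to be compared directly.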

Page 9:

2nd-Order Analysis

An Example

Regulation of Cell Cycle: POG1-MPT5 and SDA1-CDC5

[Figure: three panels plotted against experimental groups (Chromatin Silencing, Amino Acid Starvation, Gamma Radiation, Protein Metabolism, DNA Damage, Heat Steady): expression of SDA1-CDC5; expression of POG1-MPT5; expression correlation of POG1-MPT5 and SDA1-CDC5]

Page 10:

2nd-Order Analysis

Validation

Can the analysis group functionally related genes that may not exhibit similar expression patterns?

Data:
  Stanford Microarray Database (cDNA array)
  NCBI GEO Database (Affymetrix array)
  Rosetta Compendium (cDNA array)

39 experimental groups subjected to different types of perturbation, such as cell cycle, heat shock, osmotic pressure, starvation, zinc, and nitrogen depletion.

Page 11:

2nd-Order Analysis

Validation: Scheme

43 functional classes

2,429 genes

5,142 doublets

278,799 quadruplets

Homogeneous quadruplets: 84%

Heterogeneous quadruplets: 16%
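The split of quadruplets into homogeneous and heterogeneous can be sketched as follows. The slides do not define the two terms, so this sketch assumes that a quadruplet is a pair of doublets (gene pairs) and that it counts as homogeneous when all four genes carry the same functional class label. The gene names, class labels, and function names are hypothetical.

```python
def quadruplet_kind(quadruplet, gene_class):
    """Return 'homogeneous' if all four genes share one class label,
    otherwise 'heterogeneous' (an assumed definition, see the lead-in)."""
    (a, b), (c, d) = quadruplet
    classes = {gene_class[g] for g in (a, b, c, d)}
    return "homogeneous" if len(classes) == 1 else "heterogeneous"

def homogeneous_fraction(quadruplets, gene_class):
    """Fraction of quadruplets classified as homogeneous."""
    kinds = [quadruplet_kind(q, gene_class) for q in quadruplets]
    return kinds.count("homogeneous") / len(kinds)

# Hypothetical annotation: four genes in class A, two in class B.
gene_class = {"g1": "A", "g2": "A", "g3": "A", "g4": "A", "g5": "B", "g6": "B"}

# Hypothetical quadruplets, each a pair of doublets.
quadruplets = [
    (("g1", "g2"), ("g3", "g4")),
    (("g1", "g3"), ("g2", "g4")),
    (("g1", "g4"), ("g2", "g3")),
    (("g1", "g2"), ("g5", "g6")),
]

fraction = homogeneous_fraction(quadruplets, gene_class)
print(fraction)
```

Under this assumed definition, the homogeneous fraction measures how often a high second-order correlation links doublets from the same functional class.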

Page 12:

2nd-Order Analysis

Validation: Comparison

Page 13:

2nd-Order Analysis

Validation: Results

2nd-order analysis groups functionally related genes
  The derived quadruplets give rise to a set of 2,597 distinct and novel gene pairs
  97% of the 2,597 pairs are missed by the standard methods

Reasons for the poor performance of the 1st-order method
  Inter-dataset variations
  Cross-doublet gene pairs need not show high expression correlation

Sensitivity to gene pairs which are only co-expressed in a subset of the data sets

Page 14:

[Figure: six network panels over nodes a to f, one per experimental group: Cell Cycle, Heat shock, Starvation, Nitrogen Depletion, Radiation, Osmotic pressure]

Page 15:

2nd-Order Analysis

Interaction Modules

Page 16:

2nd-Order Analysis

Interaction Modules

Page 17:

2nd-Order Analysis

Interaction Modules: Leave-one-out Cross Validation

For each gene that occurred in the 100 tightest and most stable clusters of known genes, we masked its function, made a prediction based on our 2-step procedure, and compared the predicted function with its true function.

We made predictions for 179 doublets, of which 163 were correct

91% success ratio
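The leave-one-out idea can be sketched as below. The slides do not spell out the 2-step procedure, so this sketch swaps in a plain majority vote over the other annotated genes in the same cluster, and it scores individual genes. The gene names, function labels, and helper names are hypothetical. The last line only checks the arithmetic behind the reported ratio.

```python
from collections import Counter

def predict_function(gene, cluster, annotation):
    """Majority vote over the other annotated genes in the cluster.
    This stands in for the authors' 2-step procedure, which the slides do not detail."""
    votes = Counter(annotation[g] for g in cluster if g != gene and g in annotation)
    return votes.most_common(1)[0][0] if votes else None

def leave_one_out(clusters, annotation):
    """Mask each annotated gene in turn, predict it, and count the hits."""
    made = correct = 0
    for cluster in clusters:
        for gene in cluster:
            if gene not in annotation:
                continue
            masked = {g: f for g, f in annotation.items() if g != gene}
            guess = predict_function(gene, cluster, masked)
            if guess is None:
                continue
            made += 1
            correct += guess == annotation[gene]
    return made, correct

# Hypothetical clusters and annotations.
clusters = [["a", "b", "c"], ["d", "e", "f", "g"]]
annotation = {
    "a": "mitosis", "b": "mitosis", "c": "mitosis",
    "d": "cation transport", "e": "cation transport",
    "f": "mitosis", "g": "cation transport",
}

made, correct = leave_one_out(clusters, annotation)
print(made, correct)

# Arithmetic behind the reported success ratio: 163 correct out of 179.
print(round(100 * 163 / 179))
```

Masking is done by building a copy of the annotation without the held-out gene, so the gene's own label can never leak into its prediction.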

Page 18:

2nd-Order Analysis

Interaction Modules: Functional Prediction

79 functions predicted for 69 unknown yeast genes involved in diverse biological processes

Experimental studies in the literature and in our laboratory
  YLR183C in “mitosis”: regulation of G1/S transition
  YLL051C in “cation transport”: ferric-chelate reductase activity and iron-regulated expression

Page 19:

2nd-Order Analysis

Frequently Occurring Tight Clusters

Transcription Factors

Page 20:

2nd-Order Analysis

Frequently Occurring Tight Clusters (TCs) with 2nd-Order Correlation

Page 21:

[Figure: Transcription Factor Set 1 and Transcription Factor Set 2, labeled "Cooperativity"]

Page 22:

3 types of transcription cascades

Page 23:

2nd-Order Analysis

ChIP-Chip

Page 24:

2nd-Order Analysis

Transcription Module Results

60 transcription modules identified

34 pairs showed high 2nd-order correlation

29% (P < 10^-5) of those module pairs are participants in transcription cascades
  2 pairs in Type I cascades
  8 pairs in Type II cascades
  3 pairs in Type III cascades

These transcription cascades inter-connect into a partial cellular regulatory network

Page 25:
Page 26:

2nd-Order Analysis

Leu3 and Met4 Transcription Cascade

[Figure: two panels, "Avg. Expression: Leu3 module vs. Met4 module" and "Avg. Expression Correlation: Leu3 module vs. Met4 module", each with axis ticks at 1.0 and -1.0]

Page 27:

2nd-Order Analysis

Hierarchical clustering of transcriptional modules

Page 28:

2nd-Order Analysis

Assigning transcription factors to pathways

For an unknown transcription factor in a module cluster, we can annotate its function by integrating 2 types of evidence:

the functions of known genes in its target module

the functions of known transcription factors regulating other modules in the same cluster
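One way to picture the integration of the two evidence types is a pooled vote, sketched below. The slides do not say how the evidence is combined, so the equal weighting here is an assumption, and the function name and the functional labels are hypothetical.

```python
from collections import Counter

def annotate_tf(target_module_functions, cluster_tf_functions):
    """Rank candidate functions for an unknown transcription factor.
    Both evidence types get equal weight, which is an assumption of this sketch."""
    votes = Counter(target_module_functions)  # evidence 1: known genes in its target module
    votes.update(cluster_tf_functions)        # evidence 2: known TFs of other modules in the cluster
    return votes.most_common()

# Hypothetical labels for illustration only.
ranked = annotate_tf(
    ["amino acid biosynthesis", "amino acid biosynthesis", "transport"],
    ["amino acid biosynthesis", "sulfur metabolism"],
)
print(ranked[0])
```

A weighted variant could favor one evidence type over the other; the slides give no basis for choosing weights, so none are used here.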

Page 29:

2nd-Order Analysis

Summary

We developed a framework to integrate many microarray data sets in a platform-independent way, and investigated its properties and applications:

Group together functionally related genes without direct expression similarity

Cluster the functional interactions into modules and functionally annotate unknown genes

Reveal the cooperativity in the regulatory network and reconstruct transcription cascades