Integrating Cross-Platform Microarray Data by Second-order Analysis: Functional Annotation and Network Reconstruction
Ming-Chih Kao, PhD
University of Michigan Medical School
Xianghong Jasmine Zhou
Assistant Professor of Biological Sciences
USC
Wing Hung Wong
Professor of Statistics and of Health Research and Policy
Stanford University
2nd-Order Analysis: Current Challenges in Microarray Data Analysis
1. How to effectively combine the expression data sets generated with different technology/laboratory platforms?
2. How to identify functionally related genes without a co-expression pattern?
3. How to identify transcription cascades?
Microarray Platforms
2nd-Order Analysis: Multiple Microarray Technology Platforms
2nd-Order Analysis: Public Microarray Data Sources
Organism         Experiments   Datasets
S. cerevisiae    788           61
C. elegans       348           15
A. thaliana      736           44
M. musculus      1,553         20
H. sapiens       4,135         90
[Figure: Transcription Factors 1, 2, and 3 and genes gene1 through gene7; labels: "Amplification of signal", "?"]
[Figure: genes G1 through G4; axes: experiments vs. expression; experimental groups: Cell Cycle, Stress, Osmotic, Starvation, Copper, Zinc; panels: First-order correlation (expression correlation per experimental group), Second-order correlation]
[Figure: Regulation of Cell Cycle: POG1-MPT5 and SDA1-CDC5. Panels: Expression of SDA1-CDC5; Expression of POG1-MPT5; Expression Correlation of POG1-MPT5 and SDA1-CDC5. Horizontal axis: experimental groups (Chromatin Silencing, Amino Acid Starvation, Gamma Radiation, Protein Metabolism, DNA Damage, Heat Steady)]
2nd-Order Analysis: An Example
Group functionally related genes that may not exhibit similar expression patterns?
Data:
  Stanford Microarray Database (cDNA array)
  NCBI GEO Database (Affymetrix array)
  Rosetta Compendium (cDNA array)
39 experimental groups subjected to different types of perturbations, such as cell cycle, heat shock, osmotic pressure, starvation, zinc, nitrogen depletion, etc.
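As a rough illustration of the idea behind this example, the sketch below computes a first-order correlation for a gene pair (a doublet) within each experimental group, then a second-order correlation between the per-group correlation profiles of two doublets. The function names, the toy expression values, and the use of Pearson correlation at both levels are assumptions made for illustration; the slides do not specify them.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

def first_order_profile(groups, gene_a, gene_b):
    """First-order correlation of one doublet within each experimental group."""
    return [pearson(g[gene_a], g[gene_b]) for g in groups]

def second_order_correlation(groups, doublet_1, doublet_2):
    """Correlation between two doublets' first-order profiles across groups."""
    p1 = first_order_profile(groups, *doublet_1)
    p2 = first_order_profile(groups, *doublet_2)
    return pearson(p1, p2)

# Hypothetical toy data: each dict is one experimental group,
# mapping a gene name to its expression values in that group.
groups = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2], "D": [8, 2, 6, 4]},
    {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [3, 1, 4, 2], "D": [2, 4, 1, 3]},
    {"A": [2, 1, 4, 3], "B": [3, 2, 5, 4], "C": [4, 1, 3, 2], "D": [8, 2, 6, 4]},
]

r2 = second_order_correlation(groups, ("A", "B"), ("C", "D"))
print(round(r2, 6))
```

In this toy data the doublets (A, B) and (C, D) switch their co-expression on and off in the same groups, so their second-order correlation is high, while A and C are not strongly correlated within any single group.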
2nd-Order Analysis: Validation
43 functional classes
2,429 genes
5,142 doublets
278,799 quadruplets
Homogeneous quadruplets: 84%
Heterogeneous quadruplets: 16%
2nd-Order Analysis: Validation: Scheme
2nd-Order Analysis: Validation: Comparison
2nd-Order Analysis: Validation: Results
2nd-order analysis groups functionally related genes
The derived quadruplets give rise to a set of 2,597 distinct and novel gene pairs
97% of the 2,597 pairs are missed by the standard methods
Reasons for the poor performance of the 1st-order method
  Inter-dataset variations
  Cross-doublet gene pairs need not show high expression correlation
  Sensitivity to gene pairs which are only co-expressed in a subset of the data sets
[Figure: six network panels over nodes a through f, labeled Cell Cycle, Heat shock, Starvation, Nitrogen Depletion, Radiation, Osmotic pressure]
2nd-Order Analysis: Interaction Modules
2nd-Order Analysis: Interaction Modules: Leave-one-out Cross Validation
For each gene that occurred in the 100 tightest and most stable clusters of known genes, we masked its function, made a prediction with our 2-step procedure, and compared the predicted function with its true function.
We made predictions for 179 doublets, of which 163 were correct
91% success ratio
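The leave-one-out scheme can be sketched as follows. The slides do not spell out the 2-step prediction procedure, so this sketch substitutes a simple majority vote among the other members of a gene's cluster; that predictor, the gene names, and the function labels are all hypothetical.

```python
from collections import Counter

def loocv_accuracy(clusters, known_function):
    """Mask each gene's function in turn and predict it from its cluster mates.

    Returns (number of correct predictions, number of predictions made).
    """
    correct = total = 0
    for cluster in clusters:
        for gene in cluster:
            # Mask this gene: vote using the other cluster members only.
            votes = Counter(known_function[g] for g in cluster if g != gene)
            if not votes:
                continue  # singleton cluster: nothing to predict from
            predicted = votes.most_common(1)[0][0]
            total += 1
            correct += predicted == known_function[gene]
    return correct, total

# Hypothetical clusters of annotated genes.
clusters = [["g1", "g2", "g3"], ["g4", "g5", "g6", "g7"]]
known_function = {
    "g1": "mitosis", "g2": "mitosis", "g3": "mitosis",
    "g4": "cation transport", "g5": "cation transport",
    "g6": "cation transport", "g7": "mitosis",
}

correct, total = loocv_accuracy(clusters, known_function)
print(f"{correct}/{total} correct ({correct / total:.0%})")
```

The same ratio applied to the figures on the slide gives 163 / 179, which is about 0.91 and matches the reported 91% success ratio.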
2nd-Order Analysis: Interaction Modules: Functional Prediction
79 functions of 69 unknown yeast genes involved in diverse biological processes
Experimental studies in the literature and in our laboratory
  YLR183C in “mitosis”
    Regulation of G1/S transition
  YLL051C in “cation transport”
    Ferric-chelate reductase activity and iron-regulated expression
2nd-Order Analysis: Frequently Occurring Tight Clusters
Transcription Factors
2nd-Order Analysis: Frequently Occurring TCs with 2nd-Order Correlation
Transcription Factor Set 1
Transcription Factor Set 2
Cooperativity
3 types of transcription cascades
2nd-Order Analysis: ChIP-Chip
2nd-Order Analysis: Transcription Module Results
60 transcription modules identified
34 pairs showed high 2nd-order correlation
29% (P < 10^-5) of those module pairs are participants in transcription cascades
  2 pairs in Type I cascades
  8 pairs in Type II cascades
  3 pairs in Type III cascades
These transcription cascades inter-connect into a partial cellular regulatory network
2nd-Order Analysis: Leu3 and Met4 Transcription Cascade
[Figure: two panels, "Avg. Expression: Leu3 module vs. Met4 module" and "Avg. Expression Correlation: Leu3 module vs. Met4 module"; axis range -1.0 to 1.0]
2nd-Order Analysis: Hierarchical clustering of transcriptional modules
2nd-Order Analysis: Assigning transcription factors to pathways
For an unknown transcription factor in a module cluster, we can annotate its function by integrating 2 types of evidence:
  the functions of known genes in its target module
  the functions of known transcription factors regulating other modules in the same cluster
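One simple way to combine the two types of evidence is to pool them as votes and rank the candidate functions. The sketch below does this with equal weight per annotation; the equal weighting, the function name `annotate_tf`, and the placeholder labels "process A" and "process B" are assumptions for illustration, not the scoring used in the talk.

```python
from collections import Counter

def annotate_tf(target_gene_functions, cluster_tf_functions):
    """Rank candidate functions for an unannotated transcription factor.

    target_gene_functions: functions of known genes in its target module.
    cluster_tf_functions: functions of known transcription factors that
        regulate other modules in the same module cluster.
    """
    votes = Counter(target_gene_functions)  # evidence type 1
    votes.update(cluster_tf_functions)      # evidence type 2
    return votes.most_common()              # highest vote count first

# Hypothetical annotations.
ranked = annotate_tf(
    ["process A", "process A", "process B"],
    ["process A", "process B"],
)
print(ranked[0])
```

A weighted variant could scale each evidence type separately if one source were considered more reliable than the other.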
2nd-Order Analysis: Summary
A framework to integrate many microarray data sets in a platform-independent way, with an investigation of its properties and applications.
  Group together functionally related genes without direct expression similarity
  Cluster the functional interactions into modules and functionally annotate unknown genes
  Reveal the cooperativity in the regulatory network and reconstruct transcription cascades