Emergent Biology Through Integration and Mining Of Microarray Datasets


Post on 11-Feb-2016

Page 1: Emergent Biology Through Integration and Mining Of Microarray Datasets

Emergent Biology Through Integration and Mining Of Microarray Datasets

Lance D. Miller
GIS Microarray & Expression Genomics

Page 2

FOCUS:

Mining of expression data to understand the molecular composition of human cancers and to define components of the tumor molecular profile with mechanistic and clinical importance.

Page 3

2001, PNAS

Page 4

Molecular classes are predictive of outcome

(Figure panels: overall survival; relapse-free survival)

Page 5

70-gene prognosis classifier for predicting risk of distant metastasis within 5 years

van ’t Veer et al.

Page 6

van ’t Veer et al.

Page 7

Sotiriou et al.

Page 8

Though each tumor is molecularly unique, there exist common transcriptional cassettes that underlie biological and clinical properties of tumors and that may be of diagnostic, prognostic and therapeutic significance.

Page 9

GOAL:

Mining of expression data to understand the molecular composition of human cancers and to define components of the tumor molecular profile with mechanistic and clinical importance.

Page 10

The GIS Perpetual Array Platform

Page 11

Integration of Independent Datasets

Perou et al., 1999; Sorlie et al., 2001; West et al., 2001

Page 12

Meta-Analysis of Breast Cancer Datasets:

dataset                   source            sample size                 array format
1. Miller-Liu             unpublished       61 tumors: 39 ER+, 22 ER-   19K spotted oligo
2. Sotiriou-Liu           submitted: PNAS   99 tumors: 34 ER+, 65 ER-   7.6K spotted cDNA
3. Gruvberger-Meltzer     Cancer Research   47 tumors: 23 ER+, 24 ER-   6.7K spotted cDNA
4. Sorlie-Borresen-Dale   PNAS              74 tumors: 56 ER+, 18 ER-   8.1K spotted cDNA
5. van’t Veer-Friend      Nature            98 tumors: 59 ER+, 39 ER-   25K spotted oligo
6. West-Nevins            PNAS              49 tumors: 25 ER+, 24 ER-   7.1K Affymetrix

total: 428 tumors, ~73,500 probes

(Adaikalavan Ramasamy et al.)

Page 13

META MADB: The Construct

Building the Matrix
1. Extract and Format the Data
2. Link sample/probe info via unique keys
3. Log Transform and Normalize
4. Filter Genes and Arrays
5. Apply Statistical Tests

Creating a Universe
1. Apply UniGene ID as Unifying Key
2. Remove Gene Redundancy
3. Extract p values, d values, z-scores
4. Set p value threshold
5. Merge Datasets
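
The "Creating a Universe" steps could be sketched roughly as below. This is only an illustration under stated assumptions, not the META MADB implementation: the slide does not say how redundant probes are resolved (keeping the probe with the smallest p value is an assumption here), the default threshold of 0.001 is borrowed from later slides, and the function names and "Hs.x" identifiers are hypothetical.

```python
def collapse_probes(probe_stats):
    """Remove gene redundancy within one dataset.

    probe_stats: iterable of (unigene_id, p, d, z) tuples, one per probe.
    Assumption: when several probes share a UniGene ID, keep the one
    with the smallest p value.
    """
    best = {}
    for unigene_id, p, d, z in probe_stats:
        if unigene_id not in best or p < best[unigene_id][0]:
            best[unigene_id] = (p, d, z)
    return best


def merge_datasets(datasets, p_threshold=0.001):
    """Merge per-dataset statistics into one table keyed by UniGene ID.

    datasets: dict mapping dataset name -> iterable of (unigene_id, p, d, z).
    Returns {unigene_id: {dataset_name: (p, d, z)}}, keeping only entries
    whose p value is below p_threshold.
    """
    universe = {}
    for name, probe_stats in datasets.items():
        for unigene_id, (p, d, z) in collapse_probes(probe_stats).items():
            if p < p_threshold:
                universe.setdefault(unigene_id, {})[name] = (p, d, z)
    return universe


# Hypothetical statistics for two datasets (values are made up).
example = {
    "dataset_A": [("Hs.1", 0.0001, 1.2, 4.0), ("Hs.1", 0.01, 0.8, 2.5),
                  ("Hs.2", 0.5, 0.1, 0.6)],
    "dataset_B": [("Hs.1", 0.0005, 0.9, 3.5), ("Hs.3", 0.0002, -1.1, -3.7)],
}
universe = merge_datasets(example)
print(sorted(universe))
```

Keying everything on the UniGene ID is what lets probes from different array formats land in the same row of the merged table.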

Page 14

META MADB

Page 15

META MADB

Page 16
Page 17

d values (difference of average expression)

          ER+                     ER-
          T1 T2 T3 T4 T5 … Tn     T1 T2 T3 T4 T5 … Tn
gene1:    e1 e2 e3 e4 e5 … en     e1 e2 e3 e4 e5 … en

d = average e [ER+] - average e [ER-]
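
As a small worked example of the d value, the sketch below takes the expression values of one gene in the ER+ and ER- tumors and subtracts the two averages. The function name and the numbers are hypothetical, and reading d as a plain difference of means follows the slide title.

```python
from statistics import mean


def d_value(er_pos, er_neg):
    """Difference of average expression between ER+ and ER- tumors for one gene."""
    return mean(er_pos) - mean(er_neg)


# Hypothetical expression values for gene1 (made-up numbers).
gene1_er_pos = [1.0, 1.5, 2.0]
gene1_er_neg = [0.0, 0.5, 1.0]
print(d_value(gene1_er_pos, gene1_er_neg))  # 1.0
```

If the values are log-transformed, as in the matrix-building steps, a difference of averages corresponds to a log ratio of expression between the two groups.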

Page 18
Page 19
Page 20
Page 21

Identifying Grade-Specific Genes in Hepatocellular Carcinoma

• Sample: 10 cases of each class
• Sample collection: HBV(+)
• Array: Human 19K Oligonucleotide array
• Analysis: 50 arrays

HCC Progression: OAH, AAH, G1, G2, G3

Pre-neoplastic lesions: adenomatous hyperplasia, ordinary (OAH) and atypical (AAH)

HCC Grade 1, 2, 3 (G1, G2, G3)

Page 22

Identifying Grade-Specific Genes in Hepatocellular Carcinoma

Page 23

Identifying Grade-Specific Genes in Hepatocellular Carcinoma

Page 24

Breast Cancer Grade-Associated Genes as Predictors of HCC Grade?

HCC

BC

Page 25

Breast Cancer Grade-Associated Genes as Predictors of HCC Grade?

Gene      Phase  Function
ORC6L            DNA replication
TROAP     M/G1   cell adhesion
BUB1      G2/M   mitotic spindle checkpoint; oncogenesis
CKS2      G2/M   cytokinesis
MELK      G2     tyr/ser/thr kinase activity
CDC20     G2/M   regulation of cell cycle
HN1       G2/M   Unknown
MCM6      G1/S   DNA replication initiation
CDC2      G2     mitotic initiation
UBE2C     G2     cyclin catabolism
TOP2A     G2     DNA metabolism
CDKN3     M/G1   regulation of CDK activity
PTTG1     M/G1   mitotic regulation; oncogenesis
E2-EPF    M/G1   ubiquitin cycle
FLJ23462         electron transport
GATA3            embryogenesis

HCC

Page 26

Estrogen Responsive Genes in vitro (Chin-Yo Lin)

UG Description   Fold Change 2   T47D   MCF-7   ZR75-1   SAGE   ERE (-2)

Interleukin 6 signal transducer (gp130, oncostatin M receptor)   2.5   ++ +
Insulin-like growth factor binding protein 4   2.1   + + + +
Seven in absentia homolog 2 (Drosophila)   1.7   + +
Matrix metalloproteinase 7 (matrilysin, uterine)   -1.7   ++ +
Stanniocalcin 2   5.0   ++ + + ++
Nuclear receptor interacting protein 1/RIP140   1.6   + + +
GREB1 protein   3.1   +
Serum-inducible kinase   -2.0   + +
Amphiregulin   3.9   ++ +
CD7 antigen (p41)   -2.5   + +
Duodenal cytochrome   -2.1   + +
Thrombospondin 1   2.4   + +
Putative transmembrane protein   -3.8   + + +++
Stromal cell-derived factor 1   3.8   ++ ++
Retinoblastoma binding protein 8   2.2   ++ + + ++
Janus kinase 1 (a protein tyrosine kinase)   4.9   ++ ++
protein kinase H11   1.5
Olfactomedin 1   3.0   ++
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 10 (RNA helicase)   2.3   + +
Hypothetical protein similar to mouse Dnajl1   2.5   + +++
Putative protein kinase   1.7
(no description)   2.5   +
UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 1   3.7   + + ++
Hypothetical protein FLJ14299/Similar to nocA zinc-finger protein   2.5   ++
Immunoglobulin superfamily, member 4   2.2   + ++
Cyclin G2   -2.6   ++ +
Sialyltransferase 1 beta-galactoside alpha-2,6-sialyltransferase   -2.0   +
Chitobiase, di-N-acetyl-   -1.9   ++
Arachidonate 12-lipoxygenase, 12R type   -4.0   ++ +
Purinergic receptor (family A group 5)   -2.3   +
G protein-coupled receptor kinase 7/Binds ERbeta   -1.8   + +

Page 27

Estrogen-Responsive in vitro and ER Status-Associated in vivo

UG Description   Fold Change 2   T47D   MCF-7   ZR75-1   SAGE   ERE (-2)

Interleukin 6 signal transducer (gp130, oncostatin M receptor)   2.5   ++ +
Insulin-like growth factor binding protein 4   2.1   + + + +
Seven in absentia homolog 2 (Drosophila)   1.7   + +
Matrix metalloproteinase 7 (matrilysin, uterine)   -1.7   ++ +
Stanniocalcin 2   5.0   ++ + + ++
Nuclear receptor interacting protein 1/RIP140   1.6   + + +
GREB1 protein   3.1   +
Serum-inducible kinase   -2.0   + +
Amphiregulin   3.9   ++ +
CD7 antigen (p41)   -2.5   + +

(Figure labels: E2; E2 + ICI; E2 + CHX; 1 2 3 4 5 6; p<0.001)

Page 28

Identifying Cancer-Linked Genes in Epithelial Adenocarcinomas

Datasets: 3 gastric, 3 prostate, 2 liver, 1 lung

Page 29

242 Genes that Distinguish Tumor from Normal at p<0.001 in at least 3 of the 4 Tumor Types

selection at p<0.001
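
The "at least 3 of the 4 tumor types" rule can be illustrated with the short sketch below. It assumes each gene already has a single tumor-versus-normal p value per tumor type; how the several datasets of one type were combined is not stated on the slides, and the gene names and p values here are hypothetical.

```python
def genes_in_min_types(p_by_gene, p_threshold=0.001, min_types=3):
    """Return genes with p < p_threshold in at least min_types tumor types.

    p_by_gene: {gene: {tumor_type: p_value}}, one p value per tumor type
    (assumption: datasets of the same type were summarized beforehand).
    """
    selected = []
    for gene, p_by_type in p_by_gene.items():
        hits = sum(1 for p in p_by_type.values() if p < p_threshold)
        if hits >= min_types:
            selected.append(gene)
    return sorted(selected)


# Hypothetical p values for three genes across the four tumor types.
example = {
    "geneA": {"gastric": 1e-4, "prostate": 5e-4, "liver": 2e-4, "lung": 0.2},
    "geneB": {"gastric": 1e-4, "prostate": 0.01, "liver": 2e-4, "lung": 0.2},
    "geneC": {"gastric": 1e-5, "prostate": 1e-5, "liver": 1e-5, "lung": 1e-5},
}
print(genes_in_min_types(example))  # ['geneA', 'geneC']
```

Requiring agreement across tumor types rather than within a single dataset favors genes tied to the cancer state in general over tissue-specific differences.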

Page 30

Summary

An Integrated Database for Pan-Cancer Meta-Analysis of Gene Expression Data

database components: internal and external datasets derived from:

- tumor studies (clinical samples)
- in vitro, pathway studies (e.g., timecourse)
- SAGE data
- mouse studies (in vitro/in vivo)

Page 31

Future Directions

Derive expression signatures for all major factors known or suspected to have prognostic value

Determine the reliability of expression signatures in outcome prediction

Expand integrated database for pan-cancer meta-analysis

Integrate expression profiling into clinical decision making

Page 32

Acknowledgements

Catholic University of Korea
Suk-Woo Nam
Jung Yong Lee

GIS
Adai Ramasamy
Liza Vergara
Phil Long
Chin-Yo Lin
Benjamin Mow