Transcriptional Diagnosis by Bayesian Network

Post on 11-Jan-2016

DESCRIPTION

Transcriptional Diagnosis by Bayesian Network. Hsun-Hsien Chang and Marco F. Ramoni. Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School. March 17, 2009.

TRANSCRIPT

1

Harvard Medical School

Transcriptional Diagnosis by Bayesian Network

Hsun-Hsien Chang and Marco F. Ramoni

Children’s Hospital Informatics Program

Harvard-MIT Division of Health Sciences and Technology

Harvard Medical School

March 17, 2009

2

Background

• Microarray technology enables profiling expression of thousands of genes in parallel on a single chip.

• Comparative analysis of gene expression across tissue states extracts signature genes for disease diagnosis.

• Challenge:
– The number of variables (i.e., genes) is much greater than the number of observations (i.e., biological samples), which leads to overfitting.

• Existing methods:
– Gene selection: compute statistics (e.g., t-statistics, SNR, PCA) for individual genes and select the high-ranking genes.
– Classification model: build a classification function of the selected genes.
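As an illustration of the "existing methods" bullet above, the sketch below ranks genes one at a time by a signal-to-noise ratio (SNR) and keeps the top k. It assumes the common definition SNR = (mean1 - mean2) / (sd1 + sd2); the slides do not spell out a formula, and the function names, gene names, and toy values are hypothetical.

```python
from statistics import mean, stdev

def snr_scores(expr, labels):
    """Per-gene signal-to-noise ratio between two tissue states.

    expr: dict mapping gene name -> list of expression values (one per sample).
    labels: list of 0/1 tissue-state labels, aligned with the samples.
    Assumes each class has at least two samples and nonzero spread.
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = (mean(a) - mean(b)) / (stdev(a) + stdev(b))
    return scores

def top_genes(expr, labels, k):
    """Return the k genes with the largest absolute SNR (the cut-off k is arbitrary)."""
    scores = snr_scores(expr, labels)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]

# Toy data, made up for illustration only.
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [0.5, 1.5, 1.0, 1.6, 0.6, 1.1],
}
labels = [0, 0, 0, 1, 1, 1]
print(top_genes(expr, labels, k=1))
```

Each gene is scored in isolation and k must be chosen by hand, which are the two limitations the next slide raises.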

3

Proposed Approach

• Issues:
– The assumption of gene independence is inadequate.
– Other genes may be collinearly expressed with the signature.
– Selection and classification are two non-integrated steps.
– A cut-off threshold is needed to select the high-ranking genes.

• Proposed strategies:
– Adopt a systems biology approach to infer the functional dependence among genes.
– Use the dependence network for tissue discrimination.
– Integrate gene selection and the classification model in a Bayesian network framework.

4

Data Representation by Bayesian Network

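[Figure: a data matrix indexed by genes (Gene 1, Gene 2, ..., Gene N) and cases (Case 1, Case 2, ..., Case M), with the cases grouped into Tissue state 1 and Tissue state 2, shown alongside a Bayesian network with nodes G1, G2, ..., GN and Pheno.]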

• Bayesian networks are directed acyclic graphs where:
– Nodes correspond to random variables.
– Directed arcs encode the conditional probabilities of the target nodes given the source nodes.
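To make the definition above concrete, here is a minimal sketch of how a directed acyclic graph plus one conditional probability table per node defines a joint distribution: the joint probability is the product of each node's probability given its parents. The three-node structure, the arc directions, and all probabilities are hypothetical and are not taken from the slides.

```python
from itertools import product

# Hypothetical toy structure: Pheno -> G1 -> G2 (all variables binary).
parents = {"Pheno": [], "G1": ["Pheno"], "G2": ["G1"]}

# cpt[node][parent values] = P(node = 1 | parents); numbers are made up.
cpt = {
    "Pheno": {(): 0.5},
    "G1": {(0,): 0.2, (1,): 0.9},
    "G2": {(0,): 0.3, (1,): 0.7},
}

def joint(assignment):
    """Joint probability of a full 0/1 assignment: product of P(node | parents)."""
    p = 1.0
    for node, pa in parents.items():
        key = tuple(assignment[x] for x in pa)
        p_one = cpt[node][key]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p

# Summing over all 2^3 assignments should give 1, up to floating-point error.
total = sum(
    joint(dict(zip(parents, values)))
    for values in product([0, 1], repeat=len(parents))
)
print(total)
print(joint({"Pheno": 1, "G1": 1, "G2": 0}))
```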

5

Gene Selection by Bayes Factor

[Figure: gene selection by Bayes factor. Network diagrams over the nodes Pheno, G1, G2, ..., GN, Gp, and Gq.]
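The slide shows the selection step only as a diagram, so the following is a sketch of one common way to compute such a Bayes factor, not necessarily the authors' scoring function. It assumes each gene has been discretized, and compares the marginal likelihood of a model in which the gene depends on the phenotype against one in which it does not, using symmetric Dirichlet priors with a hypothetical equivalent sample size `ess`.

```python
from collections import Counter
from math import lgamma

def log_marginal(counts, alpha):
    """Log marginal likelihood of categorical counts under a symmetric Dirichlet(alpha) prior."""
    n = sum(counts)
    k = len(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out

def log_bayes_factor(gene, pheno, ess=1.0):
    """log P(gene data | gene depends on pheno) - log P(gene data | gene independent).

    gene, pheno: aligned lists of discrete values. ess is an assumed prior strength.
    """
    gene_states = sorted(set(gene))
    pheno_states = sorted(set(pheno))
    k, j = len(gene_states), len(pheno_states)

    # Independence model: one distribution over gene states for all samples.
    pooled = Counter(gene)
    log_m0 = log_marginal([pooled[g] for g in gene_states], ess / k)

    # Dependence model: a separate distribution per phenotype state.
    log_m1 = 0.0
    for p in pheno_states:
        within = Counter(g for g, y in zip(gene, pheno) if y == p)
        log_m1 += log_marginal([within[g] for g in gene_states], ess / (k * j))

    return log_m1 - log_m0

# Toy data: one gene tracks the phenotype, the other does not.
pheno = [0] * 10 + [1] * 10
informative = [0] * 10 + [1] * 10
uninformative = [0, 1] * 10
print(log_bayes_factor(informative, pheno))    # positive: dependence favored
print(log_bayes_factor(uninformative, pheno))  # negative: independence favored
```

A gene would be kept when its log Bayes factor is positive, so the evidence itself decides and no hand-picked rank cut-off is needed.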

6

Collinearity Elimination via Network Learning

[Figure: collinearity elimination. Network diagrams over the nodes Pheno, G1, G2, GN, Gp, and Gq.]

7

Sample Classification

• The phenotype variable is independent of the blue genes, given the green genes.

• Technically, the green genes form the Markov blanket of the phenotype variable, and they are the signature genes used for phenotype determination.

• Tissue classification:

[Figure: Bayesian network over the nodes Pheno, G1, G2, GN, Gp, and Gq, with genes colored green or blue.]
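The Markov blanket mentioned above can be read directly off the graph: it consists of a node's parents, its children, and the other parents of its children. The sketch below computes it for a hypothetical toy network; the arcs are made up for illustration and do not reproduce the network in the figure.

```python
def markov_blanket(parents, target):
    """Markov blanket of target: its parents, its children, and its children's other parents.

    parents: dict mapping each node to the list of its parent nodes.
    """
    children = [node for node, pa in parents.items() if target in pa]
    blanket = set(parents.get(target, []))
    blanket.update(children)
    for child in children:
        blanket.update(parents[child])
    blanket.discard(target)
    return blanket

# Hypothetical arcs, for illustration only.
parents = {
    "Pheno": [],
    "G1": ["Pheno"],
    "G2": ["Pheno", "Gp"],
    "Gp": [],
    "Gq": ["G1"],
    "GN": ["Gq"],
}

signature = markov_blanket(parents, "Pheno")
print(sorted(signature))
```

In this toy graph Gq and GN fall outside the blanket, so the phenotype is conditionally independent of them once G1, G2, and Gp are known.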

8

Algorithm Summary

[Flowchart with the following boxes:]

• Gene Selection by Bayes Factor
• Collinearity Elimination
• Sample Classification
• Optimize Performance
• Optimize Hyperparameters (sensitivity analysis)

9

Discriminate Lung Carcinoma Subtypes

• Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are major subtypes of lung cancer:
– AC and SCC differ in survival, likelihood of metastasis, and response to chemotherapy and targeted therapy.
– Physicians lack confidence in correct recognition when there are multiple primary carcinomas.

• Training:
– 58 ACs and 53 SCCs.
– 77 genes selected in the network.
– 25 signature genes.

10

Bayesian Network for Lung Carcinoma

11

Large-Scale Testing on Independent Samples

• 422 samples (232 ACs and 190 SCCs) aggregated from 7 cohorts (including Caucasians, African-Americans, Chinese).

• Accuracy: AUROC = 95.2%.

[Figure: ROC curves, sensitivity versus 1-specificity, both axes from 0 to 1. Legend: Proposed Bayes Net (95.2%).]
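For readers unfamiliar with the metric, the area under the ROC curve (AUROC) equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the scores and labels are made up and are unrelated to the 422 test samples.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    scores: classifier outputs, higher meaning more likely positive.
    labels: 0/1 ground truth, aligned with scores.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 8 of the 9 positive/negative pairs are ordered correctly.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))
```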

12

Comparisons with Other Popular Methods

• Higher classification accuracy.
• Small-sized signature to avoid overfitting.

Method                                        Testing AUROC   p-value   # signature genes
Bayesian Network                              95.2%           ---       25
PCA/LDA                                       91.2%           0.0047    13
PAM (Tibshirani et al., PNAS 2002)            91.0%           0.0014    77
Weighted Voting (Golub et al., Science 1999)  93.4%           0.6240    800

13

KRT6 Family Characterizes the Lung Carcinoma Discrimination

14

KRT6 Family Characterizes the Lung Carcinoma Discrimination

• Keratin-6 family genes (KRT6A, KRT6B, KRT6C) are important for distinguishing lung cancer subtypes.

– Accounting for 95% of the accuracy of the whole 25-gene signature.

– Located on chromosome 12q12-q13.

– A nonlinear, concave discriminative surface.

15

Verification by Chr12q12-q13 Aberrations

• Investigate DNA copy number changes in comparative genomic hybridization (CGH) array data.
– 12 ACs and 13 SCCs from Vrije University Medical Center, the Netherlands.
– A dumbbell discriminative surface achieves 80% classification accuracy.
– Treat average CGH values of genes occupying q12, q13, and q12-13, respectively, as three features to construct a Naïve Bayes classifier.
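A Naïve Bayes classifier over three features can be sketched as follows. This version assumes Gaussian class-conditional densities, which the slide does not state, and the feature values are arbitrary numbers rather than CGH measurements from the study; the function names are hypothetical.

```python
from math import log, pi
from statistics import mean, pvariance

def fit_gnb(X, y):
    """Fit per-class priors, feature means, and feature variances (Gaussian assumption)."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(X),
            "mean": [mean(col) for col in cols],
            # Small constant keeps the variance strictly positive.
            "var": [pvariance(col) + 1e-6 for col in cols],
        }
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(params):
        lp = log(params["prior"])
        for v, m, s2 in zip(x, params["mean"], params["var"]):
            lp += -0.5 * log(2 * pi * s2) - (v - m) ** 2 / (2 * s2)
        return lp
    return max(model, key=lambda c: log_posterior(model[c]))

# Columns stand for the three averaged features (q12, q13, q12-13); values are made up
# and imply nothing about the real direction of copy number change in either subtype.
X = [
    [0.0, 0.1, 0.05], [0.1, 0.0, 0.05], [-0.1, 0.05, 0.0],
    [0.6, 0.5, 0.55], [0.5, 0.7, 0.6], [0.4, 0.6, 0.5],
]
y = ["AC", "AC", "AC", "SCC", "SCC", "SCC"]

model = fit_gnb(X, y)
print(predict_gnb(model, [0.0, 0.0, 0.0]))
print(predict_gnb(model, [0.55, 0.6, 0.58]))
```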

16

Conclusion

• Reverse engineer regulatory network information for tissue classification.

• Adopt the systems biology approach to infer the gene dependency network.
– Select genes by Bayes factor.
– Eliminate collinearity via network learning.
– Integrate gene selection and the classification model in a single Bayesian network framework.

• Demonstrate the promising translational value of the systems biology approach in clinical study.
