Identifying Causal Genes and Dysregulated Pathways in Complex Diseases

Nov. 6th, 2010

Yoo-Ah Kim
NIH / NLM / NCBI

Complex Diseases

Associated with the effects of multiple genes, as opposed to single-gene diseases

The combination of genomic alterations may vary strongly among patients

Different alterations can dysregulate the same components, thus often leading to the same disease phenotype

Difficult to study and treat

Examples: cancer, heart disease, diabetes, etc.

Copy Number Variations

Two copies of each gene are generally assumed to be present in a genome

Genomic regions may be deleted or duplicated, causing copy number variations (CNVs)

Some CNVs are associated with susceptibility or resistance to diseases such as cancer

[Figure: copy number variations in 158 glioblastoma patients]

Identifying Genomic Causes in Complex Diseases

Identify genotypic causes in individual patients as well as dysregulated pathways

Systems biology approach
• Genome-wide search
• Graph theoretic algorithms: circuit flow, set cover

158 glioblastoma multiforme (GBM) patients
• GBM is the most common and most aggressive type of primary brain tumor in humans

Expression as Quantitative Trait

Genotype: copy number variations

Phenotype: gene expression

eQTL (expression Quantitative Trait Loci) analysis

While we assume that the genetic variation is the cause and the expression change is the effect, we do not know the molecular pathways behind the relation

[Figure: putative causal gene/locus linked to putative target gene]

Method Outline

A. Target gene selection (from gene expression)

B. eQTL: find associations between expression and copy number

C. Circuit flow algorithm (over molecular interactions, yielding candidate causal genes)

D. Causal gene selection: weighted multiset cover

[Figure: overview of steps A to D. Labels: cases; target genes g1 ... gm; tag loci s1 ... sn; causal genes; + and - terminals; interaction types TF-DNA, phosphorylation event, protein-protein.]

Target Gene Selection

Select a representative set of disease genes
• Filter differentially expressed genes for each case
• Multi-set cover

[Figure: gene expression matrix with genes (Gene 1, Gene 2, Gene 3, ...) as rows and controls and disease cases as columns]
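The per-case filter for differentially expressed genes could be sketched as below. The slides do not say how differential expression is called, so the z-score against the controls and the cutoff of 2.0 are assumptions, and `differential_genes_per_case` is a hypothetical name.

```python
import numpy as np

def differential_genes_per_case(controls, cases, z_cutoff=2.0):
    """Flag genes that deviate from the controls in each disease case.

    controls: genes x control samples, cases: genes x disease cases.
    Returns a boolean matrix (genes x cases).
    The z-score rule and the cutoff are assumptions, not the talk's method.
    """
    controls = np.asarray(controls, dtype=float)
    cases = np.asarray(cases, dtype=float)
    mean = controls.mean(axis=1, keepdims=True)
    std = controls.std(axis=1, ddof=1, keepdims=True)
    std[std == 0] = np.finfo(float).eps  # guard against constant genes
    z = (cases - mean) / std
    return np.abs(z) >= z_cutoff

# Toy data: 2 genes, 4 controls, 2 disease cases.
controls = [[1.0, 1.2, 0.8, 1.0],
            [5.0, 5.1, 4.9, 5.0]]
cases = [[1.1, 3.0],
         [5.0, 5.05]]
flags = differential_genes_per_case(controls, cases)
```

In this toy input only the first gene in the second case is flagged; the flagged genes per case would then feed the multi-set cover that picks the representative target genes.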

eQTL

Associations between the expression of target genes and copy number variations of genomic loci
• Linear regression
• For every pair of tag loci and target genes

[Figure: expression matrix (target genes x cases) and copy number matrix (tag loci x cases)]
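A minimal sketch of the pairwise regression, assuming simple least-squares regression of each target gene's expression on each tag locus's copy number across cases. The function name `eqtl_associations` and the toy matrices are hypothetical, and the significance threshold used in the talk is not given, so none is applied here.

```python
import numpy as np

def eqtl_associations(expression, copy_number):
    """Regress every target gene on every tag locus.

    expression: target genes x cases, copy_number: tag loci x cases.
    Returns (slopes, r), both of shape genes x loci.
    Assumes every row varies across cases (nonzero variance).
    """
    Y = np.asarray(expression, dtype=float)
    X = np.asarray(copy_number, dtype=float)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Xc = X - X.mean(axis=1, keepdims=True)
    sxx = (Xc ** 2).sum(axis=1)            # one value per locus
    syy = (Yc ** 2).sum(axis=1)            # one value per gene
    sxy = Yc @ Xc.T                        # genes x loci cross-products
    slopes = sxy / sxx                     # least-squares slope per pair
    r = sxy / np.sqrt(np.outer(syy, sxx))  # Pearson correlation per pair
    return slopes, r

# Toy data: 1 target gene, 2 tag loci, 4 cases.
expression = [[4.0, 6.0, 8.0, 2.0]]
copy_number = [[2.0, 3.0, 4.0, 1.0],
               [2.0, 2.0, 1.0, 3.0]]
slopes, r = eqtl_associations(expression, copy_number)
```

Pairs with a strong association would be kept as putative locus-to-target links for the next step.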

Finding Candidate Causal Genes

[Figure: genotypic variations with candidate genes C1 to C5, connected through the interaction network to a target gene D; + and - terminals mark the current flow]

Interaction network
• Protein-protein interactions
• Phosphorylation events
• Transcription factor interactions

Current flow
• The resistance of edge (u, v) is set to be inversely proportional to (|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|) / 2
• Compute the amount of current entering each causal gene by solving a system of linear equations
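The current flow computation could be sketched as follows. The sketch assumes the target gene D is held at potential 1 and every candidate causal gene is grounded at 0; the slides only show + and - terminals, so this wiring is an assumption, as is the name `current_into_candidates`. Edge conductance (1 / resistance) is taken to be the averaged absolute correlation from the slide.

```python
import numpy as np

def current_into_candidates(nodes, edges, expr, target, candidates):
    """Current entering each candidate gene in a resistor network.

    Conductance of (u, v) = (|corr(u, D)| + |corr(v, D)|) / 2, D = target.
    Assumes every free node is connected to a fixed node, otherwise the
    linear system is singular.
    """
    idx = {g: i for i, g in enumerate(nodes)}
    corr = {g: abs(np.corrcoef(expr[g], expr[target])[0, 1]) for g in nodes}
    n = len(nodes)
    lap = np.zeros((n, n))                 # weighted graph Laplacian
    for u, v in edges:
        c = (corr[u] + corr[v]) / 2.0
        i, j = idx[u], idx[v]
        lap[i, i] += c
        lap[j, j] += c
        lap[i, j] -= c
        lap[j, i] -= c
    volt = np.zeros(n)
    volt[idx[target]] = 1.0                # source; candidates stay at 0
    fixed = [idx[target]] + [idx[c] for c in candidates]
    free = [i for i in range(n) if i not in fixed]
    if free:
        # Kirchhoff's current law at every free node: (lap @ volt)[free] = 0
        a = lap[np.ix_(free, free)]
        b = -lap[np.ix_(free, fixed)] @ volt[fixed]
        volt[free] = np.linalg.solve(a, b)
    inflow = -(lap @ volt)                 # net current entering each node
    return {c: float(inflow[idx[c]]) for c in candidates}

# Toy network: D - a - C1 and a - C2.
expr = {"D": [1, 2, 3, 4], "a": [2, 4, 6, 8],
        "C1": [1, 2, 3, 4], "C2": [1, 3, 2, 4]}
flows = current_into_candidates(
    ["D", "a", "C1", "C2"], [("D", "a"), ("a", "C1"), ("a", "C2")],
    expr, "D", ["C1", "C2"])
```

Here C1 receives more current than C2 because its expression correlates more strongly with D, which lowers the resistance of the edge leading to it.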

Final Causal Gene Selection

[Figure: matrix of causal genes versus cases, labeled WEIGHT]

A putative causal gene explains a disease case if
• its corresponding tag locus has a copy number alteration
• its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case

Find a smallest set of genes covering (almost) all cases at least k’ times (minimum weighted multi-set cover)
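A minimal greedy sketch of weighted multi-set cover is given below. Greedy selection is a standard approximation, not an exact minimum, and the slides do not say which solver was used or how the weights are defined; here each weight is treated as a cost, and `greedy_multiset_cover` is a hypothetical name.

```python
def greedy_multiset_cover(covers, weights, cases, k=1):
    """Pick genes until every case is covered k times, if possible.

    covers: gene -> set of cases it explains
    weights: gene -> cost of selecting the gene (assumed semantics)
    Greedy rule: lowest cost per newly satisfied coverage unit.
    """
    need = {c: k for c in cases}           # coverage still required per case
    remaining = set(covers)
    chosen = []

    def gain(gene):
        return sum(1 for c in covers[gene] if need.get(c, 0) > 0)

    while any(need.values()) and remaining:
        useful = [g for g in sorted(remaining) if gain(g) > 0]
        if not useful:
            break                          # some cases cannot be covered
        best = min(useful, key=lambda g: weights[g] / gain(g))
        for c in covers[best]:
            if need.get(c, 0) > 0:
                need[c] -= 1
        chosen.append(best)
        remaining.discard(best)
    return chosen

# Toy instance: 4 cases, 4 candidate causal genes.
covers = {"g1": {1, 2, 3}, "g2": {1, 2}, "g3": {3, 4}, "g4": {4}}
weights = {"g1": 1.0, "g2": 1.0, "g3": 1.0, "g4": 2.0}
selected = greedy_multiset_cover(covers, weights, cases=[1, 2, 3, 4], k=1)
```

Stopping when no remaining gene helps leaves some cases uncovered, which loosely mirrors the "(almost) all cases" wording on the slide.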

Dysregulated Pathways

Causal path between a target gene and a causal gene: a maximum current path

[Figure: candidate genes C1 to C5 and target gene D]
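The slides do not define "maximum current path" precisely. One possible reading, sketched here as an assumption, is the path whose smallest edge current is as large as possible (a widest-path search). The name `widest_current_path` and the toy currents are hypothetical.

```python
import heapq

def widest_current_path(currents, target, causal_gene):
    """Path from target to causal_gene maximizing the minimum edge current.

    currents: dict mapping a directed edge (u, v) to the positive current
    flowing from u to v. Returns a list of nodes, or None if unreachable.
    """
    out = {}
    for (u, v), amp in currents.items():
        if amp > 0:
            out.setdefault(u, []).append((v, amp))
    best = {target: float("inf")}          # best bottleneck found per node
    parent = {}
    heap = [(-best[target], target)]       # max-heap via negated widths
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(u, 0.0):
            continue                       # stale heap entry
        if u == causal_gene:
            break
        for v, amp in out.get(u, []):
            w = min(width, amp)
            if w > best.get(v, 0.0):
                best[v] = w
                parent[v] = u
                heapq.heappush(heap, (-w, v))
    if causal_gene not in best:
        return None
    path = [causal_gene]
    while path[-1] != target:
        path.append(parent[path[-1]])
    return path[::-1]

# Toy currents on a small network.
currents = {("D", "a"): 0.6, ("D", "b"): 0.4,
            ("a", "C1"): 0.35, ("a", "C2"): 0.25, ("b", "C2"): 0.4}
path_to_c2 = widest_current_path(currents, "D", "C2")
```

In the toy network the route through b is preferred for C2 because its weakest edge carries more current than the weakest edge on the route through a.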

Selected Causal Genes

Step                   Number of genes   Overlap with GBM genes
Step B: eQTL           16056             0.56 (75)
Step C: Circuit flow   701               0.045 (10)
Step D: Set cover      128               4.7 × 10^-4 (6)
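The overlap column reads like an enrichment p-value with the number of overlapping genes in parentheses; that reading is an assumption, and the slides do not name the test. A hypergeometric tail probability is one common choice and could be sketched as below, with toy numbers rather than the talk's data.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, known, selected, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: number of background genes
    known: number of known disease genes in the background
    selected: number of genes picked by the method
    overlap: number of picked genes that are known disease genes
    """
    total = comb(population, selected)
    upper = min(known, selected)
    tail = sum(comb(known, x) * comb(population - known, selected - x)
               for x in range(overlap, upper + 1))
    return tail / total

# Toy numbers only: 10 background genes, 4 known, 3 selected, 2 overlapping.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
```

Applying this to the table would need the background size and the size of the known GBM gene list, neither of which appears on the slides.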

Results

128 causal genes from set cover (Step D)

701 candidate causal genes from circuit flow algorithm (Step C)

Causal Genes

BSOSC Review, November 2008

Pathway                 P-value   Genes
Glioma                  0.008     PRKCA, EGFR, AKT1, CDKN2A, CAMK2G, TP53, RB1, PTEN
Cell cycle              0.028     MCM7, CDKN2A, CDC2, TP53, ORC5L, RB1, ATR, BUB3, CUL1
p53 signaling pathway   0.030     CDKN2A, CDC2, TP53, ATR, FAS, THBS1, PTEN
Proteasome              0.026     PSMA1, PSMC6, PSMB1, PSMC3, PSMA5, PSMA4

Functional analysis using DAVID

The selected causal gene set includes many genes known to be implicated in cancer

PTEN as causal gene

[Figure: network around PTEN. Legend: fold change (-, 0, +); TF-DNA, protein-protein; kinase, TF, causal genes.]

EGFR as causal and target gene

[Figure: network around EGFR, with nodes labeled Causal EGFR and Target EGFR. Legend: fold change (-, 0, +); TF-DNA, protein-protein, phosphorylation; kinase, TF, causal genes.]

Conclusion

A novel computational method to simultaneously identify causal genes and dysregulated pathways
• Circuit flow algorithm
• Multi-set cover

Augmenting eQTL evidence with interaction information resulted in a powerful approach to uncover potential causal genes as well as intermediate nodes on molecular pathways

Our method can be applied to any disease system where genetic variations play a fundamental causal role

Acknowledgements

Teresa M. Przytycka
Stefan Wuchty

Other group members: Dong Yeon Cho, Yang Huang, Damian Wojtowicz, Jie Zheng

Our Method

Integrate several types of data
• Gene expression
• Copy number variations
• Molecular interactions

Methods and Results

Method
• Model the expression change of disease genes as a function of genomic alterations
• Translate the propagation of information from a potential causal gene to a disease gene into the flow of electric current through a network of molecular interactions
• Multi-set cover: select the most prominent genes

Validated our approach by testing the enrichment of the selected causal genes for known GBM/glioma-related genes

[Figure: circuit from disease gene gm to tag SNP sn through causal genes, with + and - terminals]
