inferring cancer subnetwork markers

Post on 12-Sep-2021


Introduction Methods Experimental Results

Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering

Phuong Dao∗,1, Recep Colak∗,3

Raheleh Salari1, Flavia Moser4, Elai Davicioni5

Alexander Schönhuth†,2, Martin Ester1,†

1School of Computing Science, Simon Fraser University, Canada

2Centrum Wiskunde & Informatica, Amsterdam, Netherlands

3Department of Computing Science, University of Toronto, Canada

4Center for Disease Control, University of British Columbia

5GenomeDX Biosciences Inc.

∗: Joint first authors, †: Joint corresponding, last authors


Introduction: Personalized Medicine

• Determination of disease status based on patient genetics/genomics

• Goal: Specific, individual choice of treatment

• Necessary: Reliable disease markers

• Monogenic: Each marker is a single gene

• Multigenic: Each marker is a set of genes


Single Gene Markers

[Figure: gene expression profiles of Genes 1 to 6 across Cases 1 to 3 and Controls 1 to 3, with genes grouped as "Differentially Expressed" or "Non-Differentially Expressed".]

Caveat: Single gene markers vary significantly across different studies


Marker Selection: Multigenic Traits

[Figure: two panels. "Gene Expression Profiles": Genes 1 to 4 across Cases 1 to 3 and Controls 1 to 3. "Interaction/Association Network": nodes G1 to G4 joined by edges with confidence scores 0.75, 0.8, 0.85, 0.9 and 0.95.]

Solution: Differentially expressed genes participating in the same pathway [Chuang et al., 2007], [Chowdhury et al., 2010]


Our Approach

Each of our subnetwork markers:

• is a densely connected subnetwork

+ Disease-related genes have more PPI interactions than expected [Goh et al., PNAS (2007)]

• contains genes which are differentially expressed in a subset of samples

+ Cancer tumors vary greatly in phenotype, although belonging to the same (sub)type [Hampton et al., GR (2009)]


Density-Constrained Biclusters

Definition: G is called α-dense if $\frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} \geq \alpha \geq 0.5$.

[Figure: a confidence-weighted interaction network in which two subnetworks, on genes G1 to G4 and on genes G4 to G7, are highlighted; each is paired with a binary matrix marking in which of the samples S1 to S3 its genes are differentially expressed.]

Our markers are α-densely connected subnetworks of genes that are differentially expressed in a subset of patients of size at least k (here: k = 2).
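The marker definition above can be turned into a small membership test. The following is a minimal sketch, not the authors' implementation: the edge weights and the differential-expression flags are made-up illustration data, the function names are hypothetical, and "differentially expressed in a subset of patients" is read here as "every gene of the subnetwork is flagged in each supporting sample", which is an assumption.

```python
from itertools import combinations


def density(genes, weights):
    """Sum of edge weights inside `genes` divided by the number of possible edges."""
    genes = list(genes)
    if len(genes) < 2:
        return 0.0  # assumption: density is treated as 0 for fewer than two genes
    total = sum(weights.get(frozenset(p), 0.0) for p in combinations(genes, 2))
    return total / (len(genes) * (len(genes) - 1) / 2)


def is_connected(genes, weights):
    """Depth-first search restricted to `genes`, using only edges present in `weights`."""
    genes = set(genes)
    if not genes:
        return False
    start = next(iter(genes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in genes - seen:
            if frozenset((u, v)) in weights:
                seen.add(v)
                stack.append(v)
    return seen == genes


def supporting_samples(genes, diff_expr):
    """Samples in which every gene of `genes` is flagged as differentially expressed."""
    return {s for s, flagged in diff_expr.items() if set(genes) <= flagged}


def is_marker(genes, weights, diff_expr, alpha, k):
    """True if `genes` is alpha-densely connected and supported by at least k samples."""
    if alpha < 0.5:
        raise ValueError("the definition requires alpha >= 0.5")
    return (density(genes, weights) >= alpha
            and is_connected(genes, weights)
            and len(supporting_samples(genes, diff_expr)) >= k)


# Hypothetical toy data (not taken from the slides).
weights = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G1", "G3")): 0.8,
    frozenset(("G2", "G3")): 0.85,
    frozenset(("G2", "G4")): 0.75,
    frozenset(("G3", "G4")): 0.95,
}
diff_expr = {
    "S1": {"G1", "G2", "G3", "G4"},
    "S2": {"G1", "G2", "G3", "G4"},
    "S3": {"G1", "G3"},
}
genes = ["G1", "G2", "G3", "G4"]
print(is_marker(genes, weights, diff_expr, alpha=0.7, k=2))
```

With these toy values the four genes have density 4.25 / 6 and are jointly flagged in S1 and S2, so the set qualifies for k = 2 but not for k = 3.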


Methods


Density Constrained Biclustering: Search Strategy

Theorem: Every α-densely connected network of size n contains an α-densely connected subnetwork of size n − 1.

[Figure: search lattice over subnetworks of the nodes A, B, C, D with edge weights 0.8, 0.9, 0.6 and 0.4. Candidate subnetworks are labeled "wDCB", "maximal wDCB", "Not Connected" or "Not Dense". Worked density shown in the figure: (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45.]

Search Strategy: Breadth-first search.
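The theorem is what makes a level-wise search complete: every α-densely connected subnetwork of size n can be reached by adding one adjacent node to an α-densely connected subnetwork of size n − 1. The sketch below illustrates that idea on the network side only (it ignores the sample dimension of the biclusters) and is not the authors' algorithm. The edge topology on A, B, C, D is an assumption, since the figure layout is not recoverable; only the four weights come from the slide. Starting from single edges rather than single nodes is also an assumption.

```python
from itertools import combinations


def density(nodes, weights):
    """Sum of edge weights inside `nodes` over the number of possible edges."""
    possible = len(nodes) * (len(nodes) - 1) / 2
    return sum(weights.get(frozenset(p), 0.0) for p in combinations(nodes, 2)) / possible


def enumerate_dense_connected(weights, alpha):
    """Breadth-first, level-wise enumeration of alpha-densely connected node sets."""
    neighbors = {}
    for edge in weights:
        u, v = tuple(edge)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Level 2: single edges whose weight already meets the threshold.
    level = {edge for edge, w in weights.items() if w >= alpha}
    found = []
    while level:
        found.extend(level)
        next_level = set()
        for nodes in level:
            # Only nodes adjacent to the current set keep the extension connected.
            frontier = set().union(*(neighbors[n] for n in nodes)) - nodes
            for candidate in frontier:
                bigger = nodes | {candidate}
                if bigger not in next_level and density(bigger, weights) >= alpha:
                    next_level.add(bigger)
        level = next_level
    return found


# Weights from the slide; which node pairs they join is assumed.
weights = {
    frozenset("BD"): 0.8,
    frozenset("CD"): 0.9,
    frozenset("AC"): 0.6,
    frozenset("AB"): 0.4,
}
result = enumerate_dense_connected(weights, alpha=0.55)
for nodes in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(nodes), round(density(nodes, weights), 3))
```

Under this assumed topology and α = 0.55, the search stops at size three: the full node set has density 0.45 and is pruned, so none of its supersets would ever be generated in a larger network.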


Classification

1. Marker computation: Feature space creation (marker = dimension)

2. Construct classifier using training data

3. Perform classification on test data

Cross-platform study: Marker computation and test data from different platforms


Experimental Results


Network Data

Confidence-scored PPI network [STRING, von Mering et al., NAR 2009]

• Edges reflect physical protein-protein interactions

• Confidence scores reflect the probability that the interaction is associated with a cellular phenomenon (and not an experimental artifact)

• Scoring system based on KEGG pathways

[Figure: example PPI network with edge confidence scores ranging from 0.25 to 0.95.]


Gene Expression Data

Colon cancer

• GSE8671, 32 patients / tissue pairs

• GSE10950, 24 patients / tissue pairs

• GSE6988, 123 samples across several cancer subtypes

Breast cancer

• GSE3494, 251 patients with different TP53 mutation status (wildtype vs. mutant)


Colon Cancer: Prediction

[Figure: "GSE8671 >> GSE6988". AUC (y-axis, 0.6 to 1) against # Subnetworks/Genes (x-axis, 0 to 50) for SGM, GMI, NETCOVER and wDCB.]


Colon Cancer: Prognosis

[Figure: "GSE8671 >> GSE6988 prognosis". AUC (y-axis, 0.4 to 1) against # Subnetworks/Genes (x-axis, 0 to 50) for SGM, GMI, NETCOVER and wDCB.]


Colon Cancer: Prognosis Accuracy

        8671→6988, Prognosis            10950→6988, Prognosis
K       SGM    GMI    NC     wDCB       SGM    GMI    NC     wDCB
1       0.57   0.57   0.51   0.56       0.57   0.68   N/A    0.47
5       0.74   0.62   0.74   0.6        0.63   0.81   N/A    0.68
10      0.76   0.77   0.74   0.88       0.57   0.77   N/A    0.74
20      0.72   0.62   0.77   0.83       0.61   0.79   N/A    0.85
30      0.65   0.74   0.83   0.88       0.63   0.81   N/A    0.85
40      0.67   0.79   0.83   0.90       0.78   0.85   N/A    0.89
50      0.74   0.77   0.81   0.92       0.76   0.85   N/A    0.91

Legend: top values of previous methods; top value of our method


Breast Cancer: TP53 Wildtype vs. Mutant

[Figure: "GSE3494 (Miller et al.)". Accuracy (y-axis, 0.7 to 0.9) against # Subnetworks/Genes (x-axis, 0 to 25) for SGM (mappable), GMI (mappable), wDCB (mappable) and SPM (not mappable).]


Subnetwork Marker Statistics

        8671 Subnetworks                10950 Subnetworks
        # Subnetworks   Enrichment      # Subnetworks   Enrichment
GMI     806             0.38            755             0.34
NC      923             0.12            N/A             N/A
wDCB    282             0.76            216             0.74

GMI = Greedy Mutual Information (Chuang et al.)

NC = NetCover (Chowdhury et al.)

wDCB = weighted Density Constrained Biclustering

# Subnetworks = total number of subnetworks computed

Enrichment = enrichment rate of the top-50 markers


Top Markers in GSE8671

• Enriched with DNA replication initiation (p=6.39e-14), DNA metabolic process (p=6.15e-12)

• TP53, BRCA1: tumor suppressor genes

• Minichromosome maintenance (MCM) complex

• MCM2, MCM5: early markers for colon cancer (Burger et al., 2008)


Outlook / Acknowledgments

Outlook:

• Analyze subnetwork signatures

• ncRNA-protein interaction data

Acknowledgments:

• Mehmet Koyutürk

• David DesJardins, Google Inc.

• Lab for Mathematical and Computational Biology, UC Berkeley


Thanks for the attention!


Densely Connected Subnetworks: Properties

Let G = (V, E) be a network with edge weights w_e, e ∈ E.

• The density θ(G) of G is

$$\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} = \frac{2 \cdot \sum_{e \in E} w_e}{|V|(|V| - 1)}$$

where $\binom{|V|}{2}$ is the number of possible edges in G.

• G is called α-dense if θ(G) ≥ α ≥ 0.5

• An α-dense, connected network G is called α-densely connected.
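As a quick check that the two forms of the density agree, take the four-node example from the search-strategy slide, whose four edges carry the weights 0.8, 0.9, 0.6 and 0.4:

```latex
\theta(G)
  = \frac{0.8 + 0.9 + 0.6 + 0.4}{\binom{4}{2}}
  = \frac{2.7}{6}
  = \frac{2 \cdot 2.7}{4 \cdot 3}
  = 0.45
```

Since 0.45 < 0.5, this network cannot be α-dense for any admissible α.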


Classifier Construction

1. Rank density constrained biclusters according to density significance

2. Keep only high-ranked subnetworks with little overlap

3. Feature space dimension = number of markers

4. SVM classification

[Figure: "Average Gene Expression Profile". Per-gene expression values for Genes 1 to 7 are averaged within each subnetwork (G1 to G4 and G4 to G7), giving one value per marker (Marker 1: 1.25, Marker 2: 0.5).]
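Step 3 of the list above (one feature dimension per marker) can be sketched as averaging a sample's expression values over the genes of each marker. The sketch below is a reading of the slide's figure rather than the authors' code: the function name is hypothetical, and the assignment of the seven expression values to Genes 1 to 7 is an assumption chosen so that the two averages match the marker values shown (1.25 and 0.5).

```python
def marker_features(expression, markers):
    """Map one sample's per-gene expression to one averaged value per marker."""
    return [sum(expression[g] for g in genes) / len(genes) for genes in markers]


# Assumed per-gene values for a single sample, read off the figure residue.
sample = {
    "Gene 1": 1.25, "Gene 2": 1.5, "Gene 3": 1.0, "Gene 4": 1.25,
    "Gene 5": 0.5, "Gene 6": 0.0, "Gene 7": 0.25,
}
# Two markers sharing Gene 4, mirroring the subnetworks G1-G4 and G4-G7.
markers = [
    ["Gene 1", "Gene 2", "Gene 3", "Gene 4"],
    ["Gene 4", "Gene 5", "Gene 6", "Gene 7"],
]
features = marker_features(sample, markers)
print(features)  # [1.25, 0.5]
```

Applying this to every training and test sample would give the feature vectors that the SVM in step 4 is trained and evaluated on.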


Colon Cancer: Prediction Accuracy

        8671→6988                       10950→6988
K       SGM    GMI    NC     wDCB       SGM    GMI    NC     wDCB
1       0.56   0.84   0.72   0.84       0.63   0.37   N/A    0.77
5       0.73   0.72   0.72   0.82       0.82   0.68   N/A    0.86
10      0.76   0.76   0.83   0.85       0.82   0.81   N/A    0.88
20      0.80   0.84   0.86   0.89       0.84   0.83   N/A    0.89
30      0.80   0.83   0.84   0.91       0.83   0.85   N/A    0.85
40      0.85   0.85   0.87   0.90       0.84   0.84   N/A    0.89
50      0.85   0.84   0.85   0.93       0.81   0.82   N/A    0.89

Top values previous methods, our method
