Andrey Alexeyenko Gene interaction networks for functional analysis and prognostication


Post on 04-Jan-2016


Page 1: Gene  interaction networks for functional analysis and  prognostication

Andrey Alexeyenko

Gene interaction networks for functional analysis and prognostication

Page 2: Gene  interaction networks for functional analysis and  prognostication

“Data is not information, information is not knowledge, knowledge is not wisdom, wisdom is not truth,”

—Robert Royar (1994)

Page 3: Gene  interaction networks for functional analysis and  prognostication

http://www.hdpaperz.com

Biological data production

…and analysis

Page 4: Gene  interaction networks for functional analysis and  prognostication

Primary molecular processes

Response, recorded with high-throughput platforms

Known biological units: processes, pathways

Protein abundance: 1. 2. 3. 4. … N.

Phosphorylation: 1. 2. 3. 4. … N.

Methylation: 1. 2. 3. 4. … N.

mRNA expression: 1. 2. 3. 4. … N.

ChipSeq: 1. 2. 3. 4. … N.

Mutation profile: 1. 2. 3. 4. … N.

Metabolites: 1. 2. 3. … N.

Pathway analysis: why is it needed, and what is it?

Page 5: Gene  interaction networks for functional analysis and  prognostication

Cancer heterogeneity

Cancers of a primary site may represent a heterogeneous group of diseases of diverse molecular origin, which vary with regard to:

– the oncogenic mutations that initiated them,
– the protein landscape (abundance, functionality) that maintains disease progression, and thereby:
– their responsiveness to specific drugs.

Page 6: Gene  interaction networks for functional analysis and  prognostication

FunCoup is a data integration framework to discover functional coupling in eukaryotic proteomes with data from model organisms

[Figure: A (mouse) ? B (mouse); find orthologs* in Human, Fly, Rat, Yeast; high-throughput evidence]

Andrey Alexeyenko and Erik L.L. Sonnhammer (2009) Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Research.

Page 7: Gene  interaction networks for functional analysis and  prognostication

FunCoup: on-line interactome resource

Andrey Alexeyenko and Erik L.L. Sonnhammer (2009) Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Research.

http://FunCoup.sbc.su.se

Page 8: Gene  interaction networks for functional analysis and  prognostication

Do gene networks tell any story?

• Yellow diamonds: somatic mutations in prostate cancer

• Pink crosses: also mutated in glioblastoma (TCGA)

State-of-the-art method to beat: frequency analysis of somatic mutations

Page 9: Gene  interaction networks for functional analysis and  prognostication

Network enrichment analysis:

Altered genes (green) from one individual lung cancer are enriched in network connections to members of ErbB (HER2) pathway (yellow) and GO term “apoptosis” (blue).

Question: How to find something in common between many altered genes?
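The quantity behind this kind of enrichment statement is a count of network links between the altered genes and the members of a functional gene set. A minimal sketch of that count is shown here, assuming the network is available as a simple list of undirected gene pairs. The function name `count_links`, the gene identifiers, and the toy edges are made up for illustration and are not real interaction data.

```python
def count_links(edges, set_a, set_b):
    """Count network edges that connect a gene in set_a with a gene in set_b."""
    n_links = 0
    for u, v in edges:
        # An undirected edge counts if its endpoints fall in the two sets, in either order
        if (u in set_a and v in set_b) or (u in set_b and v in set_a):
            n_links += 1
    return n_links


# Toy, made-up network and gene sets (hypothetical identifiers)
edges = [
    ("G1", "P1"), ("G1", "P2"), ("G2", "P1"),
    ("G3", "X1"), ("P1", "P2"), ("X1", "X2"),
]
altered_genes = {"G1", "G2", "G3"}
pathway_genes = {"P1", "P2"}

n_links_observed = count_links(edges, altered_genes, pathway_genes)
print("Links between altered genes and pathway:", n_links_observed)
```

The observed count only becomes meaningful once it is compared with the count expected by chance, which is the subject of the quantified version of the analysis.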

Page 10: Gene  interaction networks for functional analysis and  prognostication

Apophenia: the human propensity to see meaningful patterns in random data

(Brugger 2001; Fyfe et al. 2008)

Page 11: Gene  interaction networks for functional analysis and  prognostication

Network enrichment analysis: compared to a reference and quantified

N links_observed = 12

N links_expected = 4.65

Standard deviation = 1.84

Z = (N links_observed – N links_expected) / SD = 3.98

P-value = 0.0000344

FDR < 0.1

[Figure: left panel, actual network (observed pattern); right panel, a random pattern]

Question: Is ANXA2 related to TGFbeta signaling?
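The arithmetic on this slide can be retraced in a few lines. The sketch below assumes the p-value is a one-sided upper tail of the standard normal distribution, which the slide does not state explicitly. The function names are illustrative only.

```python
import math


def nea_z_score(n_observed, n_expected, sd):
    """Z-score of the observed link count against the randomized reference."""
    return (n_observed - n_expected) / sd


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) under a standard normal approximation (an assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Values as printed on the slide (already rounded there)
z = nea_z_score(n_observed=12, n_expected=4.65, sd=1.84)
p = upper_tail_p(z)
print(f"Z = {z:.2f}, one-sided p = {p:.2e}")
```

With the rounded inputs shown on the slide, the computed Z comes out marginally above the reported 3.98, presumably because the slide's expected value and standard deviation were rounded for display.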

Page 12: Gene  interaction networks for functional analysis and  prognostication

Functional characterization of novel gene sets

Functional set?

Our alternative: network enrichment analysis

Altered genes

[Figure: No. of positives, real groups (y-axis) vs. No. of positives, random groups (x-axis); series: GEA_SIGN, GEA_REST, NEA_SIGN, NEA_REST]

State-of-the-art method to beat: gene set enrichment analysis

Alexeyenko A, Lee W, … Pawitan Y. Network enrichment analysis: extension of gene-set enrichment analysis to gene networks. BMC Bioinformatics, 2012

Page 13: Gene  interaction networks for functional analysis and  prognostication

Towards even better network prediction

Partial correlations: a way to get rid of spurious links

[Figure: example links labeled 0.7, 0.6, 0.4]
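How a partial correlation removes a spurious link can be shown with the standard first-order formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). In the sketch below, the assignment of the three values from the slide to the gene pairs A-B, B-C and A-C is an assumption made purely for illustration. The transcript does not say which link carries which value.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical assignment of the three values to gene pairs
r_ab, r_bc, r_ac = 0.7, 0.6, 0.4

# A-C correlation after removing the part explained by B
r_ac_given_b = partial_correlation(r_ac, r_ab, r_bc)
# A-B correlation after removing the part explained by C
r_ab_given_c = partial_correlation(r_ab, r_ac, r_bc)

print(f"A-C given B: {r_ac_given_b:.3f}")
print(f"A-B given C: {r_ab_given_c:.3f}")
```

Under this assumed assignment, the A-C correlation nearly vanishes once B is controlled for, while the A-B correlation stays strong. The weakest link would then be explained by the two stronger ones.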

Page 14: Gene  interaction networks for functional analysis and  prognostication

Cancer-specific networks: links inferred from expression, methylation, mutations

Link types:
Functional coupling
transcription–transcription
transcription–methylation
methylation–methylation
mutation–methylation
mutation–transcription
mutation–mutation

+ mutated gene

State-of-the-art method to beat: reverse engineering from a single source (usually transcriptome)

Page 15: Gene  interaction networks for functional analysis and  prognostication

How informative is a global gene network?

Page 16: Gene  interaction networks for functional analysis and  prognostication
Page 17: Gene  interaction networks for functional analysis and  prognostication

Now: answering biological questions

Page 18: Gene  interaction networks for functional analysis and  prognostication

Molecular phenotypes in network space (lung cancer, data from ChemoRes consortium)

[Figure: network-space view labeled P53 signaling, Cell cycle, Apoptosis; relapse-free survival (0.0–1.0) vs. years since surgery (0–8)]

Predictors:

GO:0001666 response to hypoxia
GO:0005154 EGFR binding
GO:0005164 TNF binding
GO:0070848 response to growth factor stimulus

Question: How to distinguish between different molecular subtypes of cancer?

Page 19: Gene  interaction networks for functional analysis and  prognostication

Science, 29 March 2013: Cancer Genome Landscapes. Bert Vogelstein, Nickolas Papadopoulos, Victor E. Velculescu, Shibin Zhou, Luis A. Diaz Jr., Kenneth W. Kinzler*

“Though all 20,000 protein-coding genes have been evaluated in the genome-wide sequencing studies of 3284 tumors, with a total of 294,881 mutations reported, only 125 Mut-driver genes, as defined by the 20/20 rule, have been discovered to date (table S2A). Of these, 71 are tumor suppressor genes and 54 are oncogenes. An important but relatively small fraction (29%) of these genes was discovered to be mutated through unbiased genome-wide sequencing; most of these genes had already been identified by previous, more directed investigations.”

“At best, methods based on mutation frequency can only prioritize genes for further analysis but cannot unambiguously identify driver genes that are mutated at relatively low frequencies”

Page 20: Gene  interaction networks for functional analysis and  prognostication

Is NTRK1 a driver in the GBM tumor TCGA-02-0014?

Page 21: Gene  interaction networks for functional analysis and  prognostication

Somatic mutations: drivers vs. passengers (data from The Cancer Genome Atlas)

Page 22: Gene  interaction networks for functional analysis and  prognostication

Gains and losses on chromosome 7, in 142 glioblastoma multiforme genomes, TCGA

Driver copy number changes

                            Copy number of MAP3K11
                            Altered       Normal
Point mutation in PTEN
  Present                   6 (~2)        29 (~33)
  Absent                    2 (~6)        105 (~102)

[Figure: copy number and confidence tracks]
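The 2×2 table of PTEN point mutation against MAP3K11 copy number can be checked for association directly. The sketch below assumes the values in parentheses are counts expected under independence, and it uses a one-sided Fisher exact test as one reasonable choice. The slide does not say which test was actually applied.

```python
from math import comb

# Observed counts from the slide: rows = PTEN point mutation (present, absent),
# columns = MAP3K11 copy number (altered, normal)
observed = [[6, 29], [2, 105]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Counts expected if mutation and copy number change were independent
expected = [[r * c / total for c in col_totals] for r in row_totals]


def fisher_upper_tail(table):
    """One-sided Fisher exact p: P(top-left count >= observed) with fixed margins."""
    a = table[0][0]
    row1 = sum(table[0])
    col1 = table[0][0] + table[1][0]
    n = sum(sum(row) for row in table)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


p_value = fisher_upper_tail(observed)
print("Expected counts:", [[round(x, 1) for x in row] for row in expected])
print(f"One-sided Fisher exact p = {p_value:.4f}")
```

The four observed counts sum to 142, matching the number of glioblastoma genomes on the slide. The co-occurrence of six samples against roughly two expected is what drives the association.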

Page 23: Gene  interaction networks for functional analysis and  prognostication

Genetic association of sequence variants near AGER/NOTCH4 and dementia. Bennet AM, Reynolds CA, Eriksson UK, Hong MG, Blennow K, Gatz M, Alexeyenko A, Pedersen NL, Prince JA. J Alzheimers Dis. 2011;24(3):475-84.

Genome-wide pathway analysis implicates intracellular transmembrane protein transport in Alzheimer disease. Hong MG, Alexeyenko A, Lambert JC, Amouyel P, Prince JA. J Hum Genet. 2010 Oct;55(10):707-9. Epub 2010 Jul 29.

Analysis of lipid pathway genes indicates association of sequence variation near SREBF1/TOM1L2/ATPAF2 with dementia risk. Reynolds CA, Hong MG, Eriksson UK, Blennow K, Wiklund F, Johansson B, Malmberg B, Berg S, Alexeyenko A, Grönberg H, Gatz M, Pedersen NL, Prince JA. Hum Mol Genet. 2010 May 15;19(10):2068-78. Epub 2010 Feb 18.

Validation of candidate disease genes (work with Jonathan Prince, MEB, KI)

Question: Is there extra evidence for GWAS-candidates to be involved?
Answer: Yes, for some…

Page 24: Gene  interaction networks for functional analysis and  prognostication

Resistance to vinorelbine in lung cancer: pathways

[Figure: four panels of relapse-free survival (0.0–1.0) vs. years since surgery (0–8): GO:0001666 response to hypoxia; GO:0005154 EGFR binding; GO:0005164 TNF binding; GO:0070848 response to growth factor stimulus]

Page 25: Gene  interaction networks for functional analysis and  prognostication

Overlap between gene signatures of relapse

From Roepman et al., 2007

Usually the overlap between signatures is negligible, because of, e.g.:
• Different sub-types in the patient population
• Different microarray platforms

However, the main reason is:

The curse of dimensionality!

Page 26: Gene  interaction networks for functional analysis and  prognostication

Consistency of molecular landscape: raw genes or network-based pathway scores?

Gene expression

Gene expression => pathway scores

Page 27: Gene  interaction networks for functional analysis and  prognostication

Resistance to vinorelbine in lung cancer

                 Relapse
                 +     -
Treatment   +    A     B
            -    C     D

[Figure: box plots across groups T-R-, T-R+, T+R-, T+R+ for MED17, AC005921.3, CDC2L6, FNBP4, MED4, TBL1Y, CDK8, CLPTM1L]

Question: What predicts tumor resistance to chemotherapy?
Answer: Depletion of differential transcriptome towards few specific genes

Page 28: Gene  interaction networks for functional analysis and  prognostication

Resistance to vinorelbine in lung cancer

Functionally coherent genes associated with vinorelbine resistance.
A. Network representation of the group. Magenta: genes associated with resistance in NEA and likely producing a protein complex (ranked 1, 3, 5, 6, 11, and 18) plus one more gene, CLPTM1L (beyond the ranking but also significantly associated; previously reported as related to cisplatin resistance); yellow: non-commonly expressed genes linked with the NEA genes and less represented in the susceptible tumors, hence contributing most to the association (see Results for more explanation).
B. Box-plots of NEA scores for the genes colored magenta in A.

Page 29: Gene  interaction networks for functional analysis and  prognostication
Page 30: Gene  interaction networks for functional analysis and  prognostication

Network analysis: how to succeed?

• Analyze prioritized candidates (from genotyping, DE, GWAS…) rather than any genes.

• Do not lean on single “interesting” network links. Employ statistics! I.e.:

“concrete questions” => “testable hypotheses” => “concrete answers”

The amount of information in known gene networks is enormous.

Let’s just use it!

Page 31: Gene  interaction networks for functional analysis and  prognostication

Summary
• Raw data production
• Data accumulation globally
• Ambitions
• Need for data interpretation
• Functional interpretation
• Networks: in reality and as logical models
• Significance: as important in bioinformatics as in “core” biology
– Risk of false discovery
– Adjustment for multiple testing
• Enrichment analysis: pure gene sets and in the network
• Biological questions to answer:
– Driver mutations
– Other “driving” events (changed copy number, methylation, expression)
– Association between a clinical feature and multiple genes at once
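The adjustment for multiple testing named in the summary can be illustrated with the Benjamini-Hochberg procedure, a common way to control the false discovery rate. The talk does not specify which adjustment was used, so this is a sketch under that assumption. Apart from the first p-value, which is taken from the enrichment example, the values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five tested gene set / pathway pairs
p_values = [0.0000344, 0.01, 0.03, 0.04, 0.5]
q_values = benjamini_hochberg(p_values)

for p, q in zip(p_values, q_values):
    print(f"p = {p:.7f} -> adjusted = {q:.6f}")
```

A test is then reported at a chosen threshold, for example adjusted p below 0.1, and the number of tests performed directly affects which findings survive.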