Using Bayesian Networks to Analyze Expression Data


Page 1: Using Bayesian Networks to Analyze Expression Data

Using Bayesian Networks to Analyze Expression Data

Shelly Bar-Nahor

Page 2: Using Bayesian Networks to Analyze Expression Data

Today

1. Introduction to Bayesian Networks.

2. Describe a method for recovering gene interactions from microarray data, using tools for learning Bayesian Networks.

3. Apply the method to the S. cerevisiae cell-cycle measurements.

Page 3: Using Bayesian Networks to Analyze Expression Data

Bayesian Networks – Compact representation of probability distributions via conditional independence.

The representation consists of two components:

1. G – a directed acyclic graph (DAG)

Nodes – random variables (X1, …, Xn).

Edges – direct influence.

2. Θ – a set of conditional probability distributions.

Together, these two components specify a unique joint distribution over (X1, …, Xn).

Example (figure): a small network in which A is a parent of B and C, together with the conditional probability table for C given A:

P(C|A)    c0     c1
a0        0.95   0.05
a1        0.1    0.9

Page 4: Using Bayesian Networks to Analyze Expression Data

The graph G encodes the Markov assumption:

Each variable Xi is independent of its non-descendants given its parents in G.

By applying the chain rule of probabilities:

P(X1, …, Xn) = ∏i=1..n P(Xi | PaG(Xi))

P(Xi | PaG(Xi)) is the conditional distribution for each variable Xi. We denote the parameters that specify these distributions by Θ.
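As an illustration of this factorization, here is a minimal Python sketch that evaluates the joint probability of a full assignment by multiplying the local conditional distributions. The toy network (A is a parent of B and C) and its CPT values are assumptions made for the example, not taken from the paper.

```python
# Minimal sketch: joint probability of a full assignment in a Bayesian network,
# computed as the product of local conditionals P(Xi | Pa(Xi)).
# The toy structure and CPT values below are illustrative only.

parents = {"A": [], "B": ["A"], "C": ["A"]}

# CPTs: map (child value, tuple of parent values) -> probability
cpts = {
    "A": {("a0", ()): 0.6, ("a1", ()): 0.4},
    "B": {("b0", ("a0",)): 0.7, ("b1", ("a0",)): 0.3,
          ("b0", ("a1",)): 0.2, ("b1", ("a1",)): 0.8},
    "C": {("c0", ("a0",)): 0.95, ("c1", ("a0",)): 0.05,
          ("c0", ("a1",)): 0.1,  ("c1", ("a1",)): 0.9},
}

def joint_probability(assignment):
    """P(x1,...,xn) = prod_i P(xi | pa_i), by the chain rule and the Markov assumption."""
    p = 1.0
    for var, cpt in cpts.items():
        pa_values = tuple(assignment[u] for u in parents[var])
        p *= cpt[(assignment[var], pa_values)]
    return p

print(joint_probability({"A": "a0", "B": "b1", "C": "c0"}))  # 0.6 * 0.3 * 0.95
```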

Page 5: Using Bayesian Networks to Analyze Expression Data

The Learning Problem:

Let m be the number of samples and n the number of variables. Given a training set D = (x[1], …, x[m]), where each x[i] is a complete assignment to (X1, …, Xn), find a network B = <G, Θ> that best matches D.

Let's assume we know the graph's structure. We now want to estimate the parameters Θ given the data D.

To do so, we use the likelihood function:

L(Θ:D) = P(D | Θ) = P(x[1], …, x[m] | Θ) = ∏i=1..m P(x[i] | Θ)

Page 6: Using Bayesian Networks to Analyze Expression Data

Learning Parameters - Likelihood Function

Example network (figure): E and B are parents of A, and A is a parent of C. The data D is an m × 4 matrix of joint observations:

D =
E[1] B[1] A[1] C[1]
  :    :    :    :
E[m] B[m] A[m] C[m]

Assuming i.i.d. samples, the likelihood function is:

L(Θ:D) = ∏m P(E[m], B[m], A[m], C[m] | Θ)   (the product runs over the m samples)

Page 7: Using Bayesian Networks to Analyze Expression Data

Learning Parameters - Likelihood Function

For the same example network (E and B are parents of A, A is a parent of C), the likelihood factors according to the graph:

L(Θ:D) = ∏m P(E[m], B[m], A[m], C[m] : Θ)
       = ∏m P(E[m] : Θ) · P(B[m] : Θ) · P(A[m] | B[m], E[m] : Θ) · P(C[m] | A[m] : Θ)
       = [∏m P(E[m] : Θ)] · [∏m P(B[m] : Θ)] · [∏m P(A[m] | B[m], E[m] : Θ)] · [∏m P(C[m] | A[m] : Θ)]

Page 8: Using Bayesian Networks to Analyze Expression Data

Learning Parameters - Likelihood Function

General Bayesian Networks:

L(Θ:D) = P(x[1], …, x[m] | Θ) = ∏i ∏m P(Xi[m] | Pai[m] : Θi) = ∏i Li(Θi : D)

Decomposition – the likelihood splits into independent estimation problems, one for each variable and its parents.
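Because the likelihood decomposes by family, the maximum-likelihood parameters of a discrete network are just normalized counts per (variable, parent-configuration) pair. A minimal Python sketch of this per-family estimation; the data layout and names are illustrative, not from the paper.

```python
from collections import Counter

# Minimal sketch: decomposed MLE for a discrete Bayesian network.
# Each family (Xi, Pa(Xi)) is estimated independently from counts.

def mle_parameters(data, parents):
    """data: list of dicts {variable: value}; parents: {variable: [parent names]}."""
    params = {}
    for var, pa in parents.items():
        joint = Counter()      # counts of (parent values, child value)
        marginal = Counter()   # counts of parent values alone
        for sample in data:
            pa_vals = tuple(sample[p] for p in pa)
            joint[(pa_vals, sample[var])] += 1
            marginal[pa_vals] += 1
        params[var] = {key: cnt / marginal[key[0]] for key, cnt in joint.items()}
    return params

# Toy example with the E, B -> A -> C network from the slides (made-up data).
parents = {"E": [], "B": [], "A": ["E", "B"], "C": ["A"]}
data = [{"E": 0, "B": 1, "A": 1, "C": 0},
        {"E": 0, "B": 0, "A": 0, "C": 0},
        {"E": 1, "B": 1, "A": 1, "C": 1}]
print(mle_parameters(data, parents)["C"])  # P(C | A) as normalized counts
```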

Page 9: Using Bayesian Networks to Analyze Expression Data

Learning Parameters

Two approaches: MLE (maximum likelihood estimation) and Bayesian inference (learning using Bayes' rule).

Represent uncertainty about the parameters using a Bayesian network in which Θ is a parent of each of X[1], X[2], …, X[m] (figure).

The values of the X[i] are independent given Θ.

P(X[m] | Θ) = Θ

Bayesian prediction is inference in this network.

Page 10: Using Bayesian Networks to Analyze Expression Data

P(x[m+1] | x[1], …, x[m]) = ∫ P(x[m+1] | Θ, x[1], …, x[m]) P(Θ | x[1], …, x[m]) dΘ = ∫ P(x[m+1] | Θ) P(Θ | x[1], …, x[m]) dΘ

(Figure: Θ is a parent of the observed data X[1], …, X[m] and of the query X[m+1]; Bayesian prediction is inference in this network.)

Bayes' rule:

P(Θ | x[1], …, x[m]) = P(x[1], …, x[m] | Θ) · P(Θ) / P(x[1], …, x[m])

The numerator is the likelihood times the prior; the denominator is the probability of the data.
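For a single binary variable with a Beta prior, the integral above has a closed form: the posterior predictive is a "counts plus pseudo-counts" ratio. A small illustrative Python sketch; the Beta(1, 1) prior and the data are assumptions for the example, not part of the slides.

```python
# Minimal sketch: Bayesian prediction for one binary variable with a Beta prior.
# The posterior predictive P(x[m+1]=1 | x[1..m]) integrates out Theta in closed form.

def posterior_predictive(samples, alpha=1.0, beta=1.0):
    """samples: list of 0/1 observations; (alpha, beta): Beta prior pseudo-counts."""
    ones = sum(samples)
    # E[Theta | data] under a Beta(alpha, beta) prior
    return (ones + alpha) / (len(samples) + alpha + beta)

data = [1, 0, 1, 1, 0, 1]
print(posterior_predictive(data))  # (4 + 1) / (6 + 2) = 0.625
```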

Page 11: Using Bayesian Networks to Analyze Expression Data

Equivalence Classes of Bayesian Networks

Problem: the joint distribution represented by one graph can equally be represented by another.

Ind(G) – the set of independence statements that hold in all distributions satisfying the Markov assumption on G.

G and G' are equivalent if Ind(G) = Ind(G').

Example (figure): the chains A → C → B and A ← C ← B and the fork A ← C → B are all equivalent:

P(x) = P(A)·P(C|A)·P(B|C) = P(C)·P(A|C)·P(B|C), since P(C|A)·P(A) = P(A|C)·P(C), and in the same way for the third structure.

Page 12: Using Bayesian Networks to Analyze Expression Data

Equivalence Classes of Bayesian Networks

Two DAGs are equivalent if and only if they have the same underlying undirected graph and the same v-structures.

We represent an equivalence class of network structures by a partially directed acyclic graph (PDAG): a directed edge means that all members of the class contain that directed arc; an undirected edge means the members disagree on its direction.
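This characterization translates directly into a check: compare skeletons (undirected edge sets) and v-structures (pairs of non-adjacent parents converging on a common child). A small Python sketch; the parent-dict graph representation is an assumption made for illustration.

```python
from itertools import combinations

# Minimal sketch: test whether two DAGs are equivalent
# (same skeleton and same v-structures).

def skeleton(parents):
    return {frozenset((child, p)) for child, ps in parents.items() for p in ps}

def v_structures(parents):
    skel = skeleton(parents)
    vs = set()
    for child, ps in parents.items():
        for a, b in combinations(ps, 2):
            if frozenset((a, b)) not in skel:   # parents not adjacent -> v-structure
                vs.add((frozenset((a, b)), child))
    return vs

def equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# A -> C -> B  versus  A <- C -> B : equivalent (same skeleton, no v-structures).
g1 = {"A": [], "C": ["A"], "B": ["C"]}
g2 = {"C": [], "A": ["C"], "B": ["C"]}
print(equivalent(g1, g2))  # True
```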

Page 13: Using Bayesian Networks to Analyze Expression Data

Learning Causal Patterns

• We want to model the mechanism that generates the dependencies (e.g., gene transcription).

• A causal network is a model of such causal processes.

• Representation: a DAG where each node represents a random variable with a local probability model. The parents of a variable are its immediate causes.

Page 14: Using Bayesian Networks to Analyze Expression Data

Learning Causal Patterns

A causal network models not only the distribution of observations, but also the effects of interventions.

• Observations – a passive measurement of our domain (i.e., a sample from X).

• Intervention – setting the values of some variables by forces outside the causal model.

Page 15: Using Bayesian Networks to Analyze Expression Data

Learning Causal Patterns

• If X causes Y, then manipulating the value of X affects the value of Y.

• If Y causes X, then manipulating the value of X will not affect Y.

• X → Y and Y → X are equivalent Bayesian networks.

Causal Markov Assumption: given the values of a variable's immediate causes, it is independent of its earlier causes.

Page 16: Using Bayesian Networks to Analyze Expression Data

Learning Causal Patterns

• Under the causal Markov assumption, a causal network can be interpreted as a Bayesian network.

• From observations alone we cannot distinguish between causal networks that belong to the same equivalence class.

• From a directed edge in the PDAG we can infer a causal direction.

Page 17: Using Bayesian Networks to Analyze Expression Data

So Far…

• The likelihood function

• Parameter estimation and the decomposition principle

• Equivalence classes of Bayesian networks

• Learning causal patterns

Page 18: Using Bayesian Networks to Analyze Expression Data

Analyzing Expression Data

1. Present modeling assumptions

2. Find high scoring networks.

3. Characterize features.

4. Estimate Statistical Confidence in features.

5. Present local probability models.

Page 19: Using Bayesian Networks to Analyze Expression Data

We consider probability distributions over all possible states of the system.

A state is described using random variables.

Random Variables –

• expression level of an individual gene

• experimental conditions

• temporal indicators (time/stage the sample was taken).

• background variables (which clinical procedure was used to take the sample)

Modeling assumptions

Page 20: Using Bayesian Networks to Analyze Expression Data

Analyzing Expression Data

1. Present modeling assumptions

2. Find high scoring networks.

3. Characterize features.

4. Estimate Statistical Confidence in features.

5. Present local probability models.

Page 21: Using Bayesian Networks to Analyze Expression Data

The Learning Problem:

Given a training set D = (x[1], …, x[N]) of independent instances, find an equivalence class of networks B = <G, Θ> that best matches D. We use a scoring function:

S(G:D) = log P(G|D) = … = log P(D|G) + log P(G) + C

where log P(D|G) is the marginal likelihood and log P(G) is the prior over structures. The marginal likelihood averages the likelihood over the prior on parameters:

P(D|G) = ∫ P(D|G, Θ) P(Θ|G) dΘ

Page 22: Using Bayesian Networks to Analyze Expression Data

The learning problem - Scoring, cont.

Properties of the selected priors:

• structure equivalence – equivalent structures receive the same score

• decomposability

S(G:D) = ∑i ScoreContribution(Xi, Pa(Xi) : D)

Now, learning amounts to finding the structure G that maximizes the score.

This problem is NP-hard, so we resort to heuristic search.

Page 23: Using Bayesian Networks to Analyze Expression Data

Local Search Strategy

Using decomposition, we can change one arc at a time and evaluate the gain made by that change (a search sketch follows below).

(Figure: an initial structure G over A, B and C, and the neighboring structures G' obtained by adding, deleting or reversing a single arc.)

If an arc into Xi is added or deleted, only score(Xi, Pa(Xi)) needs to be re-evaluated. If an arc between Xi and Xj is reversed, only score(Xi, Pa(Xi)) and score(Xj, Pa(Xj)) need to be re-evaluated.
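The greedy local search can be sketched directly from this decomposition. Below is a minimal Python sketch of hill climbing over add/delete moves (arc reversal omitted for brevity); family_score is a hypothetical placeholder for whatever decomposable scoring function is used (e.g. a Bayesian score), and all names are illustrative rather than the paper's implementation.

```python
# Minimal sketch of greedy hill-climbing structure search with a decomposable score.
# `family_score(var, parents_tuple, data)` is a hypothetical stand-in for a real
# decomposable scoring function.

def creates_cycle(parents, frm, to):
    """Would adding the arc frm -> to create a directed cycle?"""
    stack, seen = [frm], set()
    while stack:
        node = stack.pop()
        if node == to:               # `to` is already an ancestor of `frm`
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(variables, data, family_score):
    parents = {v: [] for v in variables}     # start from the empty graph
    improved = True
    while improved:
        improved = False
        best_delta, best_move = 0.0, None
        for x in variables:
            for y in variables:
                if x == y:
                    continue
                if y in parents[x]:                          # candidate: delete y -> x
                    new_pa = [p for p in parents[x] if p != y]
                elif not creates_cycle(parents, y, x):       # candidate: add y -> x
                    new_pa = parents[x] + [y]
                else:
                    continue
                # Decomposability: only the family of x changes, so only its score does.
                delta = (family_score(x, tuple(new_pa), data)
                         - family_score(x, tuple(parents[x]), data))
                if delta > best_delta:
                    best_delta, best_move = delta, (x, new_pa)
        if best_move:
            x, new_pa = best_move
            parents[x] = new_pa
            improved = True
    return parents
```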

Page 24: Using Bayesian Networks to Analyze Expression Data

Find High-Scoring Networks

Problem:

Small data sets are not sufficiently informative to determine that a single model is the “right” model (or equivalence class of models).

Solution:

analyze a set of high-scoring networks. Attempt to characterize features that are common to most of these networks and focus on learning them.

Page 25: Using Bayesian Networks to Analyze Expression Data

Analyzing Expression Data

1. Present modeling assumptions

2. Find high scoring networks.

3. Characterize features.

4. Estimate Statistical Confidence in features.

5. Present local probability models.

Page 26: Using Bayesian Networks to Analyze Expression Data

Features

We will use two classes of features involving pairs of variables.

1. Markov relations – is Y in the Markov blanket of X?

Y is in X's Markov blanket if and only if there is either an edge between them, or both are parents of another variable.

A Markov relation indicates that the two genes are related in some joint biological interaction or process.
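The Markov-blanket feature can be read directly off a structure. A small Python sketch, reusing the hypothetical parent-dict graph representation from the earlier examples:

```python
# Minimal sketch: is Y in the Markov blanket of X?
# True if X and Y share an edge, or are both parents of some common child.

def in_markov_blanket(parents, x, y):
    if y in parents[x] or x in parents[y]:   # direct edge in either direction
        return True
    for ps in parents.values():              # common child?
        if x in ps and y in ps:
            return True
    return False

g = {"A": [], "B": ["A"], "C": ["A", "D"], "D": []}
print(in_markov_blanket(g, "A", "D"))  # True: both are parents of C
```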

Page 27: Using Bayesian Networks to Analyze Expression Data

Features

2. Order relations – is X an ancestor of Y in all the networks of a given equivalence class?

Does the PDAG contain a path from X to Y in which all the edges are directed?

This is an indication that X might be a cause of Y.
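A small sketch of the ancestor test over the directed part of a PDAG; the edge-set representation and names are assumptions made for illustration.

```python
# Minimal sketch: does a PDAG contain an all-directed path from x to y?
# Only directed edges are followed; undirected edges are ignored.

def is_ancestor(directed_edges, x, y):
    """directed_edges: set of (parent, child) pairs."""
    children = {}
    for a, b in directed_edges:
        children.setdefault(a, []).append(b)
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        if node == y:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return False

edges = {("X", "Z"), ("Z", "Y")}
print(is_ancestor(edges, "X", "Y"))  # True: X -> Z -> Y
```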

Page 28: Using Bayesian Networks to Analyze Expression Data

Analyzing Expression Data

1. Present modeling assumptions

2. Find high scoring networks.

3. Characterize features.

4. Estimate Statistical Confidence in features.

5. Present local probability models.

Page 29: Using Bayesian Networks to Analyze Expression Data

Estimate Statistical Confidence in Features

• We want to estimate to what extent the data support a given feature.

• We use the bootstrap method: we generate "perturbed" versions of the original data and learn from them.

• We should be more confident in features that are still induced from the "perturbed" data.

Page 30: Using Bayesian Networks to Analyze Expression Data

Estimate Statistical Confidence in Features

We use the bootstrap as follows:

For i = 1 … m (in our experiments m = 200):

o Re-sample N instances from D with replacement; denote the resulting dataset Di.

o Apply the learning procedure to Di to induce a network structure Gi.

For each feature f of interest calculate:

conf(f) = (1/m) ∑i f(Gi)

where f(G) is 1 if f is a feature in G, and 0 otherwise.
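A compact sketch of this bootstrap confidence estimate; learn_structure and the feature test are hypothetical placeholders for the structure-learning procedure and for features such as the Markov and order relations above.

```python
import random

# Minimal sketch of bootstrap confidence estimation for a network feature.
# `learn_structure(dataset)` is a hypothetical stand-in for the learning procedure;
# `feature(graph)` returns True/False for the feature of interest.

def bootstrap_confidence(data, learn_structure, feature, m=200, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        resampled = [rng.choice(data) for _ in range(len(data))]  # N samples, with replacement
        graph = learn_structure(resampled)
        hits += 1 if feature(graph) else 0
    return hits / m

# Example usage (everything here is illustrative):
# conf = bootstrap_confidence(data, hill_climb_learner,
#                             lambda g: in_markov_blanket(g, "CLN2", "SVS1"))
```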

Page 31: Using Bayesian Networks to Analyze Expression Data

Estimate Statistical Confidence in Features

• Features induced with high confidence are rarely false positives.

• The bootstrap procedure is especially robust for the Markov and order features.

• The conclusions established from high-confidence features are reliable even when the data sets are small relative to the model being induced.

Page 32: Using Bayesian Networks to Analyze Expression Data

Analyzing Expression Data

1. Present modeling assumptions

2. Find high scoring networks.

3. Characterize features.

4. Estimate Statistical Confidence in features.

5. Present local probability models.

Page 33: Using Bayesian Networks to Analyze Expression Data

Local Probability Models

• In order to specify a Bayesian network model we need to choose the type of local probability model we use.

• The choice of representation depends on the type of variables we use:

Discrete variables – can be represented with a table.

Continuous variables – no single representation can capture all possible densities.

Page 34: Using Bayesian Networks to Analyze Expression Data

Local Probability Models

We consider two approaches:

1. Multinomial model – treat each variable as discrete and learn a multinomial distribution that describes the probability of each possible state of a child variable given the state of its parents.

We discretize by thresholding the ratio between measured expression and control: ratios lower than 2^-0.5 are under-expressed (-1), ratios higher than 2^0.5 are over-expressed (+1), and ratios in between are treated as unchanged (0).
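A one-function sketch of this thresholding; the middle "0" category and the handling of boundary values follow the description above and are assumptions where the slide is silent.

```python
# Minimal sketch of the expression-ratio discretization described above:
# ratio < 2**-0.5 -> -1 (under-expressed), ratio > 2**0.5 -> +1 (over-expressed),
# otherwise 0 (roughly unchanged relative to control).

LOW = 2 ** -0.5
HIGH = 2 ** 0.5

def discretize(ratio):
    if ratio < LOW:
        return -1
    if ratio > HIGH:
        return 1
    return 0

print([discretize(r) for r in (0.5, 1.0, 2.0)])  # [-1, 0, 1]
```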

Page 35: Using Bayesian Networks to Analyze Expression Data

Local Probability Models

2. Linear Gaussian model – learn a linear regression model for a child variable given its parents.

If U1, …, Uk are the parents of variable X, then

P(X | u1, …, uk) ~ N(a0 + ∑i ai·ui, σ²).

That is, X is normally distributed around a mean that depends linearly on the values of its parents.
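One way to estimate such a linear Gaussian CPD is ordinary least squares on samples of the parents and the child; this is a sketch under that assumption, with NumPy and toy data, not the paper's own fitting procedure.

```python
import numpy as np

# Minimal sketch: fit a linear Gaussian CPD P(X | U1..Uk) = N(a0 + sum_i ai*ui, sigma^2)
# by least squares on samples of the parents U and the child X.

def fit_linear_gaussian(U, x):
    """U: (m, k) array of parent values; x: (m,) array of child values."""
    design = np.hstack([np.ones((U.shape[0], 1)), U])     # intercept column for a0
    coeffs, *_ = np.linalg.lstsq(design, x, rcond=None)   # [a0, a1, ..., ak]
    residuals = x - design @ coeffs
    return coeffs, residuals.var()                        # coefficients and sigma^2

# Toy example: X depends linearly on one parent plus noise.
rng = np.random.default_rng(0)
u = rng.normal(size=(100, 1))
x = 0.5 + 2.0 * u[:, 0] + rng.normal(scale=0.1, size=100)
print(fit_linear_gaussian(u, x))
```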

Page 36: Using Bayesian Networks to Analyze Expression Data

Local Probability Models

• In the multinomial model, by discretizing the measured expression levels we lose information.

• The linear Gaussian model can only detect dependencies that are close to linear. In particular, it is not likely to discover combinatorial effects (e.g., a gene is over-expressed only if several other genes are jointly over-expressed).

Page 37: Using Bayesian Networks to Analyze Expression Data

Application to Cell Cycle Expression Patterns

• Data from Spellman et al., 'Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization', Molecular Biology of the Cell.

• Contains 76 gene expression measurements of the mRNA levels of yeast.

• Spellman et al identified 800 genes whose expression varied over the different cell-cycle stages.

Page 38: Using Bayesian Networks to Analyze Expression Data

Application to Cell Cycle Expression Patterns

• Treat each measurement as an independent sample, and do not take into account the temporal aspect of the measurements.

• Compensate by adding, as the root of all learned networks, a variable denoting the cell-cycle phase.

• Performed two experiments: one with the multinomial distribution, and the other with the linear Gaussian distribution. http://www.cs.huji.ac.il/~nirf/GeneExpression/top800/

Page 39: Using Bayesian Networks to Analyze Expression Data

Robustness Analysis – Credibility of confidence assessment

(Figure: linear Gaussian model, order features. x-axis: confidence threshold; y-axis: number of features with confidence equal to or higher than the x-value.)

Page 40: Using Bayesian Networks to Analyze Expression Data

Robustness Analysis – Credibility of confidence assessment

(Figure: multinomial model, order features. x-axis: confidence threshold; y-axis: number of features with confidence equal to or higher than the x-value.)

Page 41: Using Bayesian Networks to Analyze Expression Data

Robustness Analysis – Adding more genes

(Figure: multinomial model. x-axis: confidence with 250 genes; y-axis: confidence with 800 genes.)

Page 42: Using Bayesian Networks to Analyze Expression Data

Robustness Analysis – Discretization

• The discretization method penalizes genes whose natural range of variation is small, since a fixed threshold is used.

• Avoid the problem by normalizing the expression of genes in the data.

• The top 20 Markov relations highlighted by this method were somewhat different, and the order relations were more robust, possibly because order relations depend on the global network structure rather than on local features.

Page 43: Using Bayesian Networks to Analyze Expression Data

Robustness Analysis – Comparison between the linear Gaussian and multinomial experiments

(Figure: x-axis: confidence in the multinomial experiment; y-axis: confidence in the linear Gaussian experiment.)

Page 44: Using Bayesian Networks to Analyze Expression Data

Biological Analysis – Order Relations

• Found the existence of dominant genes: out of all 800 genes, only a few seem to dominate the order.

• Among them are genes that are:

directly involved in initiation of the cell cycle and its control.

components of pre-replication complexes.

involved in DNA repair, which is associated with transcription initiation.

Page 45: Using Bayesian Networks to Analyze Expression Data

Biological Analysis - Markov Relations

• Among the top-scoring relations, all those involving two known genes make biological sense.

• Several of the unknown pairs are physically adjacent on the chromosome, and are presumably regulated by the same mechanism.

• Some of the relations found are beyond the limitations of clustering.

Page 46: Using Bayesian Networks to Analyze Expression Data

Example: CLN2, RNR3, SVS1, SRO4 and RAD51 all appear in the same cluster in Spellman et al. In our network CLN2 is a parent of the other four, while no links were found between them. This suits biological knowledge: CLN2 is a central and early cell-cycle control, while there is no clear biological relationship among the others.

Page 47: Using Bayesian Networks to Analyze Expression Data

Discussion

The approach:

is capable of handling noise and of estimating the confidence in the different features of the network.

managed to extract many biologically plausible conclusions.

is capable of learning rich structures from the data, such as causal relationships and interactions between genes beyond positive correlation.

Page 48: Using Bayesian Networks to Analyze Expression Data

Discussion

Ideas:

Learn models over "clustered" genes.

Recover all relationships in one analysis.

Improve the confidence estimation.

Incorporate biological knowledge as prior knowledge in the analysis.

Learn causal patterns with the addition of interventional data.

Page 49: Using Bayesian Networks to Analyze Expression Data

The End