Setting Up a Replica Exchange Approach to Motif Discovery in DNA Jeffrey Goett Advisor: Professor Sengupta

Post on 21-Dec-2015

Page 1

Setting Up a Replica Exchange Approach to Motif Discovery in DNA

Jeffrey Goett

Advisor: Professor Sengupta

Page 2

Protein Synthesis from DNA

[Diagram labels: Transcription; Translation to Proteins; Regulation; RNA polymerase; Binding Proteins; gene; Binding sites]

Page 3

Binding Sites

Sequence A:

    Binding site (binding protein “A”)    Code for protein
    A - A - C - G - A - C                  T - T - C - A - A - C - C - A
    T - T - G - C - T - G                  A - A - G - T - T - G - G - T

Sequence B:

    Binding site (binding protein “A”)    Code for protein
    A - A - G - G - A - C                  C - G - T - T - G - C - T - C
    T - T - C - C - T - G                  G - C - A - A - C - G - A - G

Page 4

Discovering New Binding Motifs

…ATCG GCTCAG CTAG…
…CACT GATCAG AGTA…
…TTCC GCTCTG TAAC…
…GCTA GCTCAA ATCG…

Motif Probability Model

         1     2     3     4     5     6
    A    0    .25    0     0    .75   .25
    T    0     0     1     0    .25    0
    C    0    .75    0     1     0     0
    G    1     0     0     0     0    .75

Motif: GCTCAG
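The motif probability model on this slide can be reproduced by counting letters column by column in the four highlighted motif instances. The following is a minimal sketch; the function names `motif_probability_model` and `consensus` are illustrative and do not come from the talk.

```python
from collections import Counter

ALPHABET = "ATCG"  # row order used in the slide's table

def motif_probability_model(instances):
    """Column-wise letter frequencies for equal-length motif instances."""
    width = len(instances[0])
    if any(len(s) != width for s in instances):
        raise ValueError("all motif instances must have the same length")
    model = {letter: [] for letter in ALPHABET}
    for j in range(width):
        counts = Counter(s[j] for s in instances)
        for letter in ALPHABET:
            model[letter].append(counts[letter] / len(instances))
    return model

def consensus(model):
    """Most probable letter at each motif position."""
    width = len(model[ALPHABET[0]])
    return "".join(max(ALPHABET, key=lambda a: model[a][j]) for j in range(width))

instances = ["GCTCAG", "GATCAG", "GCTCTG", "GCTCAA"]
model = motif_probability_model(instances)
for letter in ALPHABET:
    print(letter, model[letter])
print("Motif:", consensus(model))
```

Each row printed matches the corresponding row of the table, and the consensus string is the motif named on the slide.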

Page 5

Modeling Motifs in Sequences

ATATCCGTA

AATCGAGAC

TCGATGTGT

CCACCTGCA

Assume:

Break into N sequences

Each sequence has one instance of motif embedded in random background

Variations of motif by point mutation, but not insertion or deletion

Page 6

Modeling Motifs in Sequences

AT ATC CGTA
A ATC GAGAC
TCG ATG TGT
CC ACC TGCA

The “Alignment”: starting position of the motif in each sequence

$\vec{x} = \{x_1, x_2, x_3, \ldots, x_N\}$

ex: $\vec{x} = \{3, 2, 4, 3\}$

The “Motif Probability Distribution”: probability of each letter occurring at each motif position

$p_{j,\rho}$ =

         j=1   j=2   j=3
    A     1     0     0
    T     0    .75    0
    C     0    .25   .75
    G     0     0    .25
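To make the link between the two objects concrete, the sketch below cuts the motif instance out of each sequence at its alignment position and then tabulates letter frequencies. It assumes 1-based start positions, as in the slide's example, and a motif width of 3; the helper names are illustrative only.

```python
def motif_instances(sequences, alignment, width):
    """Cut the motif instance out of each sequence.

    alignment holds 1-based start positions, one per sequence.
    """
    return [seq[x - 1 : x - 1 + width] for seq, x in zip(sequences, alignment)]

def probability_distribution(instances, alphabet="ATCG"):
    """Frequency of each letter at each motif position."""
    n = len(instances)
    width = len(instances[0])
    return {
        a: [sum(s[j] == a for s in instances) / n for j in range(width)]
        for a in alphabet
    }

sequences = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
alignment = [3, 2, 4, 3]

instances = motif_instances(sequences, alignment, width=3)
p = probability_distribution(instances)
print(instances)
for letter, row in p.items():
    print(letter, row)
```

With the alignment {3, 2, 4, 3} the extracted instances are ATC, ATC, ATG and ACC, which give the distribution shown on the slide.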

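Page 7

Scoring a Model

$$p(\vec{x}, p_{j,\rho} \mid S) = \frac{p(S \mid \vec{x}, p_{j,\rho})\, p(\vec{x})\, p(p_{j,\rho})}{p(S)}$$

$p(S \mid \vec{x}, p_{j,\rho})$: each letter inside the motif contributes its motif probability $p_{j,\rho}$, and each letter outside the motif contributes its background probability $p^0_\rho$.

    A TAT CCGTA :   p^0_A · (p_{1,T} p_{2,A} p_{3,T}) · p^0_C p^0_C p^0_G p^0_T p^0_A
    AATCG AGA C :   p^0_A p^0_A p^0_T p^0_C p^0_G · (p_{1,A} p_{2,G} p_{3,A}) · p^0_C
    TCG ATG TGT :   p^0_T p^0_C p^0_G · (p_{1,A} p_{2,T} p_{3,G}) · p^0_T p^0_G p^0_T
    CCA CCTGCA  :   (p_{1,C} p_{2,C} p_{3,A}) · p^0_C p^0_C p^0_T p^0_G p^0_C p^0_A

“Log-likelihood” score:

$$p(\vec{x}, p_{j,\rho} \mid S) \longrightarrow \log\!\left(\frac{p(S \mid \vec{x}, p_{j,\rho})\, p(p_{j,\rho})}{p(S \mid p^0_\rho)}\right) + \log p(\vec{x}) - \log p(S) = \frac{1}{N} \sum_{j=1}^{w} \sum_{\rho \in \Sigma} n_{j,\rho} \log\!\left(\frac{\hat{p}_{j,\rho}}{p^0_\rho}\right) + \text{constant}$$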
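One way to see where the double sum in the score comes from is sketched below. It assumes that $n_{j,\rho}$ counts how many of the $N$ aligned motif instances carry letter $\rho$ at motif position $j$, and that $s_{i,k}$ denotes the $k$-th letter of sequence $i$; neither definition is spelled out on the slide, so both are labeled here as assumptions.

```latex
% Background factors outside the motif window appear in both numerator and
% denominator, so only the w motif positions of each sequence survive:
\frac{p(S \mid \vec{x}, p_{j,\rho})}{p(S \mid p^0_\rho)}
  = \prod_{i=1}^{N} \prod_{j=1}^{w}
    \frac{p_{j,\,s_{i,\,x_i+j-1}}}{p^0_{s_{i,\,x_i+j-1}}}
  = \prod_{j=1}^{w} \prod_{\rho \in \Sigma}
    \left( \frac{p_{j,\rho}}{p^0_\rho} \right)^{n_{j,\rho}}

% Taking the logarithm turns the products into sums:
\log \frac{p(S \mid \vec{x}, p_{j,\rho})}{p(S \mid p^0_\rho)}
  = \sum_{j=1}^{w} \sum_{\rho \in \Sigma}
    n_{j,\rho} \log \frac{p_{j,\rho}}{p^0_\rho}
```

Evaluating this at the estimate $\hat{p}_{j,\rho}$ and dividing by $N$ gives the form of the score on the slide, with the prior and normalization terms collected into the constant.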

Page 8

Example Models

AT ATC CGTA
A ATC GAGAC
TCG ATG TGT
CC ACC TGCA

$\vec{x} = \{3, 2, 4, 3\}$

$p_{j,\rho}$ =

         j=1   j=2   j=3
    A     1     0     0
    T     0    .75    0
    C     0    .25   .75
    G     0     0    .25

$L(S \mid \vec{x}, p_{j,\rho}, p^0_\rho) \approx 3$

A TAT CCGTA
AAT CGA GAC
TCGATG TGT
CC ACC TGCA

$\vec{x} = \{2, 4, 7, 3\}$

$p_{j,\rho}$ =

         j=1   j=2   j=3
    A    .25   .25   .25
    T    .5     0    .5
    C    .25   .25   .25
    G     0    .5     0

$L(S \mid \vec{x}, p_{j,\rho}, p^0_\rho) \approx 1.1$
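The two scores can be checked numerically with the short sketch below. It assumes the score is $\frac{1}{N}\sum_{j}\sum_{\rho} n_{j,\rho}\log(\hat{p}_{j,\rho}/p^0_\rho)$ with natural logarithms, $\hat{p}_{j,\rho} = n_{j,\rho}/N$, and a uniform background of 0.25 per letter; the talk does not state the log base or the background model, so these are assumptions.

```python
import math

def log_likelihood_score(sequences, alignment, width, background=None, alphabet="ATCG"):
    """(1/N) * sum over motif positions j and letters a of n[j,a] * log(p_hat[j,a] / p0[a]).

    alignment holds 1-based start positions; background defaults to uniform.
    """
    n_seqs = len(sequences)
    if background is None:
        background = {a: 1 / len(alphabet) for a in alphabet}  # assumed uniform
    score = 0.0
    for j in range(width):
        for a in alphabet:
            n = sum(seq[x - 1 + j] == a for seq, x in zip(sequences, alignment))
            if n:  # letters that never occur at position j contribute nothing
                score += n * math.log((n / n_seqs) / background[a])
    return score / n_seqs

sequences = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
score_good = log_likelihood_score(sequences, [3, 2, 4, 3], width=3)
score_poor = log_likelihood_score(sequences, [2, 4, 7, 3], width=3)
print(round(score_good, 2), round(score_poor, 2))
```

Under these assumptions the first alignment scores about 3.03 and the second about 1.04. The first agrees with the slide; the second is slightly below the quoted 1.1, which suggests the original calculation used a somewhat different background or estimate.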

Page 9

The Gibbs Sampler

We want to find the $p_{j,\rho}$ that maximizes $p(p_{j,\rho} \mid S)$, where

$$p(p_{j,\rho} \mid S) = \int p(p_{j,\rho}, \vec{x} \mid S)\, d\vec{x}$$

[Figure: $L(p_{j,\rho}, \vec{x} \mid S)$ over axes $p_{j,\rho}$ and $\vec{x}$]

Page 10

The Gibbs Sampler

[Figure: $p(p_{j,\rho}, \vec{x} \mid S)$, annotated with repeated $p_{j,\rho}$ and $\vec{x}$ labels]

Page 11

The Gibbs Sampler

[Figure: times visited versus $p_{j,\rho}$]

Over time, the frequency distribution approaches $p(p_{j,\rho} \mid S)$.
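The slides do not spell out the update rule, so the following is only a sketch of one common variant: a site sampler that re-draws one start position at a time from a profile built from the other sequences. The pseudocount, the uniform background and all function names are assumptions made for illustration. It records how often each alignment is visited, which is the kind of visit histogram the slide refers to (each alignment determines an estimate of $p_{j,\rho}$).

```python
import math
import random
from collections import Counter

ALPHABET = "ATCG"
BACKGROUND = 1 / len(ALPHABET)  # assumed uniform background probability

def profile(instances, width, pseudocount=0.5):
    """Letter probabilities per motif position, smoothed with pseudocounts."""
    total = len(instances) + pseudocount * len(ALPHABET)
    return [
        {a: (sum(s[j] == a for s in instances) + pseudocount) / total for a in ALPHABET}
        for j in range(width)
    ]

def gibbs_sampler(sequences, width, sweeps=200, seed=0):
    """Return the most visited alignment (1-based starts) and the visit counts."""
    rng = random.Random(seed)
    x = [rng.randrange(len(s) - width + 1) for s in sequences]  # 0-based starts
    visits = Counter()
    for _ in range(sweeps):
        for i, seq in enumerate(sequences):
            # Build the motif model from every sequence except the current one.
            others = [s[k:k + width]
                      for t, (s, k) in enumerate(zip(sequences, x)) if t != i]
            p = profile(others, width)
            starts = list(range(len(seq) - width + 1))
            # Likelihood ratio of motif versus background for each candidate start.
            weights = [
                math.prod(p[j][seq[k + j]] / BACKGROUND for j in range(width))
                for k in starts
            ]
            x[i] = rng.choices(starts, weights=weights)[0]
        visits[tuple(k + 1 for k in x)] += 1
    best = visits.most_common(1)[0][0]
    return list(best), visits

sequences = ["ATATCCGTA", "AATCGAGAC", "TCGATGTGT", "CCACCTGCA"]
alignment, visits = gibbs_sampler(sequences, width=3, sweeps=50, seed=1)
print(alignment, visits.most_common(3))
```

Because the sampler is stochastic, the most visited alignment depends on the seed and the number of sweeps; longer runs give visit frequencies that better reflect the underlying distribution.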

Page 12

Optimization Technique

Assume that the areas of local maximization contribute the most, during “integration,” to the local maxima of $p(p_{j,\rho} \mid S)$.

Biasing our search toward these areas may then discover the $p_{j,\rho}$ values that maximize $p(p_{j,\rho} \mid S)$ faster.

Page 13

Multiple Gibbs Samplers

By combining results from Gibbs samplers begun at random positions, we can find the maximum of $p(p_{j,\rho} \mid S)$ sooner.

Page 14

Replica Exchange/Parallel Tempering

“Low-sensitivity” samplers, which “scout out the area,” periodically swap with “high-sensitivity” samplers, which are good at focused searches, if the swap appears promising.
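The talk does not say how a swap is judged “promising.” A common choice in parallel tempering is the Metropolis exchange criterion, sketched below under the assumption that each sampler has a sensitivity parameter β and draws configurations with weight $e^{\beta L}$, where $L$ is the score of its current configuration. The dictionary layout of a replica is hypothetical.

```python
import math
import random

def swap_probability(beta_a, score_a, beta_b, score_b):
    """Metropolis probability of exchanging the states of two samplers,
    each of which targets a distribution proportional to exp(beta * score)."""
    log_ratio = (beta_a - beta_b) * (score_b - score_a)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def attempt_swap(replica_a, replica_b, rng=random):
    """Exchange 'state' and 'score' of two replicas with the probability above.

    Each replica is a dict with keys 'beta', 'state' and 'score'
    (a hypothetical layout chosen for this example).
    """
    prob = swap_probability(replica_a["beta"], replica_a["score"],
                            replica_b["beta"], replica_b["score"])
    accepted = rng.random() < prob
    if accepted:
        for key in ("state", "score"):
            replica_a[key], replica_b[key] = replica_b[key], replica_a[key]
    return accepted

# A low-beta "scout" has found a better-scoring alignment than the
# high-beta "focused" sampler, so the exchange is always accepted.
scout = {"beta": 0.1, "state": [3, 2, 4, 3], "score": 3.0}
focused = {"beta": 2.0, "state": [2, 4, 7, 3], "score": 1.0}
swapped = attempt_swap(focused, scout)
print(swapped, focused["state"], scout["state"])
```

With this rule a swap that hands the better-scoring configuration to the higher-β sampler is always accepted, while the reverse swap is accepted only with probability $e^{(\beta_a - \beta_b)(L_b - L_a)}$.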

Page 15

Controlling Sensitivity

Adjust the relative probability of sampling an $x_i$ by adjusting a new parameter $\beta$ in the distribution:

$$\tilde{p}(x_i \mid p_{j,\rho}, S) = e^{\beta L(x_i, p_{j,\rho} \mid S)}$$

Small $\beta$: search breadth of space

Large $\beta$: focused search of region
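The effect of β can be illustrated by normalizing the weights $e^{\beta L}$ over a handful of candidate positions. The scores below are made-up numbers for illustration, and the function name is an assumption.

```python
import math

def tempered_probabilities(scores, beta):
    """Normalize exp(beta * L) over the candidate positions."""
    top = max(scores)
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    weights = [math.exp(beta * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

scores = [0.0, 1.0, 3.0]  # hypothetical scores L for three candidate positions
flat = tempered_probabilities(scores, beta=0.0)
broad = tempered_probabilities(scores, beta=0.1)
sharp = tempered_probabilities(scores, beta=2.0)
print(flat, broad, sharp)
```

At β = 0 every position is equally likely, a small β keeps the distribution nearly flat so the sampler wanders widely, and a large β concentrates almost all probability on the best-scoring position.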

Page 16

Testing the Sensitivity

When run on randomly generated sequences to see which motifs are found, samplers of different sensitivity converge to different scores.

[Plot legend] Betas: 21.9.1

Page 17

Predicting Convergence Score

Measure of similarity: magnetization

$$m = \frac{1}{N} \sum_{i=1}^{N} s_i$$

“Configuration score”: energy

$$E = -\frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} J s_i s_j$$

Ex:

    m      E        p
    .5     0        $p \approx e^{-\beta \cdot 0}$
    1      $-6J$    $p \approx e^{\beta 6J}$
    0      $2J$     $p \approx e^{-\beta 2J}$
    0      $2J$     $p \approx e^{-\beta 2J}$
    0      $2J$     $p \approx e^{-\beta 2J}$
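The example values on this slide are consistent with four spins $s_i = \pm 1$ in which every pair interacts. The sketch below evaluates the two formulas for such configurations; the choice of four spins and the specific configurations are inferred from the numbers rather than stated in the talk.

```python
import math

def magnetization(spins):
    """m = (1/N) * sum of s_i."""
    return sum(spins) / len(spins)

def energy(spins, J=1.0):
    """E = -(1/2) * sum over i and j != i of J * s_i * s_j."""
    total = 0
    for i, si in enumerate(spins):
        for j, sj in enumerate(spins):
            if i != j:
                total += si * sj
    return -0.5 * J * total

def boltzmann_weight(spins, beta, J=1.0):
    """Unnormalized probability exp(-beta * E) of a configuration."""
    return math.exp(-beta * energy(spins, J))

configurations = {
    "all aligned": [1, 1, 1, 1],
    "one flipped": [1, 1, 1, -1],
    "two flipped": [1, 1, -1, -1],
}
for name, spins in configurations.items():
    print(name, magnetization(spins), energy(spins))
```

In units of J, the fully aligned configuration has m = 1 and E = -6, one flipped spin gives m = 0.5 and E = 0, and two flipped spins give m = 0 and E = 2, matching the values listed on the slide.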

Page 18

Alignment Analogue

[Figure: alignment configurations labeled A:, B:, C:]

m = .77, $E = -5J$
m = 1, $E = -9J$
m = .77, $E = -5J$
m = .77, $E = -5J$

$p \approx e^{\beta 9J}$
$p \approx e^{\beta 5J}$
$p \approx e^{\beta 5J}$

Page 19

Test Results

$L < |\text{alphabet}|^w$

Page 20

Test Results

$L > |\text{alphabet}|^w$

Page 21

Test Results

Page 22

Test Results

Page 23

Hidden Motifs: Gibbs Sampler

[Figure panels: Beta = .1, Beta = .5, Beta = .9, Beta = 1.3, Beta = 1.7, Beta = 2]

W=5, l=500

Page 24

Hidden Motifs: Replica Exchange

[Plot legend] Betas: .9, .93, .961, .8, 1.5
