Decomposition for Reasoning with Biological Network
Gauvain Bourgne, Katsumi Inoue
ISSSB’11, Shonan Village, November 13th-17th 2011
Automated Problem Decomposition 2
Motivation
In bioinformatics, need to reason on huge amounts of data
◦ Huge networks (e.g. metabolic pathways, signaling pathways…)
On such problems, centralized methods suffer from
◦ Long computation time
◦ Memory overflow
Problem decomposition
◦ Divide into smaller problems or steps to recompose a global solution
◦ Need for (1) an automated process to decompose and (2) an algorithm to solve local problems and recompose the global solution
Example Problem (Krebs Cycle)
[Figure: metabolic network around the Krebs cycle. Nodes are metabolites (e.g. citrate, isocitrate, 2-oxo-glutarate, succinate, fumarate, pyruvate, acetyl-CoA, glucose, lactate, urea); edges are reactions labeled with EC numbers.]
Example Problem (Krebs Cycle)
[Figure: the same metabolic network divided among six agents, Ag0 to Ag5.]
Overview
Reasoning task
Partition-based algorithm
Automated decomposition
Experimental evaluation
Conclusion
Logical representation
Metabolic pathways: set of reactions Ri
  Ri: m1, m2, …, mp → p1, p2, …, pn
Such reactions can be represented as
◦ an activation rule
  ¬m1 ∨ ¬m2 ∨ … ∨ ¬mp ∨ Ri
◦ n production rules
  ¬Ri ∨ p1
  ¬Ri ∨ p2
  …
  ¬Ri ∨ pn
The resulting set of clauses is a clausal theory.
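The encoding above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_reaction` and the convention of writing a negative literal as a string prefixed with "-" are assumptions, not something taken from the slides.

```python
def encode_reaction(name, reactants, products):
    """Encode `name: reactants -> products` as clauses (lists of literals).

    Assumed convention: a literal is a string, and "-x" is the negation of "x".
    """
    # Activation rule: -m1 v -m2 v ... v -mp v Ri
    activation = ["-" + m for m in reactants] + [name]
    # One production rule per product: -Ri v pj
    productions = [["-" + name, p] for p in products]
    return [activation] + productions


clauses = encode_reaction("R1", ["m1", "m2"], ["p1", "p2"])
for clause in clauses:
    print(" v ".join(clause))
```

A reaction with p reactants and n products yields one clause of length p + 1 and n clauses of length 2.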
Problems
(Conditional) accessibility problems
◦ Sources (si), conditional sources (ci), targets (ti)
◦ Find which ti can be produced from the si, possibly with the addition of some ci as new sources
◦ Find all consequences of the form ¬ci ∨ … ∨ ¬ck ∨ tj
Extraction of sub-networks
Pathway completion (abduction)
◦ Find reactions (set of clauses)
Hypotheses on the state of reactions given experiments
These are all instances of consequence finding (with consequences of a specific form).
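To make the conditional accessibility problem concrete, here is a minimal sketch that answers it by plain forward chaining over reactions instead of clausal consequence finding, so it is not the method used in the talk. The function names and the tiny network are hypothetical. Each minimal set of conditional sources found for a target tj plays the role of a consequence ¬ci ∨ … ∨ ¬ck ∨ tj.

```python
from itertools import combinations


def producible(reactions, available):
    """Forward-chain: a reaction fires once all of its reactants are available."""
    have = set(available)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= have and not set(products) <= have:
                have |= set(products)
                changed = True
    return have


def conditional_accessibility(reactions, sources, cond_sources, targets):
    """Map each target to the minimal sets of conditional sources that make it producible."""
    result = {t: [] for t in targets}
    # Enumerate candidate sets by increasing size so that minimal sets are found first.
    for k in range(len(cond_sources) + 1):
        for extra in combinations(sorted(cond_sources), k):
            have = producible(reactions, set(sources) | set(extra))
            for t in targets:
                already_covered = any(set(m) <= set(extra) for m in result[t])
                if t in have and not already_covered:
                    result[t].append(extra)
    return result


# Hypothetical network: a, b -> c and c, x -> d
reactions = [(["a", "b"], ["c"]), (["c", "x"], ["d"])]
answer = conditional_accessibility(reactions, ["a", "b"], ["x", "y"], ["c", "d", "e"])
print(answer)
```

The enumeration is exponential in the number of conditional sources, which is one reason a dedicated consequence-finding procedure is preferable on large networks.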
Main reasoning task
Consequence Finding (CF) in clausal theories
◦ Input
  A clausal theory T
  A production field P = <L, Cond>
    L is a list of literals
    Cond is a condition (maximal length of the consequences, or number of occurrences of some literals)
◦ Output
  All the consequences of T that are subsumption-minimal and belong to P (formed with literals of L and respecting condition Cond): Carc(T, P)
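For small propositional theories, Carc(T, P) can be computed by brute force: saturate T under resolution, keep only subsumption-minimal clauses, then filter by the production field. The sketch below does exactly that. It is not the solver used in the talk, and the string encoding of literals ("-x" for ¬x) and the `max_len` parameter standing in for Cond are assumptions.

```python
def neg(lit):
    """Negate a literal written as "x" or "-x"."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def carc(theory, field, max_len=None):
    """Subsumption-minimal consequences of `theory` built only from literals in `field`."""
    # Start from the non-tautological input clauses.
    clauses = {frozenset(c) for c in theory if not any(neg(x) in c for x in c)}
    while True:
        new = set(clauses)
        for c1 in clauses:
            for lit in c1:
                for c2 in clauses:
                    if neg(lit) in c2:
                        resolvent = (c1 - {lit}) | (c2 - {neg(lit)})
                        if not any(neg(x) in resolvent for x in resolvent):
                            new.add(resolvent)
        # Remove every clause that is strictly subsumed by another one.
        new = {c for c in new if not any(d < c for d in new)}
        if new == clauses:  # saturated: nothing new can be derived
            break
        clauses = new
    return {c for c in clauses
            if c <= field and (max_len is None or len(c) <= max_len)}


# Two chained reactions: R1: a, b -> c and R2: c -> d
theory = [["-a", "-b", "R1"], ["-R1", "c"], ["-c", "R2"], ["-R2", "d"]]
consequences = carc(theory, {"-a", "-b", "d"})
print(consequences)
```

With L = {¬a, ¬b, d} the only consequence returned is ¬a ∨ ¬b ∨ d, i.e. d is accessible from a and b. Filtering the saturated set is sufficient here because this kind of production field is closed under subsumption.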
Partition-based CF
The task
◦ Consequence Finding (CF) in clausal theories
  Input
    A set of clausal theories Ti such that ∪Ti = T, and a set of reasoners ai, one associated with each partition
    A production field P = <L, Cond>
  Output
    Carc(T, P)
  Where
    The output should be produced through local computations and interactions between reasoners (message exchange)
Partition-based Consequence Finding
Generalization of Partition-based Theorem Proving [Amir & McIlraith, 2005]
◦ Based on Craig’s Interpolation Theorem: if C entails D, then there is a formula F involving only symbols common to C and D such that C entails F and F entails D.
Principles
◦ Identify common symbols (communication languages)
◦ Build a tree structure (cycle-cut)
◦ Forward relevant consequences from leaf to root
[Figure: C, the interpolant F, and D.]
Communication languages
Graph induced from the partition
Problem: eliminate cycles from it while ensuring a proper labeling.
Cycle-cut
While (G not acyclic)
  Take a minimal cycle S = (i1,i2),(i2,i3),…,(ip,i1).
  Choose (i,j) in S s.t. $\sum_{(p,q) \in S,\ (p,q) \neq (i,j)} |l(p,q) \cup l(i,j)|$ is minimal
  For each (q,r) ≠ (i,j) in S, l(q,r) ← l(q,r) ∪ l(i,j)
  Remove (i,j) from E
[Figure: example graph with four partitions whose vocabularies are {a,b,c}, {b,f,g}, {a,d,e} and {a,c,d,f}; edges are labeled with the shared symbols a, ac, b, f and ad, and the labels are extended after a cut.]
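The cycle-cut loop above can be sketched as follows. The representation is an assumption: the graph is a dictionary from undirected edges to label sets, and a minimal cycle is taken to be a shortest one, found by breadth-first search. The example reuses the four vocabularies of the figure; the tree obtained depends on how ties are broken, so it need not match the one drawn on the slide.

```python
from collections import deque


def edge_key(u, v):
    """Canonical key for the undirected edge {u, v}."""
    return (u, v) if u <= v else (v, u)


def shortest_cycle(edges):
    """Return a shortest cycle as a list of edges, or None if the graph is acyclic."""
    best = None
    for (u, v) in edges:
        # Shortest path from u to v that does not use the edge (u, v) itself.
        adj = {}
        for (a, b) in edges:
            if (a, b) != (u, v):
                adj.setdefault(a, []).append(b)
                adj.setdefault(b, []).append(a)
        parent = {u: None}
        queue = deque([u])
        while queue and v not in parent:
            x = queue.popleft()
            for y in adj.get(x, []):
                if y not in parent:
                    parent[y] = x
                    queue.append(y)
        if v in parent:
            cycle, node = [(u, v)], v
            while parent[node] is not None:
                cycle.insert(0, edge_key(parent[node], node))
                node = parent[node]
            if best is None or len(cycle) < len(best):
                best = cycle
    return best


def cycle_cut(labels):
    """Remove edges until the graph is acyclic, pushing removed labels onto the cycle."""
    labels = {edge_key(*e): set(l) for e, l in labels.items()}
    while True:
        cycle = shortest_cycle(list(labels))
        if cycle is None:
            return labels

        def cost(cut):
            return sum(len(labels[e] | labels[cut]) for e in cycle if e != cut)

        cut = min(cycle, key=cost)
        for e in cycle:
            if e != cut:
                labels[e] |= labels[cut]
        del labels[cut]


# Partitions 0..3 with vocabularies abc, bfg, ade, acdf
initial = {(0, 1): {"b"}, (0, 2): {"a"}, (0, 3): {"a", "c"},
           (1, 3): {"f"}, (2, 3): {"a", "d"}}
tree = cycle_cut(initial)
print(tree)
```

Pushing the label of the removed edge onto the rest of the cycle is what keeps the labeling proper: any symbol that travelled over the removed link can still travel around the remaining path.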
Forward Message-passing Algorithm (Sequential)
Preprocessing
◦ Determine initial l(i,j)
◦ Apply Cycle-cut
◦ Determine Pi
  Non-root agents ai (with parent aj): Pi = <L ∪ l(i,j)>
  Root ak: Pk = P
Consequence-Finding
◦ From leaves to root
  Determine Cni = Carc(Σi, Pi)
  Forward Cni
[Figure: tree of agents, each running Carc and forwarding its result toward the root.]
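A minimal sequential sketch of this leaves-to-root pass is given below, under several assumptions: the tree is given as a `parent` map, l(i,j) is a set of symbols and both polarities of each symbol are added to the local production field, Cond is ignored, and a brute-force `carc` is included only so that the sketch runs on its own. None of these names come from the talk.

```python
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def carc(theory, field):
    """Brute force: saturate by resolution, keep minimal clauses over `field`."""
    clauses = {frozenset(c) for c in theory if not any(neg(x) in c for x in c)}
    while True:
        new = set(clauses)
        for c1 in clauses:
            for lit in c1:
                for c2 in clauses:
                    if neg(lit) in c2:
                        r = (c1 - {lit}) | (c2 - {neg(lit)})
                        if not any(neg(x) in r for x in r):
                            new.add(r)
        new = {c for c in new if not any(d < c for d in new)}
        if new == clauses:
            return {c for c in clauses if c <= field}
        clauses = new


def forward_pass(theories, parent, labels, field):
    """Each agent sends Carc(local theory, L u l(i,j)) to its parent; the root answers."""
    theories = {a: {frozenset(c) for c in t} for a, t in theories.items()}

    def depth(agent):
        return 0 if parent[agent] is None else 1 + depth(parent[agent])

    # Deepest agents first, so every child is processed before its parent.
    for agent in sorted(theories, key=depth, reverse=True):
        up = parent[agent]
        if up is None:
            return carc(theories[agent], set(field))
        link = labels[(agent, up)]
        local_field = set(field) | link | {neg(s) for s in link}
        theories[up] |= carc(theories[agent], local_field)


# R1: a, b -> c is held by agent 1; R2: c -> d by agent 0 (root); they share symbol c.
theories = {0: [["-c", "R2"], ["-R2", "d"]],
            1: [["-a", "-b", "R1"], ["-R1", "c"]]}
parent = {0: None, 1: 0}
labels = {(1, 0): {"c"}}
result = forward_pass(theories, parent, labels, {"-a", "-b", "d"})
print(result)
```

In this example agent 1 forwards only ¬a ∨ ¬b ∨ c; the internal symbol R1 never leaves its partition, which is the point of restricting messages to the communication language.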
Parallel Variant
[Figure: tree of agents running Carc, with Newcarc steps applied to incoming clauses.]
Incremental computations: Newcarc(T ∪ C, P) = Carc(T ∪ C, P) \ Carc(T, P)
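A small worked instance of the Newcarc definition may help. The theory T, the added clause set C and the production field below are made up for illustration, with P restricted to the literals b and c and no further condition.

```latex
\begin{align*}
T &= \{\neg a \lor b,\; c\}, \qquad C = \{a\}, \qquad P = \langle \{b, c\} \rangle \\
\mathrm{Carc}(T, P) &= \{c\} \\
\mathrm{Carc}(T \cup C, P) &= \{b,\; c\} \\
\mathrm{Newcarc}(T \cup C, P) &= \{b,\; c\} \setminus \{c\} = \{b\}
\end{align*}
```

Only b is new: it becomes derivable once a is received, whereas c was already a consequence of T alone and need not be forwarded again.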
Decomposition of clausal theories
Given a clausal theory T
Find a set of partitions Ti such that
◦ ∪Ti = T
◦ Reasoning is easier, i.e. the application of the partition-based algorithm to this decomposition is as efficient as possible.
  Minimize the size of the communication languages
  Ensure that some simplification can be done locally
Partitions should be cohesive and loosely coupled.
Graph representation
Clausal theory can be represented as a graph
Focus on common symbols
Example theory
  c1: ¬b ∨ c ∨ e ∨ f
  c2: ¬a ∨ d ∨ e
  c3: ¬d ∨ g ∨ h
  c4: ¬e ∨ g
  c5: ¬g ∨ ¬h ∨ i
[Figure: the clauses c1 to c5 linked to the symbols a to i they contain, then a reduced graph over the clauses only, with edges labeled by common symbols (e, e, d, {g,h}, g) and weighted by the number of common symbols (1, 1, 1, 2, 1).]
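As an illustration of the graph view, the sketch below links every pair of clauses that share at least one symbol and weights the edge by the number of shared symbols. This is the unreduced version: on the example theory it yields seven edges, whereas the figure keeps five, and how the reduction is done is not stated in the transcript. The names `symbols` and `clause_graph` are hypothetical.

```python
from itertools import combinations


def symbols(clause):
    """Set of propositional symbols of a clause, ignoring polarity."""
    return {lit.lstrip("-") for lit in clause}


def clause_graph(theory):
    """Map each pair of clause names to the set of symbols the two clauses share."""
    edges = {}
    for (n1, c1), (n2, c2) in combinations(theory.items(), 2):
        shared = symbols(c1) & symbols(c2)
        if shared:
            edges[(n1, n2)] = shared
    return edges


theory = {
    "c1": ["-b", "c", "e", "f"],
    "c2": ["-a", "d", "e"],
    "c3": ["-d", "g", "h"],
    "c4": ["-e", "g"],
    "c5": ["-g", "-h", "i"],
}
edges = clause_graph(theory)
weights = {pair: len(shared) for pair, shared in edges.items()}
for pair, shared in edges.items():
    print(pair, sorted(shared), weights[pair])
```

A weighted graph of this kind is what a graph partitioner can work on: cutting a heavy edge means a larger communication language between the two resulting partitions.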
Architecture
[Figure: processing pipeline. Initial theory (.sol file) → buildGraph → reduced graph representation → kmetis (given the number of partitions) → partitioned graph → graph2dcf → partitioned clausal theory (.dcf file) → partition-based CF (given the root) → solution.]
Root choice heuristic
◦ Choose root with maximal average clause size
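The root choice heuristic is simple enough to state as code. In this sketch a partition is a list of clauses and a clause is a list of literals; the function name, the partition names and the example data are hypothetical, and partitions are assumed to be non-empty.

```python
def choose_root(partitions):
    """Return the name of the partition with the largest average clause size."""
    def average_clause_size(clauses):
        return sum(len(c) for c in clauses) / len(clauses)

    return max(partitions, key=lambda name: average_clause_size(partitions[name]))


partitions = {
    "ag0": [["-a", "b"], ["-b", "c"]],                  # average size 2.0
    "ag1": [["-c", "d", "e"], ["-e", "f", "g", "h"]],   # average size 3.5
}
root = choose_root(partitions)
print(root)
```

If several partitions tie, `max` returns the first one in iteration order; the slides do not say how ties should be broken.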
Problem Decomposition
[Figure: the Krebs cycle network of the example problem, partitioned among six agents ag0 to ag5.]
Benchmark Problems
Biological networks
TPTP problems
◦ Production field
  Vocabulary of conjecture (+ removing conjecture)
  Full vocabulary with length limit
SAT problems
◦ Production field
  Based on frequency of literals: N% most/least frequent literals
◦ Size
  Problems still not tractable as CF problems
  Solving only a cohesive sub-problem (obtained by partition of the clause graph)
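One possible reading of the frequency-based production field for SAT problems is sketched below: count how often each literal occurs and keep the top or bottom N%. The rounding rule, the alphabetical tie-breaking and the function name `frequency_field` are assumptions, since the slides do not give these details.

```python
from collections import Counter


def frequency_field(clauses, percent, most_frequent=True):
    """Return the `percent`% most (or least) frequent literals of a clause set."""
    counts = Counter(lit for clause in clauses for lit in clause)
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda lit: (-counts[lit], lit))
    if not most_frequent:
        ranked.reverse()
    keep = max(1, round(len(ranked) * percent / 100))  # keep at least one literal
    return set(ranked[:keep])


clauses = [["a", "b"], ["a", "-b"], ["a", "c"], ["-b", "c"]]
top = frequency_field(clauses, 50)
bottom = frequency_field(clauses, 50, most_frequent=False)
print(top, bottom)
```

Here the literals are ranked a (3 occurrences), then ¬b and c (2 each), then b (1), so the 50% most frequent are a and ¬b.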
Problem characteristics
[Table not recovered.]
Results – Biological Networks
[Table not recovered; one surviving entry: 2 682 252 (3 321 857).]
Results – SAT problems
[Table not recovered.]
Results – TPTP problems
[Table not recovered.]
Results - summary
[Figure: scatter plot with both axes on a logarithmic scale from 100 to 100 000 000; series Seq-best and Par-best, plus a reference line.]
Results - summary
[Figure: scatter plot with both axes on a logarithmic scale from 100 to 100 000 000; series Seq-heur and Par-heur, plus a reference line.]
Results
For almost all problems, decomposition can reduce the number of resolve operations needed.
◦ In particular, it can solve some problems that could not be solved without it
Time is not often improved
◦ Due to communication time (parsing, and such)
Approximate decomposition with metis: ok.
Root choice heuristic: still insufficient, though not bad for biological network problems.
Conclusion
A sound and complete algorithm combined with automated problem decomposition
◦ Can increase efficiency (number of operations) for almost all problems
◦ But results depend on the choice of root
Future work
Partition-based algorithm
◦ Variant for Newcarc computations
◦ Common theories for first-order representations
◦ Ordered partitions to break cycles (without removing links)
Decomposition
◦ Directly from metabolic pathways
◦ Root choice heuristic
  Learning a preference relation on root choice
◦ Choosing the number of partitions
Thank you for your attention
Any questions?