![Page 1: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/1.jpg)
From high level goals to policies: a polynomial time algorithm for k-maintainable goals
Chitta Baral, Arizona State University
(joint work with Marcus Bjareland, Thomas Eiter, Mutsumi Nakamura, and Tran Son)
![Page 2: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/2.jpg)
Quick overview of my research
Knowledge Representation and Reasoning
– Language design; theoretical building blocks; implementation; applications.
Action, change and histories
– Developing languages for representing actions, the structure of the world, and the effects of the actions on the world.
– Developing languages for expressing goals or directives.
– Developing ways to achieve goals.
– Formulating various kinds of reasoning (e.g., prediction, planning, explanation, diagnosis, counterfactuals).
Application of the above to modeling cell behavior
– Prediction: (side) effects of drugs
– Planning: drug design
– Explanation: explaining unusual behavior; medical diagnosis
– Others: hypothesis generation
![Page 3: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/3.jpg)
Motivation: Parameterized maintainability goals
Always f, also written as □ f:
- Too strong for many kinds of maintainability (e.g., maintaining the room clean).
Always Eventually f, also written as □ ◊ f:
- Weak, in the sense that it gives no estimate of when f will be made true.
- May not be achievable in the presence of continuous interference by belligerent agents.
□ f ------------------ □ ◊_k f -------------------------- □ ◊ f
□ ◊_3 f is shorthand for □ (f ∨ ○f ∨ ○○f ∨ ○○○f).
But if an external agent keeps interfering, how is one supposed to guarantee □ ◊_3 f?
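The shorthand for k = 3 generalizes to any bound k in the obvious way. The following is a sketch of that general form; the notation \bigcirc^{i} for i nested next-time operators is an assumption introduced here for compactness and does not appear on the slides.

```latex
\Box \Diamond_k f \;\equiv\; \Box \Bigl( \bigvee_{i=0}^{k} \bigcirc^{i} f \Bigr),
\qquad\text{e.g.}\qquad
\Box \Diamond_3 f \;\equiv\; \Box \bigl( f \lor \bigcirc f \lor \bigcirc\bigcirc f \lor \bigcirc\bigcirc\bigcirc f \bigr).
```

With k = 0 this collapses to □ f, and letting k grow without bound approaches □ ◊ f, which matches the spectrum drawn above.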
![Page 4: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/4.jpg)
Motivation: a controller-agent transcript
Controller (to the agent/robot): Your goal is to maintain the room clean.
Robot/Agent: Can you be precise about what you mean by ‘maintain’? Also can I clean anytime or are there restrictions?
Controller: You can only clean when the room is unoccupied.
Controller: By ‘maintain’ I mean ALWAYS clean.
Robot/Agent: I won’t be able to guarantee that. What if someone makes it dirty while the room is occupied?
Controller: OK, I understand. How about ALWAYS EVENTUALLY clean?
Controller’s Boss: ‘Eventually’ is too lenient. We can’t have the room unclean for too long. We should put some bound.
![Page 5: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/5.jpg)
Controller-agent transcript (cont)
Controller: Sorry, Sir. I should have made it more precise: ALWAYS EVENTUALLY_3 clean.
Robot/Agent: Sorry. I can guarantee neither ALWAYS EVENTUALLY clean nor ALWAYS EVENTUALLY_3 clean. What if the room is continuously being used, given that you told me I cannot clean while it is being used?
Controller: You have a good point. Let me clarify again. If you are given an opportunity of 3 units of time without the room being occupied (i.e., without any interference from external agents), then you should have the room clean during that time.
Robot/Agent: I think I understand you. But as you know I am a robot and not that good at understanding English. Can you please input it in a precise language.
![Page 6: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/6.jpg)
Formulating k-maintainability: a system
A system is a quadruple A = (S, A, Φ, poss), where
– S is the set of system states;
– A is the set of actions, which is the union of the set of agent actions, Aag, and the set of environmental actions, Aenv;
– Φ : S x A → 2^S is a non-deterministic transition function that specifies how the state of the world changes in response to actions;
– poss : S → 2^A is a function that describes which actions are possible to take in which states.
![Page 7: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/7.jpg)
A system
[Figure: transition diagram over states s1 to s7 with edges labeled a1 to a5]
S = {s1, s2, s3, s4, s5, s6, s7}
A = {a1, a2, a3, a4, a5}
Φ : as shown in the picture
poss(s1) = {a1, a2, a3}
poss(s4) = {a4}
![Page 8: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/8.jpg)
[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
S = {b,c,d,f,g,h}
A = {a, a’, e}
Aag = {a, a’}
Aenv = {e}
Φ : as shown in the picture
poss(b) = {a} when our policy dictates a to be executed at b.
![Page 9: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/9.jpg)
Controls and super-controls
Given a system A = (S, A, Φ, poss) and a set Aag ⊆ A of agent actions,
– a control policy for A w.r.t. Aag is a partial function K : S → Aag, such that K(s) is an element of poss(s) whenever K(s) is defined.
– a super-control policy for A w.r.t. Aag is a partial function K : S → 2^Aag, such that K(s) is a subset of poss(s) and K(s) ≠ { } whenever K(s) is defined.
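The two definitions can be checked mechanically. The sketch below is only illustrative: it represents a partial function as a Python dict (undefined states are simply absent), and the names `is_control`, `is_super_control`, `POSS` and `AAG` are hypothetical. The `POSS` table is an assumed reading of the running example with states b, c, d, f, g, h, not something stated on the slides.

```python
# Assumed poss table for the running example (hypothetical reconstruction).
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
AAG = {"a", "a'"}  # agent actions; "e" is the environmental action


def is_control(K, poss, agent_actions):
    """K maps some states to a single agent action that is possible there."""
    return all(
        a in agent_actions and a in poss.get(s, set())
        for s, a in K.items()
    )


def is_super_control(K, poss, agent_actions):
    """K maps some states to a non-empty set of possible agent actions."""
    return all(
        len(acts) > 0
        and acts <= agent_actions
        and acts <= poss.get(s, set())
        for s, acts in K.items()
    )


print(is_control({"b": "a", "c": "a", "d": "a"}, POSS, AAG))
print(is_super_control({"b": {"a", "a'"}}, POSS, AAG))
```

A control is the special case of a super-control in which every defined K(s) is a singleton.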
![Page 10: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/10.jpg)
Reachable states and closure
Reachable states R(A, s): Given a system A = (S, A, Φ, poss) and a state s, R(A, s) ⊆ S is the smallest set of states that satisfies the following conditions:
(i) s is in R(A, s); and
(ii) if s’ is in R(A, s) and a is in poss(s’), then Φ(s’, a) is a subset of R(A, s).
Closure: Let A = (S, A, Φ, poss) be a system and let S be a subset of its state set. Then the closure of A w.r.t. S, denoted by Closure(S, A), is defined by Closure(S, A) = ∪_{s ∈ S} R(A, s).
![Page 11: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/11.jpg)
[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
A = (S, A, Φ, poss)
R(A, d) = {d, h}
R(A, f) = {f, g, h}
Closure({d, f}, A) = {d, f, g, h}
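Reachability and closure amount to a graph search. The sketch below assumes a particular set of edges for the example diagram, inferred from the values quoted on the slides rather than read off the original picture, so `PHI` should be treated as a hypothetical reconstruction. The function names are illustrative as well.

```python
# Assumed edges of the example diagram (hypothetical reconstruction):
# (state, action) -> set of successor states.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
# poss(s): here simply every action with a transition defined in s.
POSS = {}
for (state, action) in PHI:
    POSS.setdefault(state, set()).add(action)


def reachable(phi, poss, s):
    """R(A, s): smallest set containing s and closed under possible actions."""
    seen = {s}
    stack = [s]
    while stack:
        cur = stack.pop()
        for a in poss.get(cur, set()):
            for t in phi.get((cur, a), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def closure(phi, poss, states):
    """Closure(S, A): union of R(A, s) over all s in S."""
    result = set()
    for s in states:
        result |= reachable(phi, poss, s)
    return result


print(sorted(reachable(PHI, POSS, "d")))
print(sorted(closure(PHI, POSS, {"d", "f"})))
```

Under these assumed edges the search reproduces the three values listed on the slide.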
![Page 12: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/12.jpg)
Unfold_k(s, A, K):
An element of Unfold_k(s, A, K) is a sequence of states of length at most k + 1 that the system may go through if it follows the control K starting from the state s. Formally:
Let A = (S, A, Φ, poss) be a system, let s belong to S, and let K be a control for A. Then Unfold_k(s, A, K) is the set of all sequences σ = s_0, s_1, ..., s_l where l ≤ k and s_0 = s, such that K(s_j) is defined for all j < l, s_{j+1} belongs to Φ(s_j, K(s_j)), and if l < k, then K(s_l) is undefined.
![Page 13: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/13.jpg)
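[Figure: example transition diagram over states b, c, d, f, g, h, here with one more edge labeled a than on the other slides]
Consider policy K: Do action a in states b, c, and d
Unfold_3(b, A, K) = { <b,c,d,h>, <b,g> }
Unfold_3(c, A, K) = { <c,d,h> }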
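The unfolding can be enumerated by a short recursion. This is a sketch under a stated assumption: the extra a-edge on this slide is taken to go from b to g, which is one reading that reproduces the two sets above. The original picture is not available, so `PHI` and the function name `unfold` are hypothetical.

```python
# Assumed edges for this slide's variant of the diagram; the entry
# ("b", "a") -> {"c", "g"} encodes the assumed extra a-edge from b to g.
PHI = {
    ("b", "a"): {"c", "g"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
K = {"b": "a", "c": "a", "d": "a"}  # control: do a in b, c and d


def unfold(phi, control, s, k):
    """Unfold_k(s, A, K) as a set of tuples of states."""
    results = set()

    def extend(path):
        last = path[-1]
        steps = len(path) - 1
        # A sequence stops after k steps, or earlier where K is undefined.
        if steps == k or last not in control:
            results.add(path)
            return
        for t in phi[(last, control[last])]:
            extend(path + (t,))

    extend((s,))
    return results


print(sorted(unfold(PHI, K, "b", 3)))
print(sorted(unfold(PHI, K, "c", 3)))
```

The sequence <b,g> is shorter than k + 1 because K is undefined at g, which is exactly the "if l < k" clause of the definition.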
![Page 14: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/14.jpg)
Definition of k-maintainability: the parameters
1. a system A = (S, A, Φ, poss),
2. a set Aag ⊆ A of agent actions,
3. a set of initial states S,
4. a set of desired states E that we want to maintain,
5. a maintainability parameter k,
6. a function exo : S → 2^Aenv detailing exogenous actions, such that exo(s) is a subset of poss(s), and
7. a control K (mapping a relevant part of S to Aag) such that K(s) belongs to poss(s).
![Page 15: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/15.jpg)
Basic Idea
Ignoring interference:
– From any state under consideration, by following the control policy one should visit E within k steps.
Accounting for interference:
– Broaden the states under consideration from the initial states to all states that can be reached due to the control policy and the environment. (Use the notion of Closure.)
– When using Closure, take into account the control policy; ignore agent actions other than the one dictated by the control policy.
– Also, only consider exogenous actions in exo(s).
![Page 16: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/16.jpg)
Definition of k-maintainability
poss_{K,exo}(s) is the set {K(s)} ∪ exo(s).
A_{K,exo} = (S, A, Φ, poss_{K,exo})
Given a system A = (S, A, Φ, poss), a set of agent actions Aag ⊆ A and a specification of exogenous action occurrence exo, we say that a control K for A w.r.t. Aag k-maintains a set of initial states S with respect to a set of states E, where k ≥ 0, if for each state s in Closure(S, A_{K,exo}) and each sequence σ = s_0, s_1, ..., s_r in Unfold_k(s, A, K) with s_0 = s, it holds that {s_0, s_1, ..., s_r} ∩ E ≠ { }.
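The definition translates into a direct check: compute the closure under poss_{K,exo}, then verify that every unfolding from every state in it meets E. The following is a minimal sketch, not the authors' algorithm. `PHI` and `EXO` are an assumed reconstruction of the example diagram, and `k_maintains` is a hypothetical name.

```python
# Assumed edges and exogenous actions of the example (hypothetical).
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
EXO = {"f": {"e"}}


def k_maintains(phi, control, exo, initial, goal, k):
    """True iff `control` k-maintains `initial` w.r.t. `goal`."""

    def poss_k_exo(s):
        # poss_{K,exo}(s) = {K(s)} ∪ exo(s)
        acts = set(exo.get(s, set()))
        if s in control:
            acts.add(control[s])
        return acts

    # Closure(S, A_{K,exo}) by graph search from the initial states.
    seen = set(initial)
    stack = list(initial)
    while stack:
        cur = stack.pop()
        for a in poss_k_exo(cur):
            for t in phi.get((cur, a), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)

    def hits_goal(s, steps_left):
        # Every sequence of Unfold_k starting at s must contain a goal state.
        if s in goal:
            return True
        if steps_left == 0 or s not in control:
            return False
        return all(hits_goal(t, steps_left - 1) for t in phi[(s, control[s])])

    return all(hits_goal(s, k) for s in seen)


K = {"b": "a", "c": "a", "d": "a"}
print(k_maintains(PHI, K, EXO, {"b"}, {"h"}, 3))
print(k_maintains(PHI, K, EXO, {"b"}, {"h"}, 2))
```

Note that this only verifies a given control; constructing one is the subject of the encoding discussed later in the talk.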
![Page 17: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/17.jpg)
[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
Consider policy K: Do action a in states b, c, and d
poss(b) = {a, a’}
poss_{K,exo}(b) = {a}
Closure({b,c}, A) = {b, c, d, f, g, h}
Closure({b,c}, A_{K,exo}) = {b, c, d, h}
![Page 18: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/18.jpg)
[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
Goal: 3-maintainable policy for S={b} w.r.t. E={h}
Such a policy: Do a in b, c, and d
![Page 19: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/19.jpg)
[Figure: example transition diagram over states b, c, d, f, g, h, here with one additional edge labeled e]
Goal: 3-maintainable policy for S={b} w.r.t. E={h}
No such policy.
![Page 20: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/20.jpg)
Constructing k-maintainable control policies: pre-formulation attempts
Handwritten policies: subsumption architecture, RAPs, situation control rules, protocols.
Our initial motivation for formulating maintainability arose when we tried to formalize what a control module was doing.
Kaelbling and Rosenschein 1991: In the control rule “if condition c is satisfied then do action a”, the action a is the action that leads to the goal from any state where the condition c is satisfied.
![Page 21: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/21.jpg)
[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
Forward Search: If we use minimal paths or minimal cost paths we might pick a’; then we would have to backtrack.
Backward Search: Should we include both d and f?
![Page 22: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/22.jpg)
Propositional Encoding of solutions
Input: An input I consists of a system A = (S, A, Φ, poss), a set of goal states E ⊆ S, a set of initial states S (a subset of the system's state set), a set Aag ⊆ A, a function exo, and an integer k ≥ 0.
Output: A control K such that S is k-maintainable with respect to E (using the control K), if such a control exists. Otherwise the output is the answer that no such control exists.
AIM: Given an input I, we construct a SAT instance sat(I) in polynomial time such that sat(I) is satisfiable if and only if the input I allows for a k-maintainable control, and such that the satisfying assignments of sat(I) encode the possible controls.
![Page 23: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/23.jpg)
Propositional encoding: notation
s_i denotes that
(i) there is a path from state s to some state in E using only agent actions and at most i of them, to which we refer as “there is an a-path from s to E of length at most i,” and
(ii) from each state s' reachable from s, there is an a-path from s' to E of length at most k.
![Page 24: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/24.jpg)
The encoding sat(I)
(0) For all states s and for all j, 0 ≤ j < k: s_j ⇒ s_{j+1}
(1) For all s ∈ E: s_0
(2) For all states s, t such that Φ(s,a) = t for some action a ∈ exo(s): s_k ⇒ t_k
(3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1}, where
PS(s) = {t ∈ S | ∃a ∈ Aag ∩ poss(s): t = Φ(s,a)};
(4) For all initial states s not in E: s_k
(5) For all states s not in E: ¬s_0
![Page 25: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/25.jpg)
Constructing policies from the models of sat(I)
Let M be a model of sat(I).
C_M = {s ∈ S | M ⊨ s_k}
L_M(s): the smallest index j such that M ⊨ s_j (i.e., s_0, s_1, ..., s_{j-1} are false and s_j is true), which we call the level of s w.r.t. M.
K(s) is defined iff s ∈ C_M \ E, and
K(s) ∈ {a ∈ Aag | Φ(s,a) = t, t ∈ C_M, L_M(t) < L_M(s)}
![Page 26: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/26.jpg)
Proposition: Let I consist of a system A = (S, A, Φ, poss), where Φ is deterministic, a set Aag ⊆ A, a set of goal states E ⊆ S, a set of initial states S, an exogenous function exo, and an integer k. Then,
(i) S is k-maintainable w.r.t. E iff sat(I) is satisfiable.
(ii) Given any model M of sat(I), any control K constructed by the algorithm above k-maintains S w.r.t. E.
![Page 27: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/27.jpg)
Reverse Encoding
a ⇒ b
is equivalent to ¬a ∨ b
is equivalent to ¬(¬b) ∨ ¬a
is equivalent to ¬b ⇒ ¬a
is equivalent to b’ ⇒ a’
is equivalent to a’ ← b’
(here a’ and b’ stand for the negations of a and b)
![Page 28: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/28.jpg)
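Rearranging sat(I)
Each group is shown first in its original form and then in its reversed (primed) form.
(0) For all states s and for all j, 0 ≤ j < k:
s_j ⇒ s_{j+1} becomes s’_j ← s’_{j+1}
(1) For all s ∈ E: s_0 becomes ← s’_0
(2) For all states s, t such that Φ(s,a) = t for some action a ∈ exo(s): s_k ⇒ t_k becomes s’_k ← t’_k
(3) For all states s not in E and all i, 1 ≤ i ≤ k:
s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1} becomes s’_i ← ∧_{t ∈ PS(s)} t’_{i-1}
where
PS(s) = {t ∈ S | ∃a ∈ Aag ∩ poss(s): t = Φ(s,a)};
(4) For all initial states s not in E: s_k becomes ← s’_k
(5) For all states s not in E: ¬s_0 becomes s’_0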
![Page 29: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/29.jpg)
[Figure: example transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
(6) b’_0, c’_0, d’_0, f’_0, g’_0 (From 5)
(7) g’_1, g’_2, g’_3 (From 3)
(8) b’_1, c’_1 (From 6 and 3)
(9) f’_3 (From 7 and 2)
(10) f’_2 (From 9 and 0)
(11) f’_1 (From 10 and 0)
(12) b’_2 (From 8, 11, and 3)
Thus M = {g’_3, g’_2, g’_1, g’_0, f’_3, f’_2, f’_1, f’_0, b’_2, b’_1, b’_0, c’_1, c’_0, d’_0}
L_M(b) = 3
L_M(c) = 2
L_M(d) = 1
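The derivation above can be replayed by naive forward chaining over the reversed clauses. The sketch below is a simple quadratic fixpoint rather than the linear-time Horn procedure, and it assumes a deterministic reconstruction of the example diagram (`PHI`, with `EXO`, S = {b}, E = {h}, k = 3). All function and variable names are hypothetical.

```python
# Assumed deterministic edges of the example: (state, action) -> successor.
PHI = {
    ("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
    ("d", "a"): "h", ("f", "a"): "h", ("f", "e"): "g",
}
STATES = {"b", "c", "d", "f", "g", "h"}
AAG = {"a", "a'"}
EXO = {"f": {"e"}}


def least_model(states, phi, agent_actions, exo, goal, k):
    """Set of pairs (s, i) such that the negated atom s'_i is derived."""
    ps = {s: {phi[(s, a)] for a in agent_actions if (s, a) in phi}
          for s in states}
    neg = {(s, 0) for s in states if s not in goal}          # group (5)
    changed = True
    while changed:
        changed = False
        for s in states:
            for i in range(k + 1):
                if (s, i) in neg:
                    continue
                fire = (
                    (i < k and (s, i + 1) in neg)            # group (0)
                    or (i == k and any(
                        (phi[(s, e)], k) in neg
                        for e in exo.get(s, ()) if (s, e) in phi))  # group (2)
                    or (s not in goal and i >= 1 and all(
                        (t, i - 1) in neg for t in ps[s]))   # group (3)
                )
                if fire:
                    neg.add((s, i))
                    changed = True
    return neg


def control_from_model(states, phi, agent_actions, exo, initial, goal, k):
    """Return (levels, control), or None if constraints (1) or (4) fail."""
    neg = least_model(states, phi, agent_actions, exo, goal, k)
    if any((s, 0) in neg for s in goal) or any((s, k) in neg for s in initial):
        return None
    # Level of s: smallest i with s_i true, for states in C_M.
    level = {s: min(i for i in range(k + 1) if (s, i) not in neg)
             for s in states if (s, k) not in neg}
    control = {}
    for s in level:
        if s in goal:
            continue
        for a in sorted(agent_actions):
            t = phi.get((s, a))
            if t in level and level[t] < level[s]:
                control[s] = a
                break
    return level, control


LEVELS, CONTROL = control_from_model(STATES, PHI, AAG, EXO, {"b"}, {"h"}, 3)
print(LEVELS, CONTROL)
```

Under the assumed edges, the derived negated atoms coincide with the set M listed above, and the extracted control does a in b, c and d.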
![Page 30: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/30.jpg)
Polynomial time generation of control policy and maximal control policy
Computing a model of a Horn theory is a well-known polynomial-time problem (Dowling & Gallier 84). Thus,
Theorem: Under deterministic state transitions, problem k-MAINTAIN is solvable in polynomial time.
Maximal Control
– Each satisfiable Horn theory T has a least model, M_T, which is given by the intersection of all its models.
– The least model is computable in time linear in the size of the encoding.
– This model not only leads to a k-maintainable control, but also leads to a maximal control, in the sense that the control is defined on a greatest set of states outside E among all possible k-maintainable controls for S' w.r.t. E such that S is a subset of S'.
![Page 31: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/31.jpg)
Dealing with non-deterministic transition functions
Notation:
– We say that there exists an a-path of length at most k ≥ 0 from a state s to a set of states S' if either s ∈ S', or s ∉ S', k > 0, and there is some action a ∈ Aag ∩ poss(s) such that for every t ∈ Φ(s,a) there exists an a-path of length at most k-1 from t to S'.
– s_a_i, i > 0, denotes that there is an a-path from s to E of length at most i starting with action a.
The encoding sat'(I) again has groups (0)-(5) of clauses, as follows:
(0), (1), (4) and (5) are the same as in sat(I).
(2) For any states s and t such that t ∈ Φ(s,a) for some action a ∈ exo(s): s_k ⇒ t_k
![Page 32: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/32.jpg)
Dealing with non-deterministic transition functions (cont.)
(3) For every state s not in E and for all i, 1 ≤ i ≤ k:
(3.1) s_i ⇒ ∨_{a ∈ Aag ∩ poss(s)} s_a_i;
(3.2) for every a ∈ Aag ∩ poss(s) and t ∈ Φ(s,a): s_a_i ⇒ t_{i-1};
(3.3) for every a ∈ Aag ∩ poss(s), if i < k: s_a_i ⇒ s_a_{i+1};
![Page 33: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/33.jpg)
A direct algorithm
Initialization
– For all states s not in E, make s’_0 true.
– For all states s not in E without any outgoing edges labeled with agent actions, make s’_0 … s’_k true.
– For all states s, if agent action a is not executable in s, make s_a’_0 … s_a’_k true.
Repeat until no change or until s’_k is true for some initial state s:
– If s’_i is true then make s’_{i-1} true. If s_a’_i is true then make s_a’_{i-1} true.
– If t ∈ Φ(s,a) for some exogenous action a and t’_k is true then make s’_k true.
– For any state s not in E:
  – If t ∈ Φ(s,a) for some agent action a and t’_{i-1} is true then make s_a’_i true.
  – If s_a’_i is true for all agent actions a that are executable in s, then make s’_i true.
![Page 34: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/34.jpg)
A direct algorithm (cont.)
If s’_k is true for some initial state s, then the system is not k-maintainable; otherwise construct a super-control as follows:
For states s in E, K(s) is undefined, and for other states K(s) = { a : s_a’_k is not true }.
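The steps of the direct algorithm can be sketched as a fixpoint over two sets of derived atoms. This is an illustrative reading, not the authors' code. It runs to a full fixpoint instead of stopping early, treats "executable" as "has at least one successor", and leaves K(s) undefined when the action set would be empty so that the result stays a super-control. `PHI`, `EXO` and all names are assumptions.

```python
# Assumed non-deterministic edges: (state, action) -> set of successors.
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
STATES = {"b", "c", "d", "f", "g", "h"}
AAG = {"a", "a'"}
EXO = {"f": {"e"}}


def direct_algorithm(states, phi, agent_actions, exo, initial, goal, k):
    """Return a super-control as a dict, or None if not k-maintainable."""
    neg_s, neg_sa = set(), set()   # derived atoms s'_i and s_a'_i

    def executable(s):
        return [a for a in agent_actions if phi.get((s, a))]

    # Initialization.
    for s in states:
        if s not in goal:
            top = 0 if executable(s) else k
            neg_s.update((s, i) for i in range(top + 1))
        for a in agent_actions:
            if not phi.get((s, a)):
                neg_sa.update((s, a, i) for i in range(k + 1))

    changed = True
    while changed:
        before = len(neg_s) + len(neg_sa)
        # Downward propagation: s'_i gives s'_{i-1}, likewise for s_a'_i.
        neg_s.update({(s, i - 1) for (s, i) in neg_s if i > 0})
        neg_sa.update({(s, a, i - 1) for (s, a, i) in neg_sa if i > 0})
        for s in states:
            # Exogenous actions: a bad successor makes s'_k true.
            for e in exo.get(s, ()):
                if any((t, k) in neg_s for t in phi.get((s, e), ())):
                    neg_s.add((s, k))
            if s in goal:
                continue
            for i in range(1, k + 1):
                for a in executable(s):
                    if any((t, i - 1) in neg_s for t in phi[(s, a)]):
                        neg_sa.add((s, a, i))
                if all((s, a, i) in neg_sa for a in executable(s)):
                    neg_s.add((s, i))
        changed = len(neg_s) + len(neg_sa) > before

    if any((s, k) in neg_s for s in initial):
        return None
    control = {}
    for s in states:
        if s in goal:
            continue
        allowed = {a for a in agent_actions if (s, a, k) not in neg_sa}
        if allowed:
            control[s] = allowed
    return control


SUPER = direct_algorithm(STATES, PHI, AAG, EXO, {"b"}, {"h"}, 3)
print(SUPER)
```

On the assumed example the result allows only a in b, c and d. It also assigns a to f, which is harmless because f is not reachable from b once the control is followed.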
![Page 35: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/35.jpg)
Direct algorithm using counters
Idea: c[s] = i means s’_0 … s’_i are true, and c[s_a] = i means s_a’_0 … s_a’_i are true.
Initialization
– For all states s not in E, make s’_0 true: c[s] := 0.
– For all states s not in E without any outgoing edges labeled with agent actions, make s’_0 … s’_k true: c[s] := k.
– For all states s, if agent action a is not executable in s, make s_a’_0 … s_a’_k true: c[s_a] := k.
The other steps are similar.
The idea can then be extended to actions with durations (or costs).
![Page 36: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/36.jpg)
Computational Complexity
– k-maintainability is PTIME-complete (under log-space reductions). PTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.
– k-maintainability is EXPTIME-complete when we have a compact representation. EXPTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.
![Page 37: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/37.jpg)
Conclusion
– High level goal specification is important.
– Certain important goal specification notions cannot be expressed using existing goal representation languages.
– k-maintainability is an important notion.
– Finite-maintainability is a reinvention of Dijkstra's notion of self-stabilization.
– There is a big research community working on self-stabilization in distributed control and fault tolerance.
– But they have not focused much on automatic generation of controls (protocols, in their parlance).
– They have focused more on proving the correctness of handwritten protocols.
– Most specifications over infinite trajectories would be better off with k-maintainability-like notions as part of the specification.
– Role 1 of k: length of the window of opportunity.
– Role 2 of k: bound within which maintenance is guaranteed.
![Page 38: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/38.jpg)
Conclusion (cont.)
– From SAT encoding to Horn logic program encoding: an interesting and novel approach to designing polynomial algorithms.
– One often does not think in terms of negative propositions.
![Page 39: From high level goals to policies: a polynomial time algorithm for k-maintainable goals](https://reader035.vdocuments.site/reader035/viewer/2022070403/56813a0f550346895da1e5c8/html5/thumbnails/39.jpg)
THANK YOU!