Timed Gamesand
Stochastic Priced Timed GamesSynthesis & Machine Learning
TIGA
STRATEGO
Kim G. Larsen
– Aalborg University
DENMARK
Overview
Timed Automata Decidability (regions) Symbolic Verification (zones)
Priced Timed Automata Decidability (priced regions) Symbolic Verification (priced zones)
Stochastic Timed Automata Stochastic Semantics Statistical Model Checking Stochastic Hybrid Automata
Timed Games & Stochastic Priced Timed Games Symbolic Synthesis Reinforcement Learning Applications
TU Graz, May 2017 Kim Larsen [2]
TRON
CLASSIC
TIGA
CORA
ECDAR
SMC
Optimization
Synthesis
Component
Testing
PerformanceAnalysis
Verification
STRATEGOOptimal Synthesis
1995
2001
2005
2011
2014
2010
2004
Timed Gamesand
Stochastic Priced Timed GamesSynthesis & Machine Learning
TIGA
STRATEGO
Kim G. Larsen
– Aalborg University
DENMARK
Timed Automata & Model Checking
TU Graz, May 2017 Kim Larsen [4]
State (L1, x=0.81)Transitions
(L1 , x=0.81) - 2.1 ->
(L1 , x=2.91)->
(goal , x=2.91)
Ehi goal ?
Ahi goal ?
A[ ] : L4 ?
Timed Game
TU Graz, May 2017 Kim Larsen [5]
Timed Game & Synthesis
TU Graz, May 2017
x<=1
x<=1
Kim Larsen [6]
Decidability of Timed Games
TU Graz, May 2017 Kim Larsen [7]
Untimed and Timed Games
Reachability / Safety Games
Uncontrollable
Controllable
1
2
3
4
x>1
x·1
x<1
x:=0
x<1
x·1
x¸2
8TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Memoryless Strategy:F : Q Ec
Winning Run:States(r) Å G Ø
Winning Strategy:Runs(F) WinRuns
9TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Memoryless Strategy:F : Q Ec
Winning Run r:States(r) Å G Ø
Winning Strategy:Runs(F) WinRuns
10TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Memoryless Strategy:F : Q Ec
Winning Run r :States(r) Å G Ø
Winning Strategy:Runs(F) WinRuns
11TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Backwards Fixed-Point Computation
cPred(X) = { q2Q | 9 q’2 X. q c q’}uPred(X) = { q2Q | 9 q’2 X. q u q’}
p(X) = cPred(X) \ uPred(XC) ]
Theorem:The set of winning states is obtained as the least fixpoint of the function:
X a p(X) [ Goal
12TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Backwards Fixed-Point Computation
cPred(X) = { q2Q | 9 q’2 X. q c q’}uPred(X) = { q2Q | 9 q’2 X. q u q’}
p(X) = cPred(X) \ uPred(XC) ]
Theorem:The set of winning states is obtained as the least fixpoint of the function:
X a p(X) [ Goal
13TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Backwards Fixed-Point Computation
cPred(X) = { q2Q | 9 q’2 X. q c q’}uPred(X) = { q2Q | 9 q’2 X. q u q’}
p(X) = cPred(X) \ uPred(XC) ]
Theorem:The set of winning states is obtained as the least fixpoint of the function:
X a p(X) [ Goal
14TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Backwards Fixed-Point Computation
cPred(X) = { q2Q | 9 q’2 X. q c q’}uPred(X) = { q2Q | 9 q’2 X. q u q’}
p(X) = cPred(X) \ uPred(XC) ]
Theorem:The set of winning states is obtained as the least fixpoint of the function:
X a p(X) [ Goal
15TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Backwards Fixed-Point Computation
cPred(X) = { q2Q | 9 q’2 X. q c q’}uPred(X) = { q2Q | 9 q’2 X. q u q’}
p(X) = cPred(X) \ uPred(XC) ]
Theorem:The set of winning states is obtained as the least fixpoint of the function:
X a p(X) [ Goal
16TU Graz, May 2017
Untimed Games
Uncontrollable
Controllable
Backwards Fixed-Point Computation
cPred(X) = { q2Q | 9 q’2 X. q c q’}uPred(X) = { q2Q | 9 q’2 X. q u q’}
p(X) = cPred(X) \ uPred(XC) ]
Theorem:The set of winning states is obtained as the least fixpoint of the function:
X a p(X) [ Goal
17TU Graz, May 2017
Computing Winning States
TU Graz, May 2017 Kim Larsen [18]
Reachability GamesBackwards Fixed-Point Computation
Theorem:
The set of winning states is obtained as the least fixpointof the function: X a p(X) [ Goal
cPred(X) = { q2Q | 9 q’2 X. q c q’}
uPred(X) = { q2Q | 9 q’2 X. q u q’}
Predt(X,Y) = { q2Q | 9 t. qt2X and 8 s·t. qs2YC }
p(X) = Predt[ X [ cPred(X) , uPred(XC) ]
Definitions
X
YPredt(X,Y)
Kim Larsen [19]TU Graz, May 2017
Symbolic On-the-fly Algorithms for Timed Games [CDF+05, BCD+07]
symbolic version of on-the-fly MC algorithmfor modal mu-calculus
Liu & Smolka 98
Kim Larsen [20]TU Graz, May 2017
UPPAAL Tiga [CDF+05, BCD+07]
Reachability properties: control: A[ p U q ] until
control: Ahi q control: A[ true U q ]
Safety properties: control: A[ p W q ] weak until
control: A[] p control: A[ p W false ]
Time-optimality : control_t*(u,g): A[ p U q ]
u is an upper-bound to prune the search
g is the time to the goal from the current state
TU Graz, May 2017 Kim Larsen [21]
UPPAAL Tiga
TU Graz, May 2017 Kim Larsen [22]
DEMO
Model Checking (ex Train Gate)
TU Graz, May 2017 Kim Larsen [23]
: Never two trains at
the crossing at the
same time
Environment
Controller
Synthesis (ex Train Gate)
TU Graz, May 2017 Kim Larsen [24]
: Never two trains at
the crossing at the
same time
Environment
Controller
?
Timed Games
TU Graz, May 2017 Kim Larsen [25]
: Never two trains at
the crossing at the
same time
Controllable Uncontrollable
Find strategy for controllable
actions st behaviour satisfies
Controller
Environment
TU Graz, May 2017
Synthesis Demo
Kim Larsen [26]
A Buggy Brick Sorting Program
16MCD 2001, Twente Kim G. Larsen
UCb
First UPPAAL model
Sorting of Lego Boxes
Conveyer Belt
Exercise: Design Controller so that only yellew boxes are being pushed out
Boxes
Piston
Black
Yellow
9 18 81 90
99
BlckYel
remove
eject
Controller
Ken Tindell
MAIN PUSH
Conveyer Belt
eject
27TU Graz, May 2017
Brick Sorting
Piston
Generic Plate
Controller
28TU Graz, May 2017
Brick Sorting
Piston
Generic Plate
Controller
29TU Graz, May 2017
30
Problem: avoid having the plates falling down
The Chinese Juggling Problem
thanks to Oded Maler
Kim G Larsen
Balancing Plates / Timed Automata
A Plate
The Joggler
E :(Plate1.Bang or Plate2.Bang or …)Kim G Larsen 31China Summer 2009
Balancing Plates / Time Uncertainty
Strategy
BDD/CDD
Kim G Larsen 32China Summer 2009
Production Cell Overview
Realistic case-study describedin several formalisms(1994 and later).
Objective: stampmetal plates in press.
feed belt, two-armedrobot, press, anddeposit belt.
Kim Larsen [33]TU Graz, May 2017
Production Cell in UPPAAL Tiga
TU Graz, May 2017 Kim Larsen [34]
Experimental Results
[CDF+05]
[BCD+07]
Kim Larsen [35]TU Graz, May 2017
Plastic Injection Molding Machine
Robust and optimal control
Tool Chain
Synthesis: UPPAAL TIGA
Verification: PHAVer
Performance: SIMULINK
40% improvement of existing solutions..
Quasiomodo
Kim Larsen [36]TU Graz, May 2017
[CJL+09]
Oil Pump Control Problem
R1: stay within safe interval [4.9,25.1]
R2: minimize average/overall oil volume
Kim Larsen [37]TU Graz, May 2017
Quasiomodo
The Machine (consumption)
Infinite cyclic demand to be satisfied by our control strategy.
P: latency 2 s between state change of pump
F: noise 0.1 l/s
Kim Larsen [38]TU Graz, May 2017
Quasiomodo
Hybrid Game Model
TU Graz, May 2017 Kim Larsen [39]
Abstract Game Model
UPPAAL Tiga offers games of perfect information
Abstract game model such that states only contain information about: Volume of oil at the beginning of cycle
The ideal volume as predicted by the consumption cycle
Current time within the cycle
State of the Pump (on/off)
Discrete model
DV, V_rate
V_acctime
Kim Larsen [40]TU Graz, May 2017
Quasiomodo
Machine (uncontrollable)
Checks whether V under noise gets
outside [Vmin+0.1,Vmax-0.1]
Kim Larsen [41]TU Graz, May 2017
Quasiomodo
Pump (controllable)
Every 1 (one) seconds
Kim Larsen [42]TU Graz, May 2017
Quasiomodo
TU Graz, May 2017
Global Approach
Find some interval I1=[V1,V2] [4.9,25.1] s.t
I1 is m-stable i.e. from any V0 in I1 there is strategy stwhatever fluctuation volume is always within [5,25] and at the end within I2=[V1+m,V1-m]
I1 is optimal among all m-stable intervals.
0
25
5
10
15
20
I1 I2
0 s 20 s
Page 43
Tool Chain
Strategy Synthesis TIGA
Verification PHAVER
Performance Evaluation
SIMULINK
GuaranteedCorrectnessRobustness
with40% Improvement
Quasiomodo
Kim Larsen [44]TU Graz, May 2017
Synthesis of Home Automation
TU Graz, May 2017 Kim Larsen [45]
What else ?
Timed Games w Partial Observability Action-based Observation: undecidable [BDMP03] Finite-observation of states: decidable [CDL+07]
Priced Timed Games: Acyclic, cost non-zeno: decidable [LTMM02] [BCFL04] 1 clock: decidable [BLMR06] >2 clocks: undecidable [BBR05, BBM06] 2 clocks: open
Energy Games: Several Open Problems Exponential Observers
Climate Controller inPig Stables [JRLD07]
CHESS Way [Quasimodo@ESWEEK]
TU Graz, May 2017 Kim Larsen [46]
Timed Gamesand
Stochastic Priced Timed GamesSynthesis & Machine Learning
TIGA
STRATEGO
Kim G. Larsen
– Aalborg University
DENMARK
Going to Uppsala – in 1 hour
TU Graz, May 2017 Kim Larsen [48]
0.9
0.9
0.1
0.1
U[42,45]
U[0,35]
U[0,20]
U[0,140]
Optimal WC Strategy(2-player) Take bikeWC=45
Optimal Expected Strategy(1½ player)Take carE = 16WC = 140
Optimal Expected Strategyguaranteeing WC<=60
?????
DEMO
GTimed Game
σStrategy
PStochastic
PricedTimed Game
P|σ
φ
synthesis
abstraction
σ°optimizedStrategy
G|σTimed Automata
P|σ°Stochastic Priced Timed Automata
minE(cost)
maxE(gain)
Uppaal TIGAstrategy NS = control: A<> goalstrategy NS = control: A[] safe
Statistical Learning
strategy DS = minE (cost) [<=10]: <> done under NSstrategy DS = maxE (gain) [<=10]: <> done under NS
UppaalE<> error under NSA[] safe under NS
Uppaal SMCsimulate 5 [<=10]{e1, e2} under SS Pr[<=10](<> error) under SS E[<=10;100](max: cost) under SS
Timed Games
TU Graz, May 2017 Kim Larsen [51]
Strategy:
Memoryless, deterministic, most permissive.
Uncontrol-
lable
Controllable
TIGA
Run
𝜋 = INIT, 𝑥 = 050.1
՜r(CHOICE, 𝑥 = 0)
2.4՜b(B, 𝑥 = 0)
20.3՜d(END, 𝑥 = 20.3)
Total time = 50.1 + 2.4 + 20.3 = 72.8
Timed Games –Time Bounded Reachability
TU Graz, May 2017 Kim Larsen [52]
Objective: 𝐴⟨⟩(END ∧ time≤ 210)
Deterministic, memoryless strategy:
100 200
x w w
𝝀 𝝀
time
100 200
time
x w w
𝝀
9070
𝝀
a
𝝀
a b
Most permissive, memoryless strategy
100
100
time
time
Priced Timed Games
TU Graz, May 2017 Kim Larsen [53]
• Cost optimal strategy take b immediately
WC= 280
”CORA”
COST
Total 𝑐𝑜𝑠𝑡 = 𝟎 + 𝟒. 𝟖 + 𝟔𝟎. 𝟗 = 𝟔𝟓. 𝟕
Priced Run
𝜋 = Init, 𝑥 = 050.1
𝟎՜rCHOICE, 𝑥 = 0
2.4
4.8՜b
B, 𝑥 = 020.3
𝟔𝟎.𝟗՜d(END, 𝑥 = 20.3)
Priced Timed MDP
TU Graz, May 2017 Kim Larsen [54]
• Cost optimal strategy take b immediately
WC= 280
Priced Timed MDP
Optimal expected cost str
take a immediately
expectation = 160
UNIFORM[0,100]
Controllable
”SMC”
COST
Priced Timed MDP
TU Graz, May 2017
Kim Larsen [55]
Cost optimal strategy
take b immediately
overall = 280
Priced Timed MDP
Optimal expected cost str
take a immediately
expectation = 160
Minimal Expected Cost while
guaranteeing END is reached
within time 210:
Strat.: t>90 (100,w)
t>70 (0,a)
t<70 (0,b)
= 204
UNIFORM[0,100]
Controllable
”SMC”
COST
Stochastic Strategies for Learning!
TU Graz, May 2017 Kim Larsen [56]
Objective: 𝐴⟨⟩(END ∧ time≤ 210)
Most permissive, memoryless strategy:
100
100 200
x
𝝀
9070
𝝀
a
𝝀
a bCost optimal deterministic
sub-strategy !
100
200
x w w
𝝀
9070
𝝀
a
𝝀
a b
100
Stochastic Strategies
𝝀 )
time
time
w w
Reinforcement Learning
TU Graz, May 2017 Kim Larsen [57]
Time Bounded Reachability(G,T)
TIGA
SMC
SMC
Learned Strategies
TU Graz, May 2017 Kim Larsen [58]
More plots of runs according to strategies learne. 𝝀
a
b
Covariance Matrices
Learned Strategies
TU Graz, May 2017 Kim Larsen [59]
More plots of runs according to strategies learne. 𝝀
a
b
Covariance Matrices
Learned Strategies
TU Graz, May 2017 Kim Larsen [60]
More plots of runs according to strategies learne. 𝝀
a
b
Covariance Matrices
Learned Strategies
TU Graz, May 2017 Kim Larsen [61]
More plots of runs according to strategies learne. 𝝀
a
b
Covariance Matrices
Learned Strategies
TU Graz, May 2017 Kim Larsen [62]
More plots of runs according to strategies learne. 𝝀
a
b
Covariance Matrices
Learned Strategies
TU Graz, May 2017 Kim Larsen [63]
More plots of runs according to strategies learne. 𝝀
a
b
Covariance Matrices
Strategies – Representation
TU Graz, May 2017 Kim Larsen [64]
Nondeterministic Strategies 𝜎𝑛(ℓ,𝑣)
⊆ Σ𝑐 ∪ 𝜆
Stochastic Strategies
Covariance Matrices
Splitting
Logistic Regression
𝜇𝑠(ℓ,𝑣)
∶ Σ𝑐 ∪ 𝜆 ՜ [0,1]
𝑅ℓ𝑅ℓ
Stochastic Timed Game
TU Graz, May 2017 Kim Larsen [65]
DEMO
Safe & Adaptive Cruice Control
TU Graz, May 2017 Kim Larsen [67]
Q1: Find the most permissive strategy ensuring safety.
Q2:Find the optimal sub-strategy that will allow Ego to go as close as possible.
EGO FRONT
Two Player Game (simplified)
TU Graz, May 2017 Kim Larsen [68]
Discretization
TU Graz, May 2017 Kim Larsen [69]
Discrete
Continuous
Front (complete)
TU Graz, May 2017 Kim Larsen [70]
No Strategy
TU Graz, May 2017 Kim Larsen [71]
Safety Strategy
TU Graz, May 2017 Kim Larsen [72]
Safety Strategy
TU Graz, May 2017 Kim Larsen [73]
Safe and Optimal Strategy
TU Graz, May 2017 Kim Larsen [74]
Safe and Optimal Strategy
TU Graz, May 2017 Kim Larsen [75]
Optimal and Safe Strategy
TU Graz, May 2017 Kim Larsen [76]
Safety Strategy (Code)
TU Graz, May 2017 Kim Larsen [77]
Synthesis of Climate Controllers
TU Graz, May 2017 Kim Larsen [78]
TACAS16
Synthesis of Climate Controllers
TU Graz, May 2017 Kim Larsen [79]
TACAS16
Synthesis of Climate Controllers
TU Graz, May 2017 Kim Larsen [80]
TACAS16
From Verification to Synthesisand Optimization
TU Graz, May 2017 Kim Larsen [81]
1 ½
2 1½
Skov
HYDAC
SELUXIT
Zone-based climatecontrol pig-stables
Profit-optimal, minimal-wear and energy-awareschedules for satelittes
Personalized light control in homeautomation
Energy- and comfort-optimal floor heating
Intelligent Traffic Control
Safe and optimal car maneuvers
Mathias G Sørensen
TU Graz, May 2017
GOMSpace
Skov
More Practical Synthesis …
82
LASSO
Learning, Analysis,
SynthesiS and Optimization
of Cyber-Physical Systems
Ongoing Work
Verification of learned strategy (zonification) Full correctness of hybrid control obtained from
discretization. Combination with Invariance Analysis for Switched Controller
Synthesis; Metrics; Verification using SpaceEX;
From optimal strategy of bounded horizon to optimal infinite strategy.
Strategies: complexity (space and time) and permissiveness
On-line synthesis may be too slow for some application
Satellites > Floor-heating > Traffic > Power Electronics Learn Neural Network representation of optimal strategy
TU Graz, May 2017 Kim Larsen [83]
LASSOLearning, Analysis, SynthesiS and Optimization
of Cyber-Physical Systems
1
𝜇1 𝜇𝑛
Safety Constraints
Perf. Measures
Model of
Physical Comp.Model of
Cyber Comp.
Unknown
Known
Learning
Analysis
Synthesize
Optimize
Fig 1. The LASSO Framework
TU Graz, May 2017
NEXT UPPAAL BRANCHES
PhD/visitors/PostDocPosition available
Contact: [email protected]
Thank You !
TU Graz, May 2017 Kim Larsen [85]
Exercises (from yesterday)
Exercise 28: Jobshop Scheduling
Look at
https://www.dropbox.com/sh/96tvd7qklf1gcyh/AADFwVJk97L9qtJLkf0VUgtOa?dl=0
In UPPAAL SMC
Coffee Machine
Hammer Nail
Hybrid
… and more
TU Graz, 2017 Kim Larsen [86]
Exercises
Download UPPAAL STRATEGO 4.1.20-3 or 4.1.20-4
http://people.cs.aau.dk/~marius/stratego
Experiment with Newspaper
Traffic Uppsala
Cruice
Small Car
See Dropbox
TU Graz, May 2017 Kim Larsen [87]