Download - Timed Games STRATEGO and Stochastic Priced Timed Gamespeople.cs.aau.dk › ~kgl › GRAZ17 › GRAZ5.pdf · 2017-05-30 · 1 =[V 1,V 2] [4.9,25.1] s.t I 1 is m-stable i.e. from any

Timed Gamesand

Stochastic Priced Timed GamesSynthesis & Machine Learning

TIGA

STRATEGO

Kim G. Larsen

– Aalborg University

DENMARK

Overview

Timed Automata Decidability (regions) Symbolic Verification (zones)

Priced Timed Automata Decidability (priced regions) Symbolic Verification (priced zones)

Stochastic Timed Automata Stochastic Semantics Statistical Model Checking Stochastic Hybrid Automata

Timed Games & Stochastic Priced Timed Games Symbolic Synthesis Reinforcement Learning Applications

TU Graz, May 2017 Kim Larsen [2]

TRON

CLASSIC

TIGA

CORA

ECDAR

SMC

Optimization

Synthesis

Component

Testing

PerformanceAnalysis

Verification

STRATEGOOptimal Synthesis

1995

2001

2005

2011

2014

2010

2004

Timed Gamesand


TIGA

STRATEGO

Kim G. Larsen


DENMARK

Timed Automata & Model Checking


State (L1, x=0.81)Transitions

(L1 , x=0.81) - 2.1 ->

(L1 , x=2.91)->

(goal , x=2.91)

Ehi goal ?

Ahi goal ?

A[ ] : L4 ?

Timed Game


Timed Game & Synthesis

TU Graz, May 2017

x<=1

x<=1

Kim Larsen [6]

Decidability of Timed Games


Untimed and Timed Games

Reachability / Safety Games

Uncontrollable

Controllable

1

2

3

4

x>1

x·1

x<1

x:=0

x<1

x·1

x¸2

8TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable

Memoryless Strategy:F : Q Ec

Winning Run:States(r) Å G Ø

Winning Strategy:Runs(F) WinRuns

9TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable


Winning Run r:States(r) Å G Ø


10TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable


Winning Run r :States(r) Å G Ø


11TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable

Backwards Fixed-Point Computation

cPred(X) = { q2Q | 9 q’2 X. q c q’}uPred(X) = { q2Q | 9 q’2 X. q u q’}

p(X) = cPred(X) \ uPred(XC) ]

Theorem:The set of winning states is obtained as the least fixpoint of the function:

X a p(X) [ Goal

12TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable





X a p(X) [ Goal

13TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable





X a p(X) [ Goal

14TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable





X a p(X) [ Goal

15TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable





X a p(X) [ Goal

16TU Graz, May 2017

Untimed Games

Uncontrollable

Controllable





X a p(X) [ Goal

17TU Graz, May 2017

Computing Winning States


Reachability GamesBackwards Fixed-Point Computation

Theorem:

The set of winning states is obtained as the least fixpointof the function: X a p(X) [ Goal

cPred(X) = { q2Q | 9 q’2 X. q c q’}

uPred(X) = { q2Q | 9 q’2 X. q u q’}

Predt(X,Y) = { q2Q | 9 t. qt2X and 8 s·t. qs2YC }

p(X) = Predt[ X [ cPred(X) , uPred(XC) ]

Definitions

X

YPredt(X,Y)

Kim Larsen [19]TU Graz, May 2017

Symbolic On-the-fly Algorithms for Timed Games [CDF+05, BCD+07]

symbolic version of on-the-fly MC algorithmfor modal mu-calculus

Liu & Smolka 98


UPPAAL Tiga [CDF+05, BCD+07]

Reachability properties: control: A[ p U q ] until

control: Ahi q control: A[ true U q ]

Safety properties: control: A[ p W q ] weak until

control: A[] p control: A[ p W false ]

Time-optimality : control_t*(u,g): A[ p U q ]

u is an upper-bound to prune the search

g is the time to the goal from the current state


UPPAAL Tiga


DEMO

Model Checking (ex Train Gate)


: Never two trains at

the crossing at the

same time

Environment

Controller

Synthesis (ex Train Gate)



the crossing at the

same time

Environment

Controller

?

Timed Games



the crossing at the

same time

Controllable Uncontrollable

Find strategy for controllable

actions st behaviour satisfies

Controller

Environment

TU Graz, May 2017

Synthesis Demo

Kim Larsen [26]

A Buggy Brick Sorting Program

16MCD 2001, Twente Kim G. Larsen

UCb

First UPPAAL model

Sorting of Lego Boxes

Conveyer Belt

Exercise: Design Controller so that only yellew boxes are being pushed out

Boxes

Piston

Black

Yellow

9 18 81 90

99

BlckYel

remove

eject

Controller

Ken Tindell

MAIN PUSH

Conveyer Belt

eject

27TU Graz, May 2017

Brick Sorting

Piston

Generic Plate

Controller

28TU Graz, May 2017

Brick Sorting

Piston

Generic Plate

Controller

29TU Graz, May 2017

30

Problem: avoid having the plates falling down

The Chinese Juggling Problem

thanks to Oded Maler

Kim G Larsen

Balancing Plates / Timed Automata

A Plate

The Joggler

E :(Plate1.Bang or Plate2.Bang or …)Kim G Larsen 31China Summer 2009

Balancing Plates / Time Uncertainty

Strategy

BDD/CDD

Kim G Larsen 32China Summer 2009

Production Cell Overview

Realistic case-study describedin several formalisms(1994 and later).

Objective: stampmetal plates in press.

feed belt, two-armedrobot, press, anddeposit belt.


Production Cell in UPPAAL Tiga


Experimental Results

[CDF+05]

[BCD+07]


Plastic Injection Molding Machine

Robust and optimal control

Tool Chain

Synthesis: UPPAAL TIGA

Verification: PHAVer

Performance: SIMULINK

40% improvement of existing solutions..

Quasiomodo


[CJL+09]

Oil Pump Control Problem

R1: stay within safe interval [4.9,25.1]

R2: minimize average/overall oil volume


Quasiomodo

The Machine (consumption)

Infinite cyclic demand to be satisfied by our control strategy.

P: latency 2 s between state change of pump

F: noise 0.1 l/s


Quasiomodo

Hybrid Game Model


Abstract Game Model

UPPAAL Tiga offers games of perfect information

Abstract game model such that states only contain information about: Volume of oil at the beginning of cycle

The ideal volume as predicted by the consumption cycle

Current time within the cycle

State of the Pump (on/off)

Discrete model

DV, V_rate

V_acctime


Quasiomodo

Machine (uncontrollable)

Checks whether V under noise gets

outside [Vmin+0.1,Vmax-0.1]


Quasiomodo

Pump (controllable)

Every 1 (one) seconds


Quasiomodo

TU Graz, May 2017

Global Approach

Find some interval I1=[V1,V2] [4.9,25.1] s.t

I1 is m-stable i.e. from any V0 in I1 there is strategy stwhatever fluctuation volume is always within [5,25] and at the end within I2=[V1+m,V1-m]

I1 is optimal among all m-stable intervals.

0

25

5

10

15

20

I1 I2

0 s 20 s

Tool Chain

Strategy Synthesis TIGA

Verification PHAVER

Performance Evaluation

SIMULINK

GuaranteedCorrectnessRobustness

with40% Improvement

Quasiomodo


Synthesis of Home Automation


What else ?

Timed Games w Partial Observability Action-based Observation: undecidable [BDMP03] Finite-observation of states: decidable [CDL+07]

Priced Timed Games: Acyclic, cost non-zeno: decidable [LTMM02] [BCFL04] 1 clock: decidable [BLMR06] >2 clocks: undecidable [BBR05, BBM06] 2 clocks: open

Energy Games: Several Open Problems Exponential Observers

Climate Controller inPig Stables [JRLD07]

CHESS Way [Quasimodo@ESWEEK]


Timed Gamesand


TIGA

STRATEGO

Kim G. Larsen


DENMARK

Going to Uppsala – in 1 hour


0.9

0.9

0.1

0.1

U[42,45]

U[0,35]

U[0,20]

U[0,140]

Optimal WC Strategy(2-player) Take bikeWC=45

Optimal Expected Strategy(1½ player)Take carE = 16WC = 140

Optimal Expected Strategyguaranteeing WC<=60

?????

GTimed Game

σStrategy

PStochastic

PricedTimed Game

P|σ

φ

synthesis

abstraction

σ°optimizedStrategy

G|σTimed Automata

P|σ°Stochastic Priced Timed Automata

minE(cost)

maxE(gain)

Uppaal TIGAstrategy NS = control: A<> goalstrategy NS = control: A[] safe

Statistical Learning

strategy DS = minE (cost) [<=10]: <> done under NSstrategy DS = maxE (gain) [<=10]: <> done under NS

UppaalE<> error under NSA[] safe under NS

Uppaal SMCsimulate 5 [<=10]{e1, e2} under SS Pr[<=10](<> error) under SS E[<=10;100](max: cost) under SS

Timed Games


Strategy:

Memoryless, deterministic, most permissive.

Uncontrol-

lable

Controllable

TIGA

Run

𝜋 = INIT, 𝑥 = 050.1

՜r(CHOICE, 𝑥 = 0)

2.4՜b(B, 𝑥 = 0)

20.3՜d(END, 𝑥 = 20.3)

Total time = 50.1 + 2.4 + 20.3 = 72.8

Timed Games –Time Bounded Reachability


Objective: 𝐴⟨⟩(END ∧ time≤ 210)

Deterministic, memoryless strategy:

100 200

x w w

𝝀 𝝀

time

100 200

time

x w w

𝝀

9070

𝝀

a

𝝀

a b

Most permissive, memoryless strategy

100

100

time

time

Priced Timed Games


• Cost optimal strategy take b immediately

WC= 280

”CORA”

COST

Total 𝑐𝑜𝑠𝑡 = 𝟎 + 𝟒. 𝟖 + 𝟔𝟎. 𝟗 = 𝟔𝟓. 𝟕

Priced Run

𝜋 = Init, 𝑥 = 050.1

𝟎՜rCHOICE, 𝑥 = 0

2.4

4.8՜b

B, 𝑥 = 020.3

𝟔𝟎.𝟗՜d(END, 𝑥 = 20.3)

Priced Timed MDP


• Cost optimal strategy take b immediately

WC= 280

Priced Timed MDP

Optimal expected cost str

take a immediately

expectation = 160

UNIFORM[0,100]

Controllable

”SMC”

COST

Priced Timed MDP

TU Graz, May 2017

Kim Larsen [55]

Cost optimal strategy

take b immediately

overall = 280

Priced Timed MDP

Optimal expected cost str

take a immediately

expectation = 160

Minimal Expected Cost while

guaranteeing END is reached

within time 210:

Strat.: t>90 (100,w)

t>70 (0,a)

t<70 (0,b)

= 204

UNIFORM[0,100]

Controllable

”SMC”

COST

Stochastic Strategies for Learning!


Objective: 𝐴⟨⟩(END ∧ time≤ 210)

Most permissive, memoryless strategy:

100

100 200

x

𝝀

9070

𝝀

a

𝝀

a bCost optimal deterministic

sub-strategy !

100

200

x w w

𝝀

9070

𝝀

a

𝝀

a b

100

Stochastic Strategies

𝝀 )

time

time

w w

Reinforcement Learning


Time Bounded Reachability(G,T)

TIGA

SMC

SMC

Learned Strategies


More plots of runs according to strategies learne. 𝝀

a

b

Covariance Matrices

Learned Strategies



a

b

Covariance Matrices

Strategies – Representation


Nondeterministic Strategies 𝜎𝑛(ℓ,𝑣)

⊆ Σ𝑐 ∪ 𝜆

Stochastic Strategies

Covariance Matrices

Splitting

Logistic Regression

𝜇𝑠(ℓ,𝑣)

∶ Σ𝑐 ∪ 𝜆 ՜ [0,1]

𝑅ℓ𝑅ℓ

Stochastic Timed Game


Safe & Adaptive Cruice Control


Q1: Find the most permissive strategy ensuring safety.

Q2:Find the optimal sub-strategy that will allow Ego to go as close as possible.

EGO FRONT

Two Player Game (simplified)


Discretization


Discrete

Continuous

Front (complete)


No Strategy


Safety Strategy


Safe and Optimal Strategy


Optimal and Safe Strategy


Safety Strategy (Code)


Synthesis of Climate Controllers


TACAS16



TACAS16

From Verification to Synthesisand Optimization


1 ½

2 1½

Skov

HYDAC

SELUXIT

Zone-based climatecontrol pig-stables

Profit-optimal, minimal-wear and energy-awareschedules for satelittes

Personalized light control in homeautomation

Energy- and comfort-optimal floor heating

Intelligent Traffic Control

Safe and optimal car maneuvers

Mathias G Sørensen

TU Graz, May 2017

GOMSpace

Skov

More Practical Synthesis …

82

LASSO

Learning, Analysis,

SynthesiS and Optimization

of Cyber-Physical Systems

Ongoing Work

Verification of learned strategy (zonification) Full correctness of hybrid control obtained from

discretization. Combination with Invariance Analysis for Switched Controller

Synthesis; Metrics; Verification using SpaceEX;

From optimal strategy of bounded horizon to optimal infinite strategy.

Strategies: complexity (space and time) and permissiveness

On-line synthesis may be too slow for some application

Satellites > Floor-heating > Traffic > Power Electronics Learn Neural Network representation of optimal strategy


LASSOLearning, Analysis, SynthesiS and Optimization

of Cyber-Physical Systems

1

𝜇1 𝜇𝑛

Safety Constraints

Perf. Measures

Model of

Physical Comp.Model of

Cyber Comp.

Unknown

Known

Learning

Analysis

Synthesize

Optimize

Fig 1. The LASSO Framework

TU Graz, May 2017

NEXT UPPAAL BRANCHES

PhD/visitors/PostDocPosition available

Contact: [email protected]

Thank You !


Exercises (from yesterday)

Exercise 28: Jobshop Scheduling

Look at

https://www.dropbox.com/sh/96tvd7qklf1gcyh/AADFwVJk97L9qtJLkf0VUgtOa?dl=0

In UPPAAL SMC

Coffee Machine

Hammer Nail

Hybrid

… and more

TU Graz, 2017 Kim Larsen [86]

https://www.dropbox.com/sh/96tvd7qklf1gcyh/AADFwVJk97L9qtJLkf0VUgtOa?dl=0

Exercises

Download UPPAAL STRATEGO 4.1.20-3 or 4.1.20-4

http://people.cs.aau.dk/~marius/stratego

Experiment with Newspaper

Traffic Uppsala

Cruice

Small Car

See Dropbox


http://people.cs.aau.dk/~marius/stratego

Download - Timed Games STRATEGO and Stochastic Priced Timed Gamespeople.cs.aau.dk › ~kgl › GRAZ17 › GRAZ5.pdf · 2017-05-30 · 1 =[V 1,V 2] [4.9,25.1] s.t I 1 is m-stable i.e. from any

Top Related