Page 1: Annu Gmeiner Igor Konnov Ulrich Schmid Helmut Veith Josef …forsyte.at/wp-content/uploads/love2.pdf · 2016. 4. 17. · Model Checking of Fault-Tolerant Distributed Algorithms Part

Model Checking of Fault-Tolerant Distributed Algorithms

Part II: Parameterized Model Checking of Fault-tolerant Distributed Algorithms by Abstraction

Annu Gmeiner Igor Konnov Ulrich Schmid

Helmut Veith Josef Widder

LOVE 2016 @ TU Wien

Josef Widder (TU Wien) Checking Fault-Tolerant Distributed Algos LOVE 2016 1 / 44

automatic verification?


Large fault-tolerant systems

safety critical systems: cars, planes, etc.

rare but dangerous faults

datacenters: thousands of computers

faults happen every day

[Figure 7.1: DARTS prototype board, comprising 8 interconnected HITS chips]

scaling model checking to 1000 processes: $P_{1000} \models \varphi$

or model checking for all sizes? $\forall n \ge 4.\ P_n \models \varphi$


Fault-tolerant distributed algorithms

n processes communicate by sending messages

f processes are faulty (unknown)

t is an upper bound on f (known)

resilience condition on n, t, and f , e.g., n > 3t ∧ t ≥ f ≥ 0
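
As a small illustration (a sketch, not taken from the slides), the example resilience condition can be written as a predicate over the three parameters. The function names `resilience_condition` and `admissible_parameters` are hypothetical.

```python
def resilience_condition(n: int, t: int, f: int) -> bool:
    """Return True iff n > 3t and t >= f >= 0 (the example condition above)."""
    return n > 3 * t and t >= f >= 0


def admissible_parameters(max_n: int):
    """Enumerate all (n, t, f) with 1 <= n <= max_n that satisfy the condition."""
    return [
        (n, t, f)
        for n in range(1, max_n + 1)
        for t in range(0, n)
        for f in range(0, t + 1)
        if resilience_condition(n, t, f)
    ]
```

Parameterized model checking has to cover every such triple at once; enumerating them is only feasible up to a bound, which is the motivation for the abstractions that follow.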


Fault-tolerant DAs: Model Checking Challenges

unbounded data types

counting how many messages have been received

parameterization in multiple parameters

among n processes f ≤ t are faulty with n > 3t

contrast to concurrent programs

fault tolerance against adverse environments

degrees of concurrency

many degrees of partial synchrony

continuous time

fault-tolerant clock synchronization


Model checking problem for fault-tolerant distributed algorithms

Parameterized model checking problem:

given a distributed algorithm and spec. ϕ

show for all n, t, and f satisfying n > 3t ∧ t ≥ f ≥ 0: M(n, t, f) |= ϕ

every M(n, t, f ) is a system of n − f correct processes


More generally, for an arbitrary resilience condition:

show for all n, t, and f satisfying the resilience condition: M(n, t, f) |= ϕ

every M(n, t, f) is a system of N(n, f) correct processes


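Properties in Linear Temporal Logic

Unforgeability (U). If $v_i = 0$ for all correct processes $i$, then for all correct processes $j$, $\mathit{accept}_j$ remains 0 forever. (Safety)

$$\mathbf{G}\left(\left(\bigwedge_{i=1}^{n-f} v_i = 0\right) \rightarrow \mathbf{G}\left(\bigwedge_{j=1}^{n-f} \mathit{accept}_j = 0\right)\right)$$

Completeness (C). If $v_i = 1$ for all correct processes $i$, then there is a correct process $j$ that eventually sets $\mathit{accept}_j$ to 1. (Liveness)

$$\mathbf{G}\left(\left(\bigwedge_{i=1}^{n-f} v_i = 1\right) \rightarrow \mathbf{F}\left(\bigvee_{j=1}^{n-f} \mathit{accept}_j = 1\right)\right)$$

Relay (R). If a correct process $i$ sets $\mathit{accept}_i$ to 1, then eventually all correct processes $j$ set $\mathit{accept}_j$ to 1. (Liveness)

$$\mathbf{G}\left(\left(\bigvee_{i=1}^{n-f} \mathit{accept}_i = 1\right) \rightarrow \mathbf{F}\left(\bigwedge_{j=1}^{n-f} \mathit{accept}_j = 1\right)\right)$$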


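Portfolio of techniques

abstraction: values are tracked only relative to the thresholds 0, 1, t+1, n-t, n

symmetry: the three permutations (•, ◦, ◦), (◦, •, ◦), (◦, ◦, •) collapse into the single counter tuple (2◦, 1•)

partial order reduction: x++; y++ and y++; x++ lead to the same state, so one order suffices

acceleration: x++ by Proc 1, x++ by Proc 2, and x++ by Proc 3 are summarized as x+=3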


Stacks of techniques

FMCAD’13: data abstraction; symmetry; counter abstraction; state enumeration or BDDs; tools: SPIN, NuSMV-BDD

CONCUR’14: data abstraction; symmetry; counter abstraction; partial orders, acceleration, bounded diameter; bounded model checking; tool: NuSMV-SAT

CAV’15: data abstraction; symmetry & integer counters; partial orders, acceleration, schemas; bounded model checking; tool: SMT


Threshold-guarded fault-tolerant distributed algorithms


Threshold-guarded FTDAs

Fault-free construct: quantified guards (t = f = 0)

Existential guard: if received m from some process then ...

Universal guard: if received m from all processes then ...

These guards allow one to treat the processes in a parameterized way

What if faults might occur?

Fault-tolerant algorithms: n processes, at most t are Byzantine

Threshold guard: if received m from n − t processes then ...

(the processes cannot refer to f!)


Reliable Broadcast: Sample Execution

[Figure: sample execution as a message diagram. Labels: broadcast (twice), ≥ t + 1, and ≥ n − t followed by accept (three times)]


Control Flow Automata

Variables of process i
    v_i : {0, 1} init with 0 or 1
    accept_i : {0, 1} init with 0

An indivisible step:
    if v_i = 1
    then send (echo) to all;

    if received (echo) from at least t + 1 distinct processes
       and not sent (echo) before
    then send (echo) to all;

    if received (echo) from at least n - t distinct processes
    then accept_i := 1;

n − f copies of the process

[Figure: control flow automaton of one process with locations qI, q0, ..., q8, qF. Edge labels: sv = V1; ¬(sv = V1); inc nsnt; sv := SE; nrcvd := z where (nrcvd ≤ z ∧ z ≤ nsnt + f); ¬(t + 1 ≤ nrcvd); t + 1 ≤ nrcvd; sv = V0; ¬(sv = V0); inc nsnt; n − t ≤ nrcvd; ¬(n − t ≤ nrcvd); sv := SE; sv := AC]
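
The indivisible step can be sketched in Python as follows. This is one possible reading of the pseudocode and the edge labels, not the authors' encoding: the status names `V0`, `V1`, `SE`, `AC` and the counters `nrcvd` and `nsnt` come from the automaton, while `Process`, `step`, and `run` are hypothetical helpers, and the random scheduler is an assumption made only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Process:
    sv: str         # status: "V0", "V1", "SE" (echo sent) or "AC" (accepted)
    nrcvd: int = 0  # number of distinct echo messages received so far


def step(p: Process, nsnt: int, n: int, t: int, f: int, rng: random.Random) -> int:
    """Run one indivisible step of p and return the updated global counter nsnt."""
    if p.sv == "V1":                       # initial value 1: send echo to all
        nsnt, p.sv = nsnt + 1, "SE"
    # Receive: nrcvd := z where nrcvd <= z <= nsnt + f
    # (up to f additional messages may come from faulty processes).
    p.nrcvd = rng.randint(p.nrcvd, nsnt + f)
    if p.nrcvd >= t + 1 and p.sv == "V0":  # t + 1 echoes and no echo sent before
        nsnt, p.sv = nsnt + 1, "SE"
    if p.nrcvd >= n - t:                   # n - t echoes: accept
        p.sv = "AC"
    return nsnt


def run(n, t, f, initial, steps, seed=0):
    """Let a random scheduler execute `steps` steps of the correct processes."""
    rng = random.Random(seed)
    procs = [Process(sv) for sv in initial]
    nsnt = 0
    for _ in range(steps):
        nsnt = step(rng.choice(procs), nsnt, n, t, f, rng)
    return procs, nsnt
```

For example, `run(4, 1, 1, ["V0"] * 3, 200)` starts the three correct processes of a system with n = 4, t = f = 1 with value 0. No process can then receive more than f ≤ t echoes, so nobody sends or accepts, which matches the unforgeability property on this one run. Such simulation only samples executions for fixed parameters; it does not replace model checking.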


Counting argument in threshold-guarded algorithms

if received m from t + 1 processes then ...

among any t + 1 senders, at least one non-faulty process sent the message

Correct processes count distinct incoming messages
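
Spelled out, the counting argument is a one-line calculation. Suppose a correct process has received m from t + 1 distinct processes. At most f of these senders are faulty, and f ≤ t by the resilience condition, so:

```latex
\[
  \#\{\text{correct senders of } m\}
  \;\ge\; (t + 1) - f
  \;\ge\; (t + 1) - t
  \;=\; 1 .
\]
```

The argument relies on counting distinct senders: if a faulty process could be counted twice, the bound would no longer hold.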


concrete values are not important

thresholds are essential: 0, 1, t + 1, n − t

intervals with symbolic boundaries:

$I_0 = [0, 1)$, $I_1 = [1, t + 1)$, $I_{t+1} = [t + 1, n - t)$, $I_{n-t} = [n - t, \infty)$

Parametric Interval Abstraction (PIA)

Similar to interval abstraction: [t + 1, n − t) rather than [4, 10).

Total order: 0 < 1 < t + 1 < n − t for all parameters satisfying RC: n > 3t, t ≥ f ≥ 0 (strict only for t ≥ 1; for t = 0 the thresholds 1 and t + 1 coincide and $I_1$ is empty).
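
A minimal sketch of the abstraction function for one fixed pair (n, t) may help. The names `alpha`, `INTERVAL_NAMES`, and `strictly_ordered` are hypothetical; the actual tool chain works symbolically on the thresholds rather than on concrete numbers.

```python
INTERVAL_NAMES = ["I0", "I1", "It+1", "In-t"]


def thresholds(n: int, t: int):
    """The symbolic thresholds 0, 1, t + 1, n - t evaluated for concrete n and t."""
    return [0, 1, t + 1, n - t]


def alpha(x: int, n: int, t: int) -> str:
    """Map a concrete value x >= 0 to the parametric interval that contains it."""
    if x < 0:
        raise ValueError("only non-negative values are abstracted")
    bounds = thresholds(n, t)
    # The last threshold that is <= x determines the interval.
    index = max(i for i, b in enumerate(bounds) if b <= x)
    return INTERVAL_NAMES[index]


def strictly_ordered(n: int, t: int) -> bool:
    """Check 0 < 1 < t + 1 < n - t for concrete n and t."""
    b = thresholds(n, t)
    return all(lo < hi for lo, hi in zip(b, b[1:]))
```

With n = 6 and t = 1, `alpha` maps 0 to `I0`, 1 to `I1`, the values 2, 3, 4 to `It+1`, and everything from 5 upward to `In-t`. The abstract domain has four elements no matter how large n and t are.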


Technical challenges

We have to reduce the verification of an infinite number of instances where

1. the process code is parameterized

2. the number of processes is parameterized

to one finite-state model checking instance.

We do that by:

1. PIA data abstraction

2. PIA counter abstraction

abstraction is an over-approximation ⇒ there may be abstract behavior that does not correspond to any concrete behavior.

3. Refining spurious counter-examples


Abstraction overview

Parameterized family:

$$\{\, M(n,t,f) = \underbrace{P(n,t,f) \parallel \cdots \parallel P(n,t,f)}_{N(n,t,f)\ \text{processes}} \;:\; n > 3t,\ t \ge f,\ f \ge 0 \,\}$$

↓ extract the Parametric Interval Domain $\hat{D}$; parametric interval data abstraction

Uniform parameterized family:

$$\{\, \hat{M}(n,t,f) = \underbrace{\hat{P} \parallel \cdots \parallel \hat{P}}_{N(n,t,f)\ \text{processes}} \;:\; n > 3t,\ t \ge f,\ f \ge 0 \,\}$$

$\hat{P}$ does not depend on n, t, f

$\hat{P}$ simulates P(n, t, f)

↓ change representation

Counter representation

↓ parametric interval counter abstraction

one abstract system A that simulates, for every n, t, f, the behavior of M(n, t, f)

↓ finite-state model checking

replay the counter-example; refine the system


Data abstraction


Abstract operations

Concrete: 0, 1, ..., t + 1, ..., n − t, and above

Abstract: $I_0$, $I_1$, $I_{t+1}$, $I_{n-t}$

[Figure: the concrete number line with the thresholds 0, 1, t + 1, n − t, mapped onto the abstract values $I_0$, $I_1$, $I_{t+1}$, $I_{n-t}$]

Concrete t + 1 ≤ x is abstracted as $x = I_{t+1} \lor x = I_{n-t}$.

Concrete x′ = x + 1 is abstracted as:

$$\begin{aligned}
        & x = I_0 \land x' = I_1 \\
  \lor\ & x = I_1 \land (x' = I_1 \lor x' = I_{t+1}) \\
  \lor\ & x = I_{t+1} \land (x' = I_{t+1} \lor x' = I_{n-t}) \\
  \lor\ & x = I_{n-t} \land x' = I_{n-t}
\end{aligned}$$

abstract increase may keep the same value!
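
The abstract increment can be cross-checked by brute force: enumerate concrete values and admissible parameters, and record which interval x + 1 lands in for each interval of x. This is only a sanity check under the assumption t ≥ 1 (so that all four intervals are non-empty); `alpha` and `abstract_increment` are hypothetical names, and the tool itself derives the abstract transitions symbolically.

```python
NAMES = ["I0", "I1", "It+1", "In-t"]


def alpha(x: int, n: int, t: int) -> str:
    """Parametric interval of the concrete value x >= 0 for concrete n and t."""
    bounds = [0, 1, t + 1, n - t]
    return NAMES[max(i for i, b in enumerate(bounds) if b <= x)]


def abstract_increment(max_n: int = 20):
    """Collect all pairs (alpha(x), alpha(x + 1)) for n > 3t, t >= 1, 0 <= x <= n."""
    successors = {name: set() for name in NAMES}
    for n in range(4, max_n + 1):
        for t in range(1, n):
            if n > 3 * t:
                for x in range(0, n + 1):
                    successors[alpha(x, n, t)].add(alpha(x + 1, n, t))
    return successors
```

The result reproduces the four disjuncts above: `I0` always moves to `I1`, `I1` and `It+1` either stay or move up by one interval, and `In-t` stays. The self-loops on `I1` and `It+1` are the source of the non-determinism that later makes refinement necessary.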


Abstract CFA

[Figure: the concrete CFA next to the abstract CFA; both have the locations qI, q0, ..., q8, qF. The slide shows three edge labels rewritten in the abstract CFA, the remaining labels appear unchanged:]

nrcvd := z where (nrcvd ≤ z ∧ z ≤ nsnt + f) becomes [nrcvd = $I_0$ ∧ nsnt = $I_0$ ∧ (nrcvd′ = $I_0$ ∨ nrcvd′ = $I_1$)] ∨ ...

t + 1 ≤ nrcvd becomes nrcvd = $I_{t+1}$ ∨ nrcvd = $I_{n-t}$

inc nsnt becomes [nsnt = $I_1$ ∧ (nsnt′ = $I_1$ ∨ nsnt′ = $I_{t+1}$)] ∨ ...


Counter abstraction


Classic (0, 1, ∞)-counter abstraction

Pnueli, Xu, and Zuck (2001) introduced (0, 1, ∞)-counter abstraction:

finitely many local states, e.g., {N, T, C}.

based on counter representation: for each local state, count how many processes are in it

abstract the number of processes in every state, e.g., K : C ↦ 0, T ↦ 1, N ↦ “many”.

perfectly reflects mutual exclusion properties, e.g., G (K(C) = 0 ∨ K(C) = 1).
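
The two steps, counter representation and then abstraction of each counter, can be sketched as follows. The local states N, T, C are the ones from the example above; the function names are hypothetical and the code illustrates the idea only, not the construction from the cited paper.

```python
from collections import Counter

LOCAL_STATES = ("N", "T", "C")


def counter_representation(local_states):
    """For each local state, count how many processes are in it."""
    counts = Counter(local_states)
    return {s: counts.get(s, 0) for s in LOCAL_STATES}


def abstract_count(k: int) -> str:
    """Abstract a process count into the domain {0, 1, many}."""
    return "0" if k == 0 else "1" if k == 1 else "many"


def counter_abstraction(local_states):
    """Abstract global state K: local state -> abstract count."""
    return {s: abstract_count(k) for s, k in counter_representation(local_states).items()}


def mutual_exclusion_holds(K) -> bool:
    """State predicate K(C) = 0 or K(C) = 1."""
    return K["C"] in ("0", "1")
```

A system with three processes in N and one in T is abstracted to `{"N": "many", "T": "1", "C": "0"}`, exactly the example mapping K. The identity of the processes is lost, which is sound because the processes are symmetric.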


Limits of (0, 1,∞)-counter abstraction

Our parametric data + counter abstraction:

we require finer counting of processes:

t + 1 processes in a specific state can force global progress, t processes cannot

mapping t, t + 1, and n − t to “many” is too coarse.

starting point of our approach...


Data + counter abstraction over parametric intervals

Example instance: n = 6, t = 1, f = 1, hence t + 1 = 2 and n − t = 5

Parametric setting: the fixed values are replaced by the resilience condition n > 3 · t ∧ t ≥ f

Parametric intervals:

I_0 = [0, 1)

I_1 = [1, t + 1)

I_{t+1} = [t + 1, n − t)

I_{n−t} = [n − t, ∞)

[Figure: for each of sv = sent and sv = accepted, a plot of nr. processes (counters) against received. Both axes carry the concrete ticks 0 to 6 and the abstract intervals I_0, I_1, I_{t+1}, I_{n−t}. A marked area carries the note: when all correct processes accepted, all non-zero counters are in this area.]

Concrete: a local state is (sv, nrcvd), where sv ∈ {sent, accepted} and 0 ≤ nrcvd ≤ n

Abstract: a local state is (sv, nrcvd), where sv ∈ {sent, accepted} and nrcvd ∈ {I_0, I_1, I_{t+1}, I_{n−t}}

Example points: 3 processes at (sent, received=3); 1 process at (accepted, received=5)
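The interval mapping can be sketched in a few lines of Python. This is an illustration for one fixed instance (n, t), not the parametric construction itself, which works symbolically for all admissible parameters. The names `interval_of` and `abstract_configuration` are hypothetical.

```python
def interval_of(x, n, t):
    """Name of the parametric interval that contains the value x."""
    if x < 0:
        raise ValueError("counters and message counts are non-negative")
    if x < 1:
        return "I0"      # [0, 1)
    if x < t + 1:
        return "I1"      # [1, t + 1)
    if x < n - t:
        return "It+1"    # [t + 1, n - t)
    return "In-t"        # [n - t, infinity)


def abstract_configuration(concrete, n, t):
    """Abstract both the data (nrcvd) and the process counters.

    `concrete` maps a concrete local state (sv, nrcvd) to the number of
    processes in that state.
    """
    totals = {}
    for (sv, nrcvd), count in concrete.items():
        key = (sv, interval_of(nrcvd, n, t))        # data abstraction
        totals[key] = totals.get(key, 0) + count    # merge equal abstract states
    # counter abstraction: replace every total by its interval
    return {key: interval_of(total, n, t) for key, total in totals.items()}


# The two example points from the slide, for n = 6 and t = 1.
abstract = abstract_configuration({("sent", 3): 3, ("accepted", 5): 1}, 6, 1)
```

For n = 6 and t = 1 the three processes at (sent, received=3) end up in local state (sent, It+1) with counter It+1, and the single process at (accepted, received=5) ends up in (accepted, In-t) with counter I1.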


Abstraction refinement


Spurious behavior

abstraction adds behaviors (e.g., x' = x + 1 may lead to x' being equal to x)

⇒ specs that hold in the concrete system may be violated in the abstract system

spurious counterexamples

we have to reduce the behaviors of the abstract system, i.e., make it more concrete

. . . based on the counterexamples = CEGAR

Three sources of spurious behavior

# processes decreasing or increasing

# messages sent ≠ # processes which have sent a message

unfair loops

. . . and a new abstraction phenomenon
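The remark that x' = x + 1 may leave x' equal to x can be made concrete on the parametric intervals. The sketch below enumerates, for a few sample instances (n, t), which abstract values x + 1 can take when x lies in a given interval. All function names are hypothetical, and capping the unbounded interval at n is an assumption that fits process counters.

```python
def interval_of(x, n, t):
    """Name of the parametric interval that contains the value x."""
    if x < 1:
        return "I0"
    if x < t + 1:
        return "I1"
    if x < n - t:
        return "It+1"
    return "In-t"


def members(name, n, t):
    """Concrete values of an interval; In-t is capped at n (assumption)."""
    bounds = {
        "I0": (0, 1),
        "I1": (1, t + 1),
        "It+1": (t + 1, n - t),
        "In-t": (n - t, n + 1),
    }
    lo, hi = bounds[name]
    return range(lo, hi)


def abstract_successors(name, params):
    """Abstract values reachable by x' = x + 1 from some x in the interval."""
    result = set()
    for n, t in params:
        for x in members(name, n, t):
            result.add(interval_of(x + 1, n, t))
    return result


samples = [(4, 1), (7, 2)]  # both satisfy n > 3t
from_i1 = abstract_successors("I1", samples)
```

Here `from_i1` contains both "I1" and "It+1": after the increment the abstract value may stay the same, so the abstract system can loop on a step that always makes progress in the concrete system.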


Parametric abst. refinement — uniformly spurious paths

Classic case: one Concrete system, one Abstract system

Our case: one Abstract system, many Concrete systems, one for each parameter valuation (n1, t1, f1), (n2, t2, f2), ···


CEGAR — automated workflow

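Model Checking → correct

Model Checking → counterexample → Abstraction refinement using SMT

Abstraction refinement using SMT → CE feasible: bug

Abstraction refinement using SMT → CE spurious: refined abstraction → Model Checking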
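The workflow above is the usual CEGAR loop, which can be sketched generically. The skeleton below is not ByMC code: the three callables (`model_check`, `is_feasible`, `refine`) and the round limit are hypothetical placeholders for the model checker, the SMT-based feasibility check, and the refinement step.

```python
def cegar(abstract_system, model_check, is_feasible, refine, max_rounds=100):
    """Generic counterexample-guided abstraction refinement loop."""
    for _ in range(max_rounds):
        cex = model_check(abstract_system)
        if cex is None:
            return ("correct", None)       # property holds in the abstraction
        if is_feasible(cex):
            return ("bug", cex)            # counterexample can be replayed
        abstract_system = refine(abstract_system, cex)  # spurious: refine
    return ("unknown", None)               # gave up after max_rounds


# Toy instance: the abstract system is a set of abstract states, and every
# state whose name starts with "bad" violates the property.
SPURIOUS = {"bad1", "bad2"}


def toy_check(states):
    bad = sorted(s for s in states if s.startswith("bad"))
    return bad[0] if bad else None


def toy_feasible(cex):
    return cex not in SPURIOUS


def toy_refine(states, cex):
    return states - {cex}


verdict_ok = cegar({"init", "bad1", "bad2"}, toy_check, toy_feasible, toy_refine)
verdict_bug = cegar({"init", "bad1", "bad3"}, toy_check, toy_feasible, toy_refine)
```

In the toy run the first system is reported correct after two refinements, while the second one yields the feasible counterexample "bad3".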


What is SMT?

recall SAT:

given a Boolean formula, e.g., (¬a ∨ ¬b ∨ c) ∧ (¬a ∨ b ∨ d ∨ e)

is there an assignment of true and false to variables a, b, c, d, e such that the formula evaluates to true?

Satisfiability Modulo Theories (SMT):

here just linear arithmetic

given a formula, e.g.,

x = y ∧ y = z ∧ u ≠ x ∧ (x + y ≤ 1 ∧ 2x + y = 1) ∨ 3x + 2y ≥ 3

is there an assignment of values to u, x, y, z such that the formula evaluates to true?

practically efficient tools: Yices, Z3
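To make the satisfiability question tangible, the example formula can be checked by brute force over a small range of integers. This is only an illustration and not how Yices or Z3 work. Two assumptions are made: the variables range over integers, and ∧ binds tighter than ∨, so the formula reads as (x = y ∧ y = z ∧ u ≠ x ∧ (x + y ≤ 1 ∧ 2x + y = 1)) ∨ 3x + 2y ≥ 3.

```python
from itertools import product


def formula(u, x, y, z):
    """The example formula, with "and" binding tighter than "or"."""
    left = x == y and y == z and u != x and (x + y <= 1 and 2 * x + y == 1)
    return left or 3 * x + 2 * y >= 3


def find_model(lo=-2, hi=2):
    """Search all integer assignments in [lo, hi] for a satisfying one."""
    for u, x, y, z in product(range(lo, hi + 1), repeat=4):
        if formula(u, x, y, z):
            return {"u": u, "x": x, "y": y, "z": z}
    return None


model = find_model()
```

A satisfying assignment exists because the last disjunct alone can be made true, for instance with x = 1 and y = 0. An SMT solver reaches such answers by reasoning about the arithmetic constraints instead of enumerating values.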


Counter example: losing processes

Output of data abstraction: 16 local states: L = {(sv, n̂rcvd) with sv ∈ {v0, v1, sent, accepted} and n̂rcvd ∈ {I_0, I_1, I_{t+1}, I_{n−t}}}

An abstract global state is (k̂, n̂snt), where n̂snt ∈ {I_0, I_1, I_{t+1}, I_{n−t}} and k̂ : L → {I_0, I_1, I_{t+1}, I_{n−t}}

Consider an abstract trace:

State 1: n̂snt_1 = I_0; k̂_1(ℓ) = I_{n−t} if ℓ = (v1, I_0), and I_0 otherwise

State 2: n̂snt_2 = I_1; k̂_2(ℓ) = I_{n−t} if ℓ = (v1, I_0), I_1 if ℓ = (sent, I_0), and I_0 otherwise

State 3: n̂snt_3 = I_{t+1}; k̂_3(ℓ) = I_{n−t} if ℓ = (v1, I_0), I_{t+1} if ℓ = (sent, I_0), and I_0 otherwise

Encode the last state in SMT as a conjunction T of the constraints:

resilience condition: n > 3t ∧ t ≥ f ∧ f ≥ 0

zero counters: (i ≠ 4 ∧ i ≠ 8) → 0 ≤ k3[i] < 1

non-zero counters: n − t ≤ k3[4] ∧ t + 1 ≤ k3[8] < n − t

system size: n − f = k3[0] + k3[1] + · · · + k3[15]

Result: UNSAT
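The reason for UNSAT can be seen by hand: the two non-zero counters sum to at least (n − t) + (t + 1) = n + 1, while the system size constraint requires the sum of all counters to be n − f ≤ n. The Python sketch below confirms this on a bounded range of parameters by exhaustive search. It is a sanity check under stated assumptions, not a replacement for the SMT query: the bound `max_n` and the function names are hypothetical, and only the counters k3[4] and k3[8] are enumerated, since the zero-counter constraint forces all other counters to 0.

```python
def has_concretization(n, t, f):
    """Does some concrete counter assignment match the last abstract state?"""
    if not (n > 3 * t and t >= f >= 0):     # resilience condition
        return False
    # k3[4] >= n - t; values above n cannot satisfy the size constraint,
    # because k3[8] must be positive and the total is n - f <= n.
    for k4 in range(n - t, n + 1):
        k8 = (n - f) - k4                   # forced by the system size
        if t + 1 <= k8 < n - t:
            return True
    return False


def search(max_n=30):
    """All parameter triples up to max_n for which the state is concretizable."""
    return [
        (n, t, f)
        for n in range(1, max_n + 1)
        for t in range(0, n)
        for f in range(0, t + 1)
        if has_concretization(n, t, f)
    ]


witnesses = search()
```

The search returns an empty list, matching the solver's answer that the abstract state has no concretization for any admissible parameters in the checked range.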


Remove transitions

We ask the SMT solver:

is there a satisfying assignment for T?

if yes,

then the state is OK, may be part of a real counterexample

if not, then the state is spurious

remove transitions to that state in the abstract system


Liveness

distributed algorithm requires reliable communication

every message sent is eventually received

¬in_transit ≡ [∀i. nrcvd_i ≥ nsnt]

fairness F G ¬in_transit necessary to verify liveness, e.g.,

F G ¬in_transit → G ([∀i. sv_i = v1] → F [∀i. sv_i = accept])

counter example (lasso):

[Figure: lasso-shaped path whose loop consists of the states s1, s2, s3, ···, sk; the states are labeled ¬in_transit.]


Liveness — fairness suppression

[Figure: lasso-shaped counterexample whose loop consists of the states s1, s2, s3, ···, sk; the states are labeled ¬in_transit.]

if there is a spurious s_j (all its concretizations violate ¬in_transit), then the loop is spurious.

refine fairness to F G ¬in_transit ∧ G F ( ⋀_{1≤j≤k} “out of s_j” )

. . . we use unsat cores to refine several loops at once


experimental evaluation


Concrete vs. parameterized (Byzantine case)

[Figure: two plots, "Time to check relay (sec, logscale)" and "Memory to check relay (MB, logscale)".]

Parameterized model checking performs well (the red line).

Experiments for fixed parameters quickly degrade (n = 9 runs out of memory).

We found counter-examples for the cases n = 3t and f > t, where the resilience condition is violated.


Experimental results at a glance

Algorithm  Fault  Resilience  Property  Valid?  #Refinements  Time

ST87       Byz    n > 3t      U         ✓       0             4 sec.
ST87       Byz    n > 3t      C         ✓       10            32 sec.
ST87       Byz    n > 3t      R         ✓       10            24 sec.

ST87       Symm   n > 2t      U         ✓       0             1 sec.
ST87       Symm   n > 2t      C         ✓       2             3 sec.
ST87       Symm   n > 2t      R         ✓       12            16 sec.

ST87       Omit   n > 2t      U         ✓       0             1 sec.
ST87       Omit   n > 2t      C         ✓       5             6 sec.
ST87       Omit   n > 2t      R         ✓       5             10 sec.

ST87       Clean  n > t       U         ✓       0             2 sec.
ST87       Clean  n > t       C         ✓       4             8 sec.
ST87       Clean  n > t       R         ✓       13            31 sec.

CT96       Clean  n > t       U         ✓       0             1 sec.
CT96       Clean  n > t       A         ✓       0             1 sec.
CT96       Clean  n > t       R         ✓       0             1 sec.
CT96       Clean  n > t       C         ✗       0             1 sec.


When resilience condition is wrong...

Algorithm  Fault  Resilience            Property  Valid?  #Refinements  Time

ST87       Byz    n > 3t ∧ f ≤ t + 1    U         ✗       9             56 sec.
ST87       Byz    n > 3t ∧ f ≤ t + 1    C         ✗       11            52 sec.
ST87       Byz    n > 3t ∧ f ≤ t + 1    R         ✗       10            17 sec.

ST87       Byz    n ≥ 3t ∧ f ≤ t        U         ✓       0             5 sec.
ST87       Byz    n ≥ 3t ∧ f ≤ t        C         ✓       9             32 sec.
ST87       Byz    n ≥ 3t ∧ f ≤ t        R         ✗       30            78 sec.

ST87       Symm   n > 2t ∧ f ≤ t + 1    U         ✗       0             2 sec.
ST87       Symm   n > 2t ∧ f ≤ t + 1    C         ✗       2             4 sec.
ST87       Symm   n > 2t ∧ f ≤ t + 1    R         ✓       8             12 sec.

ST87       Omit   n ≥ 2t ∧ f ≤ t        U         ✓       0             1 sec.
ST87       Omit   n ≥ 2t ∧ f ≤ t        C         ✗       0             2 sec.
ST87       Omit   n ≥ 2t ∧ f ≤ t        R         ✗       0             2 sec.


Summary of results

Abstraction tailored for distributed algorithms

threshold-based

fault-tolerant

allows expressing different fault assumptions

Verification of threshold-based fault-tolerant algorithms

with threshold guards that are widely used

Byzantine faults (and other)

for all system sizes


Related work: non-parameterized

Model checking of small-size instances:

clock synchronization [Steiner, Rushby, Sorea, Pfeifer 2004]

consensus [Tsuchiya, Schiper 2011]

asynchronous agreement, folklore broadcast, condition-based consensus [John, Konnov, Schmid, Veith, Widder 2013]

and more...


Related work: parameterized case

Regular model checking of fault-tolerant distributed protocols:

[Fisman, Kupferman, Lustig 2008]

“First-shot” theoretical framework.

No guards like x ≥ t + 1, only x ≥ 1.

No implementation.

Manual analysis applied to folklore broadcast (crash faults).

Backward reachability using SMT with arrays:

[Alberti, Ghilardi, Pagani, Ranise, Rossi 2010-2012]

Implementation.

Experiments on Chandra-Toueg 1990.

No resilience conditions like n > 3t.

Safety only.


Our current work

[Figure: grid classifying fault-tolerant distributed algorithms. Timing models: Discrete synchronous; Discrete partially synchronous; Discrete asynchronous; Continuous synchronous; Continuous partially synchronous. Instances and payload: One instance/finite payload; Many inst./finite payload; Many inst./unbounded payload; Messages with reals. Entries: core of {ST87, BT87, CT96}, MA06 (common), MR04 (binary); one-shot broadcast, c.b. consensus; DHM12; ST87; AK00; CT96 (failure detector); DLS86, MA06, L98 (Paxos); ST87, BT87, CT96, DAs with failure-detectors; DLPSW86; DFLPS13; WS07; ST87 (JACM); FSFK06; WS09. Problem labels: clock sync, broadcast, approx. agreement.]


Future work: threshold guards + orthogonal features

[Figure: grid classifying fault-tolerant distributed algorithms. Timing models: Discrete synchronous; Discrete partially synchronous; Discrete asynchronous; Continuous synchronous; Continuous partially synchronous. Instances and payload: One instance/finite payload; Many inst./finite payload; Many inst./unbounded payload; Messages with reals. Entries: core of {ST87, BT87, CT96}, MA06 (common), MR04 (binary); one-shot broadcast, c.b. consensus; DHM12; ST87; AK00; CT96 (failure detector); DLS86, MA06, L98 (Paxos); ST87, BT87, CT96, DAs with failure-detectors; DLPSW86; DFLPS13; WS07; ST87 (JACM); FSFK06; WS09. Problem labels: clock sync, broadcast, approx. agreement.]


Thank you!

[http://forsyte.at/software/bymc]


the implementation


Tool Chain: ByMC

Parametric Promela code → static analysis + Yices → Parametric Interval Domain D̂

Parametric data abstraction with Yices → Parametric Promela code

Parametric counter abstraction with Yices → normal Promela code

Spin → property holds, or counterexample

Refine: Concrete counter representation (VASS) → SMT formula → Yices

Yices answers sat (counterexample feasible) or unsat (Refine)

invariant candidates (by the user)

Figure: The abstraction scheme


Experimental setup

The tool (source code in OCaml),

the code of the distributed algorithms in Parametric Promela,

and a virtual machine with full setup

are available at: http://forsyte.at/software/bymc


Running the tool — concrete case

user specifies parameter value

useful to check whether the code behaves as expected

$bymc/verifyco-spin "N=4,T=1,F=1" bcast-byz.pml relay

model checking problem in directory “./x/spin-bcast-byz-relay-N=4,T=1,F=1”

in concrete.prm

parameters are replaced by numbers

process prototype is replaced with N − F = 3 active processes


Running the tool — parameterized model checking

PIA data and counter abstraction

finite-state model checking on abstract model

$bymc/verifypa-spin bcast-omit.pml relay

model checking problem in directory “./x/bcast-byz-relay-yymmdd-HHMM.*”

directory contains

abs-interval.prm: result of the data abstraction;

abs-counter.prm: result of the counter abstraction;

abs-vass.prm: auxiliary abstraction for abstraction refinement;

mc.out: the last output by Spin;

cex.trace: the counterexample (if there is one);

yices.log: communication log with Yices.


Fairness, Refinement, and Invariants

In the Byzantine case we have ¬in_transit ≡ ∀i. (nrcvd_i ≥ nsnt) and G F ¬in_transit.

In this case communication fairness implies computation fairness.

But in the abstract version nsnt can deviate from the number of processes that sent the echo message.

In this case the user formulates a simple state invariant candidate, e.g., nsnt = K([sv = SE ∨ sv = AC]) (on the level of the original concrete system).

The tool checks automatically whether the candidate is actually a state invariant.

After the abstraction, the abstract version of the invariant restricts the behavior of the abstract transition system.


Parametric abstraction refinement — justice suppression

justice G F ¬in_transit necessary to verify liveness

counterexample:

[Figure: counterexample trace ending in a loop through abstract states s1, s2, s3, ..., sk; every state on the trace is labeled in_transit]

if ∀j all concretizations of s_j violate ¬in_transit, then CE is spurious.

refine justice to G F ¬in_transit ∧ G F (⋁_{1≤j≤k} ¬at(s_j))

. . . we use unsat cores to refine several loops at once
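The spuriousness test and the refined justice constraint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the functions loop_is_spurious and refine_justice, the callback concretizations, and the toy data are hypothetical. The sketch enumerates concretizations explicitly, whereas the slides state that the tool uses unsat cores from an SMT solver.

```python
def loop_is_spurious(loop, concretizations, in_transit):
    """True if every concretization of every abstract loop state is in transit.

    loop            -- list of abstract state names s_1, ..., s_k
    concretizations -- hypothetical callback: abstract state -> iterable of
                       concrete states
    in_transit      -- predicate on concrete states
    """
    return all(
        all(in_transit(c) for c in concretizations(s))
        for s in loop
    )


def refine_justice(loop):
    """Build the refined justice constraint as a plain-text LTL formula."""
    disjuncts = " ∨ ".join(f"¬at({s})" for s in loop)
    return f"G F ¬in_transit ∧ G F ({disjuncts})"


# Toy data: each concrete state only records whether a message is in transit.
toy = {
    "s1": [{"in_transit": True}],
    "s2": [{"in_transit": True}, {"in_transit": True}],
}
loop = ["s1", "s2"]
if loop_is_spurious(loop, toy.__getitem__, lambda c: c["in_transit"]):
    print(refine_justice(loop))
```

The added conjunct forces every fair run to leave the loop s_1, ..., s_k infinitely often, so the model checker cannot report the same spurious loop again.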


Asynchronous reliable broadcast (Srikanth & Toueg 1987)

The core of the classic broadcast algorithm from the DA literature. It solves an agreement problem depending on the inputs v_i.

Variables of process i
  v_i: {0, 1} init with 0 or 1
  accept_i: {0, 1} init with 0

An indivisible step:
  if v_i = 1
  then send (echo) to all;

  if received (echo) from at least t + 1 distinct processes
     and not sent (echo) before
  then send (echo) to all;

  if received (echo) from at least n - t distinct processes
  then accept_i := 1;

asynchronous

t Byzantine faults

correct if n > 3t (resilience condition RC)

parameterized process skeleton P(n, t)
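The indivisible step above can be tried out with a small Python sketch. It rests on simplifying assumptions: only correct processes are simulated, Byzantine processes stay silent, every echo is delivered before the next step, and each process sends (echo) at most once. The names Process, step, and run are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Process:
    v: int                   # input value, 0 or 1
    accept: int = 0          # set to 1 once n - t echoes were received
    sent_echo: bool = False  # assumption: (echo) is sent at most once


def step(p, echoes_received, n, t):
    """Run one indivisible step; return True if p sends (echo) to all."""
    sends = False
    if not p.sent_echo and (p.v == 1 or echoes_received >= t + 1):
        p.sent_echo = True
        sends = True
    if echoes_received >= n - t:
        p.accept = 1
    return sends


def run(inputs, n, t, rounds=10):
    """Simulate the correct processes under one idealized schedule."""
    procs = [Process(v) for v in inputs]
    echoes = 0  # echoes sent so far, all assumed delivered immediately
    for _ in range(rounds):
        for p in procs:
            if step(p, echoes, n, t):
                echoes += 1
    return procs


# n = 4, t = 1: three correct processes, the fourth is assumed faulty and silent.
print([p.accept for p in run([1, 1, 1], n=4, t=1)])
print([p.accept for p in run([0, 0, 0], n=4, t=1)])
```

A single schedule like this one only exercises the algorithm. Verifying it for all schedules and all n > 3t is what the parameterized model checking in these slides is for.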


Abstract CFA

[Figure: two control flow automata over the locations qI, q0, ..., q8, qF.

Concrete CFA, edge labels: nrcvd := z where (nrcvd ≤ z ∧ z ≤ nsnt + f); sv = V1; ¬(sv = V1); t + 1 ≤ nrcvd; ¬(t + 1 ≤ nrcvd); sv = V0; ¬(sv = V0); inc nsnt; n − t ≤ nrcvd; ¬(n − t ≤ nrcvd); sv := SE; sv := AC.

Abstract CFA, changed edge labels: in place of the receive, [nrcvd = I_0 ∧ nsnt = I_0 ∧ (nrcvd′ = I_0 ∨ nrcvd′ = I_1)] ∨ ...; in place of t + 1 ≤ nrcvd, nrcvd = I_{t+1} ∨ nrcvd = I_{n−t}; in place of the second inc nsnt, [nsnt = I_1 ∧ (nsnt′ = I_1 ∨ nsnt′ = I_{t+1})] ∨ ...]
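The abstract edge labels can be reproduced for fixed parameters with a short Python sketch. It assumes that the abstract domain splits the non-negative integers at the thresholds 1, t + 1, and n − t, giving I0 = [0, 1), I1 = [1, t + 1), It+1 = [t + 1, n − t), and In-t = [n − t, ∞). These interval bounds and the function names are assumptions for illustration. The sketch works on concrete values of n and t, while the tool computes the abstraction symbolically for all admissible parameters.

```python
def abstract_value(x, n, t):
    """Map a non-negative counter value to its abstract interval name."""
    if x < 1:
        return "I0"
    if x < t + 1:
        return "I1"
    if x < n - t:
        return "It+1"
    return "In-t"


def abstract_guard_t_plus_1(a):
    """Abstract version of the guard t + 1 <= nrcvd."""
    return a in ("It+1", "In-t")


def abstract_inc(a, n, t):
    """Abstract successors of 'inc' from interval a, for fixed n and t.

    Enumerating x in 0..n is enough here because every value from n - t
    upward maps to the unbounded interval In-t.
    """
    return {
        abstract_value(x + 1, n, t)
        for x in range(n + 1)
        if abstract_value(x, n, t) == a
    }


# With n = 7 and t = 2 the interval I1 contains 1 and 2, so incrementing
# may stay in I1 or move to It+1, as in the abstract edge label for nsnt.
print(sorted(abstract_inc("I1", n=7, t=2)))
```

The nondeterminism in abstract_inc is the price of the abstraction: the abstract system no longer knows which concrete value inside an interval a counter holds.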
