1
PARTIAL-COHERENCE ABSTRACTIONS FOR RELAXED MEMORY MODELS
Presented by Michael Kuperstein, Technion
Joint work with Martin Vechev, IBM Research and Eran Yahav, Technion
2
Sequential Consistency
We expect our programs to have
  “Interleaving semantics”
  Consistent with program order
“The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.” – Leslie Lamport, 1979
3
Dekker’s Algorithm for Mutual Exclusion

Process 0:
  flag[0] := true
  while flag[1] = true {
    if turn ≠ 0 {
      flag[0] := false
      while turn ≠ 0 { }
      flag[0] := true
    }
  }
  // critical section
  turn := 1
  flag[0] := false

Process 1:
  flag[1] := true
  while flag[0] = true {
    if turn ≠ 1 {
      flag[1] := false
      while turn ≠ 1 { }
      flag[1] := true
    }
  }
  // critical section
  turn := 0
  flag[1] := false

Specification: mutual exclusion over critical section
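The specification can be checked mechanically when memory is sequentially consistent. The sketch below is not from the talk: it encodes one pass of each process through Dekker's algorithm as a small state machine (the program-counter numbering and the names `step`, `reachable_states`, `CRITICAL` and `DONE` are illustrative assumptions), explores every interleaving, and looks for a state in which both processes are in the critical section.

```python
from collections import deque

CRITICAL = 6  # assumed pc value meaning "inside the critical section"
DONE = 8      # assumed pc value meaning "process has finished"

def step(state, i):
    """Successor of `state` when process i takes one atomic step (None if finished)."""
    pcs, flags, turn = list(state[0]), list(state[1]), state[2]
    j, pc = 1 - i, pcs[i]
    if pc == 0:                       # flag[i] := true
        flags[i], pcs[i] = True, 1
    elif pc == 1:                     # while flag[j] = true
        pcs[i] = 2 if flags[j] else CRITICAL
    elif pc == 2:                     # if turn != i
        pcs[i] = 3 if turn != i else 1
    elif pc == 3:                     # flag[i] := false
        flags[i], pcs[i] = False, 4
    elif pc == 4:                     # while turn != i { }
        pcs[i] = 4 if turn != i else 5
    elif pc == 5:                     # flag[i] := true
        flags[i], pcs[i] = True, 1
    elif pc == CRITICAL:              # leave critical section: turn := j
        turn, pcs[i] = j, 7
    elif pc == 7:                     # flag[i] := false
        flags[i], pcs[i] = False, DONE
    else:
        return None
    return (tuple(pcs), tuple(flags), turn)

def reachable_states():
    """Breadth-first search over all interleavings under sequential consistency."""
    init = ((0, 0), (False, False), 0)  # pcs, flags, turn
    seen, work = {init}, deque([init])
    while work:
        state = work.popleft()
        for i in (0, 1):
            nxt = step(state, i)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

states = reachable_states()
violations = [s for s in states if s[0] == (CRITICAL, CRITICAL)]
print("mutual exclusion holds under SC:", not violations)
```

Each step reads or writes a single shared variable, which is what the interleaving semantics assumes; the search finds no violating state in this encoding.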
4
Store Buffer Based Models
  TSO & PSO
  x86 ~ TSO
Memory Fences
  Restore order
  Every store before the fence becomes globally visible before anything after the fence executes
[Figure: processes P0 and P1 connected through store buffers to Main Memory; variables X, Y, Z with values 1, 2, 3; operations labeled store, flush, load, fence]
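The effect of store buffers and fences can be seen on the classic store-buffering litmus test, which is not part of the slides: P0 runs `x := 1; r0 := y` and P1 runs `y := 1; r1 := x`. The sketch below is a simplified TSO-style model under stated assumptions: one FIFO buffer per process, a load reads the newest matching entry in its own buffer before falling back to memory, and a fence can only execute once the issuing process's buffer is empty. The function name `outcomes` and the state encoding are illustrative.

```python
from collections import deque

def outcomes(use_fence):
    """Explore every schedule of the litmus test; return the set of final (r0, r1)."""
    def program(mine, other):
        ops = [("store", mine, 1)]
        if use_fence:
            ops.append(("fence",))
        ops.append(("load", other))
        return ops

    progs = [program("x", "y"), program("y", "x")]
    # State: program counters, per-process FIFO store buffers, memory, registers.
    init = ((0, 0), ((), ()), (("x", 0), ("y", 0)), (None, None))
    seen, work, finals = {init}, deque([init]), set()
    while work:
        pcs, bufs, mem, regs = work.popleft()
        succs = []
        for p in (0, 1):
            if bufs[p]:  # flush: the oldest buffered store reaches main memory
                var, val = bufs[p][0]
                new_mem = tuple((v, val if v == var else old) for v, old in mem)
                new_bufs = tuple(b[1:] if q == p else b for q, b in enumerate(bufs))
                succs.append((pcs, new_bufs, new_mem, regs))
            if pcs[p] == len(progs[p]):
                continue  # process p has no instructions left
            op = progs[p][pcs[p]]
            new_pcs = tuple(pc + 1 if q == p else pc for q, pc in enumerate(pcs))
            if op[0] == "store":  # a store only enters the local buffer
                new_bufs = tuple(b + ((op[1], op[2]),) if q == p else b
                                 for q, b in enumerate(bufs))
                succs.append((new_pcs, new_bufs, mem, regs))
            elif op[0] == "fence":  # enabled only when the own buffer is drained
                if not bufs[p]:
                    succs.append((new_pcs, bufs, mem, regs))
            else:  # load: newest own buffered value, else main memory
                val = dict(mem)[op[1]]
                for var, buffered in bufs[p]:
                    if var == op[1]:
                        val = buffered
                new_regs = tuple(val if q == p else r for q, r in enumerate(regs))
                succs.append((new_pcs, bufs, mem, new_regs))
        if not succs:  # both programs finished and both buffers empty
            finals.add(regs)
        for s in succs:
            if s not in seen:
                seen.add(s)
                work.append(s)
    return finals

print("without fences:", sorted(outcomes(False)))
print("with fences:   ", sorted(outcomes(True)))
```

Without fences the outcome (0, 0) is reachable, which no sequentially consistent interleaving allows; placing a fence between the store and the load in each process rules it out.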
5
Process 0:
  flag[0] := true
  fence
  while flag[1] = true {
    if turn ≠ 0 {
      flag[0] := false
      fence
      while turn ≠ 0 { }
      flag[0] := true
      fence
    }
  }
  // critical section
  turn := 1
  fence
  flag[0] := false
  fence

Memory Fences
  Fences are expensive
    10s-100s of cycles
  Practical Significance
    Data structures
    Linux Kernel spinlocks
  Placing fences manually
    Overfencing: hurts performance
    Underfencing: subtle bugs
6
Process 0:
  flag[0] := true
  fence
  while flag[1] = true {
    if turn ≠ 0 {
      flag[0] := false
      while turn ≠ 0 { }
      flag[0] := true
    }
  }
  // critical section
  turn := 1
  flag[0] := false
7
Automatic Solutions
  Equivalence to Sequential Consistency
    Reduce program behaviors to sequentially consistent (SC) runs
    High-level specifications are ignored
    Goes back to Shasha & Snir [TOPLAS ’88]
  Place fences to satisfy provided specification
    Using the specification may forbid fewer executions
    May require fewer fences
[Figure: sets of executions labeled Safe, SC, PSO]
8
Goal
  Inputs to BLENDER: finite-state program P, safety specification S, memory model M
  Output: program P’ with fences
  P’ satisfies the specification S under M
9
General Recipe
1. Compute reachable states
2. Compute weakest constraints that guarantee all “bad states” are avoided
3. Implement the constraints with fences
10
Constraints
  Constraint language
  Not every transition can be prevented using a fence
[Figure: four snapshots with rows P1 and P2, entries 1 2 3 and A B C, over variable X; transitions labeled “P2 : (D) LOAD R1 = X” and “P1 : (D) LOAD R1 = X”; one transition is marked “Unavoidable”, the other carries the constraint [A < D][B < D][C < D]]
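Ordering constraints of the form [l1 < l2] eventually have to be turned into fence positions. The sketch below shows one simple way to do that for straight-line code; it is an illustration under stated assumptions, not necessarily how BLENDER implements constraints. Statements of one process are numbered 0, 1, 2, … in program order, a constraint (i, j) with i < j means statement i must take effect before statement j executes, and a fence in gap g sits between statements g-1 and g. The name `place_fences` is hypothetical.

```python
def place_fences(constraints):
    """Pick a small set of fence gaps so that every constraint (i, j) is enforced.

    A fence in gap g enforces (i, j) exactly when i < g <= j. Greedy interval
    stabbing: handle constraints by increasing j and, when one is not yet
    covered, put a fence in the latest gap that still enforces it.
    """
    fences = []
    for i, j in sorted(constraints, key=lambda c: c[1]):
        if not any(i < g <= j for g in fences):
            fences.append(j)
    return fences

# Stores A, B, C at positions 0, 1, 2 and load D at position 3:
# the constraints [A < D], [B < D], [C < D] share a single fence just before D.
print(place_fences([(0, 3), (1, 3), (2, 3)]))
```

For intervals on a line this greedy choice uses the fewest fences possible; code with branches and loops needs a more careful treatment than this sketch provides.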
11
Concrete Transition System
  Building transition system under TSO/PSO is hard
    No a-priori bound on buffer length
    Unbounded state-space
    Even for programs that were finite-state under SC
  Reachability has non-primitive recursive complexity [Atig et al., POPL ’10]
12
Abstract Memory Models (AMM)
  Bounded approximation of unbounded buffers
    Strictly weaker than concrete TSO/PSO
    Finite-state programs remain finite-state
  Reachability becomes effectively computable
    Construct finite (abstract) transition system
    Apply fence inference
    Can also be used for verification
[Figure: sets of executions labeled Safe, SC, PSO, AMM]
13
Partial Coherence Abstractions
[Figure: concrete store buffers between P0, P1 and Main Memory, next to their abstract counterparts]
  Parts of an abstract buffer: recent value, bounded length k, unordered elements
  Allows precise fence semantics
  Allows precise loads from buffer
  Keeps the analysis precise for “well behaved” programs
  Record what values appeared (without order or number)
14
Partial Coherence Abstractions
[Figure: a concrete buffer holding 1 2 3 4 5 6 7 and its abstraction, in which part of the contents is summarized by the unordered set {2,3,4,5}]
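A minimal sketch of such an abstraction function follows. The representation is an assumption made for illustration and may differ in detail from the one used in the talk: the concrete buffer is listed oldest first, the first k entries are kept in order as a FIFO head, the most recent value is kept exactly, and everything in between is collapsed into a set that forgets order and multiplicity. The names `AbstractBuffer` and `abstract` are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional, Sequence, Tuple

class AbstractBuffer(NamedTuple):
    head: Tuple[int, ...]    # up to k oldest values, in FIFO order
    middle: FrozenSet[int]   # values in between; order and count are dropped
    recent: Optional[int]    # most recently stored value, None for an empty buffer

def abstract(buffer: Sequence[int], k: int) -> AbstractBuffer:
    """Abstract a concrete store buffer (oldest entry first) with FIFO bound k."""
    head = tuple(buffer[:k])
    rest = buffer[k:]
    recent = buffer[-1] if buffer else None
    # When entries remain beyond the head, the last one is tracked as `recent`,
    # so only the entries before it go into the unordered set.
    middle = frozenset(rest[:-1])
    return AbstractBuffer(head, middle, recent)

print(abstract([1, 2, 3, 4, 5, 6, 7], k=2))
# Buffers that differ only in the order or number of middle values coincide:
print(abstract([3, 4, 5, 7], k=0) == abstract([5, 4, 3, 3, 7], k=0))
```

Because the head has at most k entries and the set ranges over a finite value domain, only finitely many abstract buffers exist, and a buffer that never holds more than k entries is represented without any loss.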
15
Abstract Fence Inference
1. Compute reachable abstract states
2. Compute constraints. Precision depends on abstraction.
3. Implement the constraints with fences
16
Fence Inference Results
  Benchmarks are mutual exclusion primitives
  k - the bound on the FIFO part of the abstract buffer
  PD more “aggressive” than FD
[Table: rows are the programs Sense0, Pet0, Dek0, Lam0, Fast0, Fast1a, Fast1b, Fast1c; columns are FD k=0, FD k=1, FD k=2, PD k=0, PD k=1, PD k=2]
17
Summary
  Partial-coherence abstractions
    Verification without arbitrary bounds
    Abstraction precision affects quality of results
  Synthesis of fences
    Can infer optimal fences for mutual exclusion primitives
[Figure: BLENDER takes P, S and M and produces P’]
18
Questions
19
Related Work
  Under-approximation
    CheckFence [Burckhardt et al., PLDI ’07]
    Fender [KVY, FMCAD ’10]
    And more…
  Over-approximation
    Equivalence to SC
      Very imprecise
      Goes back to Shasha & Snir [TOPLAS ’88]
    Abstract Interpretation
      Varying precision
      Regular Abstraction [Linden et al., SPIN ’10]
      Partial-Coherence [KVY, PLDI ’11]