
Type Systems For Distributed Data Sharing

Ben Liblit, Alex Aiken, Kathy Yelick

Distributed Sharing: Many Uses

• Data location management
• Cache coherence
• Race condition detection
• Program/algorithm documentation
• Consistency model relaxation
• Synchronization elimination
• Autonomous garbage collection
• Security

Distributed Memory Model

• Multiple machines, each with local memory

• Global memory is union of local memories

• Distinguish two types of pointers:
– Local: points to local memory only
– Global: points anywhere: (machine, address)
– Different representations & operations
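The difference between the two pointer kinds can be sketched in a few lines of Python. This is only an illustration, assuming a local pointer is a bare address and a global pointer is a (machine, address) pair; the names `LocalPtr`, `GlobalPtr`, `widen`, `deref` and the `memories` table are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalPtr:
    address: int   # meaningful only on the machine that holds the pointer


@dataclass(frozen=True)
class GlobalPtr:
    machine: int   # which machine's memory
    address: int   # address within that machine's memory


# Hypothetical model: one address -> value table per machine.
memories = {0: {100: 5}, 1: {100: 7}}


def widen(ptr: LocalPtr, here: int) -> GlobalPtr:
    """Turn a local pointer into a global one by recording its home machine."""
    return GlobalPtr(here, ptr.address)


def deref(ptr, here: int):
    """A local dereference reads this machine's memory; a global one may go remote."""
    if isinstance(ptr, LocalPtr):
        return memories[here][ptr.address]
    return memories[ptr.machine][ptr.address]


p = LocalPtr(100)
print(deref(p, here=0))                  # 5: read on machine 0
print(deref(p, here=1))                  # 7: same bits name a different cell on machine 1
print(deref(widen(p, here=0), here=1))   # 5: the widened pointer still names machine 0
```

The same local pointer bits mean different things on different machines, which is why the two kinds need distinct representations and operations.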

Type Grammar

τ ::= int | boxed ω τ | ⟨τ, τ⟩
ω ::= local | global

• Boxed and unboxed values
• Integers, pointers, and pairs
– Pairs are not assumed boxed

• References to boxes are either local or global

Review of Global Dereferencing: Standard Approach Unsound

x : boxed global τ
------------------  where τ = boxed local int
↓x : τ

[Figure: pointer diagram showing x, ↓x, and a box containing 5]

Review of Global Dereferencing: Sound With Type Expansion

x : boxed global τ
------------------
↓x : expand(τ)

[Figure: pointer diagram showing x, ↓x, and a box containing 5]
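The contrast between the unsound and the sound dereference rule can be sketched as a tiny type computation. This is a minimal illustration under stated assumptions: only integers and boxed pointers are modeled (pairs are left out), and the names `Int`, `Boxed`, `expand` and `deref_type` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class Boxed:
    width: str       # "local" or "global"
    pointee: object  # the type stored in the box


def expand(t):
    """Widen a pointer type to global; other types pass through (pairs omitted)."""
    if isinstance(t, Boxed):
        return Boxed("global", t.pointee)
    return t


def deref_type(t, sound=True):
    """Type obtained by dereferencing an expression of pointer type t."""
    if not isinstance(t, Boxed):
        raise TypeError("can only dereference a boxed type")
    if t.width == "global" and sound:
        # The value came from another machine: a local pointer found there
        # is not local to the reader, so its type must be widened.
        return expand(t.pointee)
    return t.pointee


x = Boxed("global", Boxed("local", Int()))
print(deref_type(x, sound=False))  # claims a local pointer: unsound
print(deref_type(x, sound=True))   # widened to a global pointer
```

With `sound=False` the result type claims the fetched pointer is local to the reader, which is exactly the unsoundness the expansion step repairs.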

Type Expansion in Detail

pop(int) = int
pop(⟨τ1, τ2⟩) = ⟨pop(τ1), pop(τ2)⟩
pop(boxed global τ) = boxed global τ

expand(int) = int
expand(⟨τ1, τ2⟩) = ⟨pop(τ1), pop(τ2)⟩
expand(boxed ω τ) = boxed global τ

Representation Versus Sharing

• Locally pointed-to data might not be private
– Because of local / global aliasing
– Because of transitivity + pointer widening
• Globally pointed-to data might not be shared
– What if “↓y” never actually happens?
• But globally used data must be shared
– If “↓y” can happen, local pointer cell is shared.
– What about cell containing “5”?

[Figure: pointer diagram of a cell containing 5, with pointers labeled x and y]

Data Sharing as Types

• Shared data allows certain operations
– Access by way of global pointer
• Private data allows other operations
– Optimizations, GC, fast monitors, etc.
• Some form of polymorphism is essential
– Neither subsumes the other
– But we can have a common supertype

Augmented Type Grammar

τ ::= int | boxed ω δ τ | ⟨τ, τ⟩
δ ::= shared | mixed | private
ω ::= local | global

• Allow subtyping of pointers, pairs
– But not across pointers, since we allow assignment
• Allocation is explicitly shared or private
• Question: what can you do with mixed data?
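One way to read the subtyping bullets is sketched below in Python. It assumes "mixed" is the common supertype of "shared" and "private", that pairs vary component-wise, and that the type under a pointer must match exactly. Requiring the local/global width to match as well is an extra assumption of this sketch, and every name here (`Int`, `Pair`, `Boxed`, `qual_le`, `is_subtype`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class Pair:
    first: object
    second: object


@dataclass(frozen=True)
class Boxed:
    width: str       # "local" or "global"
    sharing: str     # "shared", "mixed" or "private"
    pointee: object


def qual_le(d1, d2):
    """private <= mixed, shared <= mixed, and every qualifier <= itself."""
    return d1 == d2 or d2 == "mixed"


def is_subtype(t1, t2):
    if isinstance(t1, Int) and isinstance(t2, Int):
        return True
    if isinstance(t1, Pair) and isinstance(t2, Pair):
        # Pairs are unboxed values, so they may vary component-wise.
        return is_subtype(t1.first, t2.first) and is_subtype(t1.second, t2.second)
    if isinstance(t1, Boxed) and isinstance(t2, Boxed):
        # The pointer's own qualifier may be weakened, but the pointee must
        # match exactly: boxes are assignable, so subtyping stops there.
        return (t1.width == t2.width
                and qual_le(t1.sharing, t2.sharing)
                and t1.pointee == t2.pointee)
    return False


private_cell = Boxed("local", "private", Int())
mixed_cell = Boxed("local", "mixed", Int())
print(is_subtype(private_cell, mixed_cell))   # True
print(is_subtype(Boxed("local", "shared", private_cell),
                 Boxed("local", "shared", mixed_cell)))   # False: not across pointers
```

The second check fails because weakening the pointee's qualifier behind an assignable box would let a shared cell be written into a slot that other code still sees as private.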

Late Enforcement: Limited Use of Global Pointers

expand(boxed ω δ τ) = boxed global δ τ

x : boxed global shared τ
-------------------------
↓x : expand(τ)

x : boxed local δ τ
-------------------
↓x : τ

Late Enforcement: Applicability

• Data location management
• Cache coherence
• Race condition detection
• Program/algorithm documentation
• Consistency model relaxation
• Synchronization elimination
• Autonomous garbage collection (in practice)
• Security

Why Garbage Collection Breaks

1. Send out global pointer to my private data

2. Destroy all my local pointers to it

3. GC locally unreachable private data

4. …

5. Get that global pointer back again later

6. It points to my data, so coerce to local

7. Use this local pointer to my private data
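The seven steps can be replayed in a toy simulation. This is a deliberately simplistic sketch: the collector only checks a set of local roots rather than tracing, and the `Machine` class with its methods is hypothetical, not part of Titanium or of the talk.

```python
class Machine:
    """Toy machine with a heap and a purely local ("autonomous") collector."""

    def __init__(self, machine_id):
        self.machine_id = machine_id
        self.heap = {}         # address -> stored value
        self.roots = set()     # addresses this machine still points to locally
        self.next_address = 0

    def alloc_private(self, value):
        address = self.next_address
        self.next_address += 1
        self.heap[address] = value
        self.roots.add(address)
        return address

    def collect(self):
        """Free every cell with no local root; remote pointers are not consulted."""
        for address in list(self.heap):
            if address not in self.roots:
                del self.heap[address]

    def to_local(self, global_ptr):
        """Coerce a global pointer that names this machine back to a local one."""
        machine_id, address = global_ptr
        if machine_id != self.machine_id:
            raise ValueError("pointer names another machine")
        return address


me = Machine(0)
cell = me.alloc_private(5)
escaped = (me.machine_id, cell)   # 1. global pointer to private data is sent out
me.roots.discard(cell)            # 2. all local pointers are destroyed
me.collect()                      # 3. the cell looks unreachable and is freed
back = me.to_local(escaped)       # 5-6. the pointer returns and is coerced to local
print(back in me.heap)            # 7. False: the local pointer now dangles
```

The collector did nothing wrong by its own rules; the failure comes from letting the address of private data leave the machine in the first place.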

Slightly Earlier Enforcement: No Escape of Private Addresses

expand(boxed ω shared τ′) = boxed global shared τ′

x : boxed global shared τ
-------------------------
↓x : expand(τ)

x : boxed local δ τ
-------------------
↓x : τ

• Note that τ′ might reference private data
• Autonomous garbage collection: OK
• Security: not OK
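The difference between late enforcement and the no-escape variant can be sketched by swapping the expansion function used during a global dereference. This is an illustration under assumptions: pairs are omitted, anything not marked shared (including mixed) is treated as unexpandable in the stricter variant, and all names (`expand_late`, `expand_no_escape`, `deref_type`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class Boxed:
    width: str       # "local" or "global"
    sharing: str     # "shared", "mixed" or "private"
    pointee: object


def expand_late(t):
    """Late enforcement: any pointer may be widened, whatever its qualifier."""
    if isinstance(t, Boxed):
        return Boxed("global", t.sharing, t.pointee)
    return t


def expand_no_escape(t):
    """Slightly earlier enforcement: only shared addresses may be revealed."""
    if isinstance(t, Boxed):
        if t.sharing != "shared":
            raise TypeError("cannot expose the address of non-shared data")
        return Boxed("global", "shared", t.pointee)
    return t


def deref_type(t, expand):
    """Type of a dereference, given the expansion function of the chosen policy."""
    if not isinstance(t, Boxed):
        raise TypeError("can only dereference a boxed type")
    if t.width == "local":
        return t.pointee
    if t.sharing != "shared":
        raise TypeError("global dereference requires shared data")
    return expand(t.pointee)


# A shared cell, reached globally, that holds a local pointer to private data.
x = Boxed("global", "shared", Boxed("local", "private", Int()))
print(deref_type(x, expand_late))   # a global pointer to private data escapes
try:
    deref_type(x, expand_no_escape)
except TypeError as err:
    print("rejected:", err)
```

Under the late policy the program type-checks and only later uses of the escaped pointer are restricted; under the stricter policy the dereference itself is rejected.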

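Early Enforcement: Shared is Transitively Closed

x : τ    allShared(τ)
---------------------------
↑s x : boxed local shared τ

x : τ
----------------------------
↑p x : boxed local private τ

allShared(int) = true
allShared(⟨τ1, τ2⟩) = allShared(τ1) ∧ allShared(τ2)
allShared(boxed ω δ τ) = (δ = shared)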
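A possible reading of the allShared check at allocation time is sketched below. It assumes the predicate looks through unboxed pairs but stops at a pointer, on the grounds that every shared box was itself checked when it was allocated; the names `all_shared` and `alloc_type` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class Pair:
    first: object
    second: object


@dataclass(frozen=True)
class Boxed:
    width: str       # "local" or "global"
    sharing: str     # "shared", "mixed" or "private"
    pointee: object


def all_shared(t):
    """True when a value of type t exposes no pointer to non-shared data."""
    if isinstance(t, Int):
        return True
    if isinstance(t, Pair):
        return all_shared(t.first) and all_shared(t.second)
    if isinstance(t, Boxed):
        # No recursion: a shared box was checked when it was allocated.
        return t.sharing == "shared"
    raise TypeError("unknown type")


def alloc_type(content, sharing):
    """Type of a freshly allocated local box holding a value of type `content`."""
    if sharing not in ("shared", "private"):
        raise ValueError("allocation is explicitly shared or private")
    if sharing == "shared" and not all_shared(content):
        raise TypeError("a shared box may only hold shared data")
    return Boxed("local", sharing, content)


private_cell = alloc_type(Int(), "private")
shared_cell = alloc_type(Int(), "shared")
print(alloc_type(Pair(shared_cell, Int()), "shared"))   # accepted
try:
    alloc_type(private_cell, "shared")
except TypeError as err:
    print("rejected:", err)
```

Because the check is applied at every shared allocation, nothing reachable from shared data can be private, which is what makes the shared universe transitively closed.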

Recap of Enforcement Strategies

• Late enforcement
– Anything can point to anything
– Restricted global dereference & assignment
• Slightly earlier enforcement
– Can only reveal shared addresses
– Still restrict global pointer operations
• Early enforcement
– Shared universe is transitively closed
– Global pointer restrictions trivially satisfied

[Figure, repeated for each strategy: pointer diagram with cells containing 5 and 3 and a pointer labeled y]

Type Inference: Constraint Generation

• Type structure already known
– Including local / global
• Induce constraints on sharing qualifiers
– δ = shared from global deref / assign
– δ ≤ δ′ from assignments
– δ = δ′ from various other operations
• Stricter enforcement adds more constraints
– δ = shared ⇒ δ′ = shared

Type Inference: Constraint Resolution

• Given constraints:
– private ≤ δ1, δ ≤ δ1
– shared ≤ δ2, δ ≤ δ2
• Two “minimal” solutions
– δ = shared, δ1 = mixed, δ2 = shared
– δ = private, δ1 = private, δ2 = mixed

[Figure: constraint graph over private, shared, δ, δ1, δ2, annotated with each solution]
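Both solutions can be checked mechanically. The sketch below assumes the qualifier ordering private ≤ mixed and shared ≤ mixed, with private and shared incomparable; the variable names `d`, `d1`, `d2` stand in for δ, δ1, δ2, and the helper names are hypothetical.

```python
def qual_le(a, b):
    """private <= mixed, shared <= mixed, and every qualifier <= itself."""
    return a == b or b == "mixed"


def satisfies(constraints, assignment):
    """Check every `lhs <= rhs` constraint under the given assignment."""
    def value(term):
        # Constants such as "private" are not keys and stand for themselves.
        return assignment.get(term, term)
    return all(qual_le(value(lhs), value(rhs)) for lhs, rhs in constraints)


constraints = [("private", "d1"), ("d", "d1"), ("shared", "d2"), ("d", "d2")]

solution_a = {"d": "shared", "d1": "mixed", "d2": "shared"}
solution_b = {"d": "private", "d1": "private", "d2": "mixed"}

print(satisfies(constraints, solution_a))   # True
print(satisfies(constraints, solution_b))   # True

# Neither can be lowered further: dropping a "mixed" breaks a constraint.
print(satisfies(constraints, dict(solution_a, d1="shared")))    # False
print(satisfies(constraints, dict(solution_b, d2="private")))   # False
```

Since the two solutions disagree on δ and neither is below the other, there is no single least solution, which is why a bias has to be chosen.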

Type Inference: Biased Constraint Resolution

1. Push “shared” and “mixed” forward
2. Identify qualifiers which cannot be private
3. Set all other qualifiers to private
4. Push “private” forward
5. Set remaining qualifiers to “shared” or “mixed”

• On the example constraints:
– After step 1: shared ≤ δ2
– After step 3: δ = private, δ1 = private
– After step 4: private ≤ δ2 as well as shared ≤ δ2
– After step 5: δ2 = mixed
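The five steps can be sketched as a small graph reachability procedure. This is not the Titanium implementation: it is a simplified solver in the spirit of the steps, assuming every constraint has the form `lhs ≤ variable`, where `lhs` is a variable or one of the constants. Equality constraints and upper bounds by a constant are not handled, and the names are hypothetical.

```python
from collections import defaultdict, deque

CONSTANTS = {"private", "shared", "mixed"}


def reachable(edges, sources):
    """Every node reachable from `sources` along lhs -> rhs edges."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for succ in edges[node]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def solve_biased(constraints):
    """Solve `lhs <= rhs` constraints, making as many variables private as possible."""
    edges = defaultdict(list)
    variables = set()
    for lhs, rhs in constraints:
        if rhs in CONSTANTS:
            raise ValueError("this sketch only handles variables on the right")
        edges[lhs].append(rhs)
        variables.add(rhs)
        if lhs not in CONSTANTS:
            variables.add(lhs)

    # Step 1: push "shared" and "mixed" forward.
    from_shared = reachable(edges, ["shared"])
    from_mixed = reachable(edges, ["mixed"])
    # Step 2: anything they reach cannot be private.
    not_private = (from_shared | from_mixed) & variables
    # Step 3: every other qualifier becomes private.
    solution = {v: "private" for v in variables - not_private}
    # Step 4: push "private" forward from the constant and from those variables.
    from_private = reachable(edges, ["private", *solution])
    # Step 5: the rest are mixed if private or mixed data flows in, else shared.
    for v in not_private:
        if v in from_private or v in from_mixed:
            solution[v] = "mixed"
        else:
            solution[v] = "shared"
    return solution


constraints = [("private", "d1"), ("d", "d1"), ("shared", "d2"), ("d", "d2")]
print(solve_biased(constraints))
```

On the example constraints (with `d`, `d1`, `d2` standing for δ, δ1, δ2) this yields the private-biased solution: `d` and `d1` private, `d2` mixed.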

Implementation For Titanium

• Java + extensions
– Objects, classes, interfaces, methods
– Multidimensional arrays, templates
– Local / global, communications primitives
• Sharing validation as type checking
• Sharing inference as compiler analysis
– Late or early enforcement
– Whole-program or partial

Experimental Findings: Static Metrics

• How much data is “private”?
– 16% - 75% of all static declaration sites
– 46% overall; 50% on largest benchmark
• Is “mixed” really needed?
– Up to 6% of static sites, but large impact
– Some utility code: could use parametric polymorphism
• Why have “local shared”?
– 24% - 53% of shared data is locally addressed
– Bad idea to force these to global
• Does enforcement policy affect results?
– No change for small benchmarks (<1000 lines)
– 1% - 4% shift for larger codes

Experimental Findings: Consistency Model Relaxation

• Impose sequentially consistent semantics
– Restrict both Titanium & C optimizers
– Relax restrictions for private data
• Performance impact varies widely
– Negligible sequential slowdown: nothing to do
– Sequential slowdown, offset by inference
– Sequential slowdown, better inference needed

Experimental Findings: Other Dynamic Metrics

• Data location management
– 1% - 100% of allocated bytes are private
  • 45% in gas benchmark
– amr: highly sensitive to enforcement policy
  • 74% late / 19% early
• Synchronization elimination
– Statically, one third eliminated
– Dynamically, not significant for these codes

Summary

• “Shared” might not mean what you think
– Related to local/global, but not the same
– Different degrees of privacy to choose from
  • Escape analysis, or several weaker alternatives
– Generalizes on earlier language designs
• Experimental implementation
– Ideas & algorithms scale to real system
– More aggressive clients needed
– Potential for stronger (phase-aware) inference