Consistency Oblivious Programming Hillel Avni Tel Aviv University

Post on 12-Jan-2016

Page 1: Consistency Oblivious Programming

Consistency Oblivious Programming

Hillel Avni, Tel Aviv University

Page 2: Consistency Oblivious Programming

Agenda

Transactional Memory and Locking

Consistency Oblivious Programming (COP)

COP with STM

COP with HTM

Future Work

Page 3: Consistency Oblivious Programming

Global Lock

Easy to use

Composable - Concatenate critical sections

Not scalable

Page 4: Consistency Oblivious Programming

Fine Grain Locking

Hard to use

Not Composable

Scalable

Lazy linked list is a good example…

Page 5: Consistency Oblivious Programming

Lazy Traversal

Figure: sorted list a → b → d → e; add(c) reaches the gap between b and d ("Aha!").

Page 6: Consistency Oblivious Programming

Lock and Validate

Figure: sorted list a → b → d → e; add(c) locks and validates: "Yes, b still points to d."

Page 7: Consistency Oblivious Programming

Perform Updates and Release Locks

Figure: sorted list a → b → c → d → e; add(c) has linked c between b and d.
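The three steps above (lazy traversal, lock and validate, update and release) can be sketched in C++ as follows. This is a minimal illustration, not the code behind the slides: the `Node` layout, the `marked` flag, the sentinel keys and the `LazyList` class are all assumptions made for the example, and removed nodes are never reclaimed.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <mutex>

// Hypothetical node layout for a sorted, singly linked set (not from the slides).
struct Node {
    const int key;
    std::atomic<Node*> next;
    std::atomic<bool> marked{false};  // true once the node is logically removed
    std::mutex lock;
    Node(int k, Node* n) : key(k), next(n) {}
};

class LazyList {
public:
    bool add(int key) {
        assert(key > INT_MIN && key < INT_MAX);  // sentinel keys are reserved
        for (;;) {
            // Lazy traversal: walk the list without taking any locks.
            Node* pred = head_;
            Node* curr = pred->next.load();
            while (curr->key < key) {
                pred = curr;
                curr = curr->next.load();
            }
            // Lock and validate. pred always precedes curr, so every thread
            // acquires the two locks in list order.
            std::lock_guard<std::mutex> pred_guard(pred->lock);
            std::lock_guard<std::mutex> curr_guard(curr->lock);
            if (!validate(pred, curr)) continue;  // the list changed: retry
            if (curr->key == key) return false;   // key already present
            // Perform the update; both guards release their locks on return.
            pred->next.store(new Node(key, curr));
            return true;
        }
    }

    bool contains(int key) const {
        const Node* curr = head_;
        while (curr->key < key) curr = curr->next.load();
        return curr->key == key && !curr->marked.load();
    }

private:
    // "Yes, pred still points to curr", and neither node has been removed.
    static bool validate(const Node* pred, const Node* curr) {
        return !pred->marked.load() && !curr->marked.load() &&
               pred->next.load() == curr;
    }

    // Sentinels bound the list. Nodes are never freed in this sketch.
    Node* const tail_ = new Node(INT_MAX, nullptr);
    Node* const head_ = new Node(INT_MIN, tail_);
};
```

With a, b, d, e mapped to the keys 1, 2, 4, 5, calling `add(3)` plays the role of add(c): the traversal stops with `pred` at 2 and `curr` at 4, and the new node is linked only after validation succeeds.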

Page 8: Consistency Oblivious Programming

Transactional Memory

Easy to use

Composable

Scalable

How is it done?

Page 9: Consistency Oblivious Programming

Java (Deuce)

bool CAS(int location, int expected, int new_val)
{
    atomic {
        if (location != expected)
            return false;
        location = new_val;
    }
    return true;
}

Page 10: Consistency Oblivious Programming

C/C++ (GCC-4.7)

/* Build with: gcc -fgnu-tm */
bool CAS(int *location, int expected, int new_val)
{
    __transaction_atomic {
        /* Returning from inside the block commits the transaction. */
        if (*location != expected)
            return false;
        *location = new_val;
    }
    return true;
}

Page 11: Consistency Oblivious Programming

Software Transactional Memory

Different algorithms are used.

consistency checking

rollback

Compiler recognizes shared accesses.

Page 12: Consistency Oblivious Programming

STM Problem - Overhead

load function from GCC 4.8.1:

template <typename V> static V load(const V* addr, ls_modifier mod)
{
    if (unlikely(mod == RfW))
    {
        pre_write(addr, sizeof(V));
        return *addr;
    }
    if (unlikely(mod == RaW))
        return *addr;
    gtm_thread *tx = gtm_thr();
    gtm_rwlog_entry* log = pre_load(tx, addr, sizeof(V));
    V v = *addr;
    atomic_thread_fence(memory_order_acquire);
    post_load(tx, log);
    return v;
}

Page 13: Consistency Oblivious Programming

STM Problem - Overhead

load always calls pre_load:

static gtm_rwlog_entry* pre_load(gtm_thread *tx, const void* addr, size_t len)
{
    size_t log_start = tx->readlog.size();
    gtm_word snapshot = tx->shared_state.load(memory_order_relaxed);
    gtm_word locked_by_tx = ml_mg::set_locked(tx);
    size_t orec = ml_mg::get_orec(addr);
    size_t orec_end = ml_mg::get_orec_end(addr, len);
    do
    {
        gtm_word o = o_ml_mg.orecs[orec].load(memory_order_acquire);
        if (likely (!ml_mg::is_more_recent_or_locked(o, snapshot))) {
            success:
            gtm_rwlog_entry *e = tx->readlog.push();
            e->orec = o_ml_mg.orecs + orec;
            e->value = o;
        }
        else if (!ml_mg::is_locked(o)) {
            snapshot = extend(tx);
            goto success;
        }
        else {
            if (o != locked_by_tx)
                tx->restart(RESTART_LOCKED_READ);
        }
        orec = o_ml_mg.get_next_orec(orec);
    }
    while (orec != orec_end);
    return &tx->readlog[log_start];
}

Page 14: Consistency Oblivious Programming

STM Problem - Overhead

... and post_load:

static void post_load(gtm_thread *tx, gtm_rwlog_entry* log)
{
    for (gtm_rwlog_entry *end = tx->readlog.end(); log != end; log++)
    {
        gtm_word o = log->orec->load(memory_order_relaxed);
        if (log->value != o)
            tx->restart(RESTART_VALIDATE_READ);
    }
}

Compare to mov eax, [ebx] on x86

Page 15: Consistency Oblivious Programming

Hardware Transactional Memory

Exploit native cache coherence

consistency checking

rollback

Page 16: Consistency Oblivious Programming

HTM Problem – Resources

A transaction cannot commit if it is

too big: cache size limits data footprint

too slow: quantum size limits duration

Page 17: Consistency Oblivious Programming

All TM Problem – False Conflicts

Any address that was encountered during the transaction is monitored until the end of that transaction.

An address may abort a transaction long after it is no longer relevant…

Page 18: Consistency Oblivious Programming

Agenda

Transactional Memory and Locking

Consistency Oblivious Programming (COP)

COP with STM

COP with HTM

Future Work

Page 19: Consistency Oblivious Programming

COP Operation

• In non-transactional mode:
  – Execute the read-only prefix of the operation and record its output.

• In transactional mode:
  – Verify output is correct.
  – Perform updates.
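The two modes can be sketched as a small C++ helper. This is only an illustration of the control flow under stated assumptions: a single global mutex (`tx_lock`) stands in for the transaction, a failed verification is treated as abort-and-retry, and the names `cop_execute`, `rop`, `verify` and `complete` are hypothetical. The read-only prefix must tolerate reading inconsistent state, since it runs with no synchronization.

```cpp
#include <cassert>
#include <mutex>

// Stand-in for the TM: holding this mutex plays the role of being inside
// a transaction. A real STM or HTM would replace it.
std::mutex tx_lock;

// Runs one COP operation and returns the number of attempts it took.
//   rop()            read-only prefix, executed in non-transactional mode
//   verify(output)   checked in transactional mode before any update
//   complete(output) performs the updates in transactional mode
template <typename Rop, typename Verify, typename Complete>
int cop_execute(Rop rop, Verify verify, Complete complete) {
    int attempts = 0;
    for (;;) {
        ++attempts;
        assert(attempts > 0);  // guards against counter overflow in this sketch

        // Non-transactional mode: execute the prefix and record its output.
        auto output = rop();

        // Transactional mode: verify the output, then perform the updates.
        std::lock_guard<std::mutex> tx(tx_lock);
        if (!verify(output)) continue;  // "abort": drop the output and retry
        complete(output);
        return attempts;
    }
}
```

A caller supplies the three pieces per operation. For a search structure, `rop` would be the unsynchronized search, `verify` the check that the returned node is still the right place, and `complete` the insertion or removal.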

Page 20: Consistency Oblivious Programming

COP Example – RB Tree

Figure: red-black tree with root 20; 20 has children 10 and 30; 30 has children 27 and 40; 27 has children 25 and 28.

Page 21: Consistency Oblivious Programming

Add 26 – Tree Unbalanced

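Figure: red-black tree with root 20; 20 has children 10 and 30; 30 has children 27 and 40; 27 has children 25 and 28; the new node 26 is the right child of 25.

TM Search 26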

Page 22: Consistency Oblivious Programming

Tree Balanced

Figure: rebalanced red-black tree with root 27; 27 has children 20 and 30; 20 has children 10 and 25; 30 has children 28 and 40; 26 is the right child of 25.

TM Search continues from 27

Conflict and Abort

Page 23: Consistency Oblivious Programming

Add 26 – Tree Unbalanced

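Figure: red-black tree with root 20; 20 has children 10 and 30; 30 has children 27 and 40; 27 has children 25 and 28; the new node 26 is the right child of 25.

COP Search 26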

Page 24: Consistency Oblivious Programming

Tree Balanced

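Figure: rebalanced red-black tree with root 27; 27 has children 20 and 30; 20 has children 10 and 25; 30 has children 28 and 40; 26 is the right child of 25.

COP Search continues from 27

Found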

Page 25: Consistency Oblivious Programming

COP RB-Tree Verify

To facilitate verification:

• All nodes in the RB-Tree are connected in a successor-predecessor doubly linked list, and each node has a live mark.

• Search returns a node n with k, or a leaf with k’s successor or predecessor.

Page 26: Consistency Oblivious Programming

COP RB-Tree Suffix

• Resume a transaction.

• Verify:
  – k found and n is live – done.
  – k not found, check:
    • (n.k > k > n.pred.k && !n.left) or (n.k < k < n.succ.k && !n.right)

• If verification failed – abort the transaction.

• Complete updates, add / remove / rebalance, using n.
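The verification step can be sketched as a C++ predicate. It is a hedged illustration, not the authors' code: the `RbNode` layout and the name `cop_verify` are hypothetical, a missing predecessor or successor is modeled as a null pointer, and the live mark is required in the not-found case as well, which is a conservative assumption. The empty child slot checked is the one on the side where k would be inserted: left when k < n.k, right when k > n.k.

```cpp
#include <cassert>

// Hypothetical node layout: tree links, the successor-predecessor list,
// and the live mark described on the slides.
struct RbNode {
    int k;
    RbNode* left = nullptr;
    RbNode* right = nullptr;
    RbNode* pred = nullptr;  // previous key in sorted order, or null
    RbNode* succ = nullptr;  // next key in sorted order, or null
    bool live = true;        // cleared when the node is removed
    explicit RbNode(int key) : k(key) {}
};

// Returns true if node n, produced by an unsynchronized search for k,
// is still a valid starting point for the transactional suffix.
bool cop_verify(const RbNode* n, int k) {
    assert(n != nullptr);
    if (!n->live) return false;  // assumption: required in both cases
    if (n->k == k) return true;  // k found and n is live
    if (k < n->k) {
        // n holds k's successor: k belongs in n's empty left slot, and
        // the predecessor of n must be smaller than k.
        return n->left == nullptr &&
               (n->pred == nullptr || n->pred->k < k);
    }
    // n holds k's predecessor: k belongs in n's empty right slot, and
    // the successor of n must be larger than k.
    return n->right == nullptr &&
           (n->succ == nullptr || n->succ->k > k);
}
```

In the tree from the example slides, a search for 26 that returns node 25 verifies while 25 has no right child and its successor is 27. Once another thread links 26 below 25, the same output no longer verifies and the operation aborts.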

Page 27: Consistency Oblivious Programming

COP Template for op

start-transaction
    any-code
    suspend-transaction
        output = op-rop();
    resume-transaction
    if (not(op-verify(output)))
        abort-transaction
    op-complete(output)
    any-code
end-transaction

Page 28: Consistency Oblivious Programming

COP Correctness

The underlying TM:
• Transactional Regular Registers

The COP algorithm:
• Obliviousness
• Verifiability
• Separation

We prove that if the TM yields transactional regular registers, and the COP algorithm demonstrates obliviousness, verifiability, and separation, then the COP operation is linearizable.

Page 29: Consistency Oblivious Programming

Agenda

Transactional Memory and Locking

Consistency Oblivious Programming (COP)

COP with STM

COP with HTM

Future Work

Page 30: Consistency Oblivious Programming

STM Algorithm

• The GCC default STM algorithm is the one that proved to be the most efficient and scalable in most scenarios:
  – Write Through (WT)
  – Encounter Time Locking (ETL)
  – Multi Lock (ML)

Page 31: Consistency Oblivious Programming

STM: WT – ETL - ML

1. RV ← Shared Version Clock
2. On read: check unlocked and v# <= RV, then add to read-set
3. On write: check v# <= RV, lock, and add to undo-set
4. WV = F&I(VClock)
5. Validate that in the read-set each v# <= RV
6. Release locks with v# ← WV
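The six steps can be traced with a toy, single-threaded C++ model. It is a sketch under simplifying assumptions, not GCC's libitm code: every type and member name (`Orec`, `ToyStm`, `Tx`, `rollback`) is hypothetical, a read or write that sees v# > RV simply aborts instead of extending the snapshot, WV is taken as the clock value after the increment, and nothing here is thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One versioned lock per memory word (hypothetical layout).
struct Orec {
    unsigned version = 0;         // v#
    const void* owner = nullptr;  // non-null while locked by a transaction
};

struct ToyStm {
    unsigned clock;               // shared version clock
    std::vector<int> mem;
    std::vector<Orec> orecs;
    ToyStm(std::size_t words, unsigned start_clock)
        : clock(start_clock), mem(words, 0), orecs(words) {}
};

class Tx {
public:
    explicit Tx(ToyStm& s) : stm_(s), rv_(s.clock) {}  // step 1: sample RV

    bool read(std::size_t i, int& out) {               // step 2
        assert(i < stm_.mem.size());
        const Orec& o = stm_.orecs[i];
        if (o.owner != this) {
            if (o.owner != nullptr || o.version > rv_) return rollback();
            read_set_.push_back(i);
        }
        out = stm_.mem[i];
        return true;
    }

    bool write(std::size_t i, int value) {             // step 3
        assert(i < stm_.mem.size());
        Orec& o = stm_.orecs[i];
        if (o.owner != this) {
            if (o.owner != nullptr || o.version > rv_) return rollback();
            o.owner = this;                            // encounter-time lock
            undo_set_.emplace_back(i, stm_.mem[i]);    // remember old value
        }
        stm_.mem[i] = value;                           // write through
        return true;
    }

    bool commit() {
        const unsigned wv = ++stm_.clock;              // step 4
        for (std::size_t i : read_set_) {              // step 5
            const Orec& o = stm_.orecs[i];
            const bool locked_by_other = o.owner != nullptr && o.owner != this;
            if (locked_by_other || o.version > rv_) return rollback();
        }
        for (const auto& u : undo_set_) {              // step 6
            stm_.orecs[u.first].version = wv;
            stm_.orecs[u.first].owner = nullptr;
        }
        undo_set_.clear();
        read_set_.clear();
        return true;
    }

private:
    // Abort: restore old values in reverse order and release the locks.
    bool rollback() {
        for (auto it = undo_set_.rbegin(); it != undo_set_.rend(); ++it) {
            stm_.mem[it->first] = it->second;
            stm_.orecs[it->first].owner = nullptr;
        }
        undo_set_.clear();
        read_set_.clear();
        return false;
    }

    ToyStm& stm_;
    unsigned rv_;
    std::vector<std::size_t> read_set_;
    std::vector<std::pair<std::size_t, int>> undo_set_;
};
```

Because writes go straight to memory, the undo-set is what makes rollback possible, and a location stays locked from its first write until commit or abort.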

Figure: shared version clock (100 → 120 → 121), memory words X and Y with their versioned locks (v#, lock bit), RV = 100, and the commit that releases the locks with v# = 121.

Page 32: Consistency Oblivious Programming

GCC Constructs

__transaction_atomic{}: Mark the transaction.

__transaction_cancel: Explicit abort.

__attribute__((transaction_safe)): Instrument the code.

__attribute__((transaction_pure)): Do not instrument the code. We will show that this attribute can be used efficiently as __transaction_suspend with the WT-ETL-ML default STM algorithm in GCC.

Page 33: Consistency Oblivious Programming

pure = suspend

• Transactional Regular Registers – All values up to one architecture-word size are written and read atomically. The rollback may use memcpy, but the memcpy is optimized to write at maximal alignment.

• Now we will compare the suspended mode of the future Power architecture HTM to transaction_pure with the WT-ETL-ML STM algorithm.

Page 34: Consistency Oblivious Programming

Power tsuspend - tresume

1. Until failure occurs, load instructions that access memory locations that were transactionally written by the same thread will return the transactionally written data.

2. In the event of transaction failure, failure recording is performed, but failure handling is deferred until transactional execution is resumed.

3. The initiation of a new transaction is prevented.

4. Store instructions that access memory locations that have been accessed transactionally (due to load or store) by the same thread will cause the transaction to fail.

Page 35: Consistency Oblivious Programming

RB-Tree – 1M size – 20% updates – 10 operations per transaction

Page 36: Consistency Oblivious Programming

RB-Tree – 1K size – 8 threads – 20% updates

Page 37: Consistency Oblivious Programming

Agenda

Transactional Memory and Locking

Consistency Oblivious Programming (COP)

COP with STM

COP with HTM

Future Work

Page 38: Consistency Oblivious Programming

Haswell HTM with COP

There is no suspend mode, so to compose COP operations, we execute all ROPs (read-only prefixes) before the transaction. This limits the composition to at most one writing COP operation per transaction.

Page 39: Consistency Oblivious Programming

Capacity and Cache Associativity

Packed Memory Array (PMA) search is done by divide and conquer. Assume a PMA of size 0x800000 that starts at address 0. A search for an item that is found at an address in 0x0…0x7FFF must go through the addresses:

0x400000 0x200000 0x100000 0x80000

0x40000 0x20000 0x10000 0x8000

As the cache size in Haswell is 0x8000, all these addresses have the same cache index (0), and the transaction will always abort.
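The address arithmetic can be checked with a short C++ sketch. The cache geometry below is an assumption added for the example and is not stated on the slide: a 32 KiB (0x8000) L1 data cache with 64-byte lines and 8 ways, which gives 64 sets. The helper names `set_index` and `probe_addresses` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed L1 data cache geometry (not from the slide).
constexpr std::uint64_t kLineBytes = 64;
constexpr std::uint64_t kWays = 8;
constexpr std::uint64_t kCacheBytes = 0x8000;
constexpr std::uint64_t kSets = kCacheBytes / (kLineBytes * kWays);  // 64

// Cache set that a byte address maps to under the assumed geometry.
constexpr std::uint64_t set_index(std::uint64_t addr) {
    return (addr / kLineBytes) % kSets;
}

// Midpoints visited by a divide-and-conquer search over [0, size) whose
// target lies in the lowest block, halving until the probe reaches `stop`.
std::vector<std::uint64_t> probe_addresses(std::uint64_t size,
                                           std::uint64_t stop) {
    assert(stop > 0 && size > stop);
    std::vector<std::uint64_t> probes;
    for (std::uint64_t mid = size / 2; mid >= stop; mid /= 2) {
        probes.push_back(mid);
    }
    return probes;
}
```

Calling `probe_addresses(0x800000, 0x8000)` yields the eight addresses listed above, and `set_index` maps each of them to set 0. Under the assumed 8-way geometry those probes alone occupy every way of that set, so any further transactional line that maps there has nowhere to go.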

Page 40: Consistency Oblivious Programming

PMA

Page 41: Consistency Oblivious Programming

RB-Tree Capacity Aborts

Page 42: Consistency Oblivious Programming

RB-Tree Conflict Aborts

Page 43: Consistency Oblivious Programming

Agenda

Transactional Memory and Locking

Consistency Oblivious Programming (COP)

COP with STM

COP with HTM

Future Work

Page 44: Consistency Oblivious Programming

Data Structures

We already have COP versions of:
• RB-Tree
• Linked list
• PMA
• Cache Oblivious B-Tree
• Leaplist (k-ary skip list, tailored for range queries)

Can we design more COP data structures?

Page 45: Consistency Oblivious Programming

Applications

Use COP in applications.

Many applications use shared data structures, so it is interesting to see the impact of COP on their performance.

Page 46: Consistency Oblivious Programming

Infrastructure

Add statistics (transactional accesses, conflicts) to GCC.

Add a real suspend mode to GCC and to hardware.

Page 47: Consistency Oblivious Programming

Theory

How can the transformation to COP be made automatic?

Is COP applicable outside the data-structures area?

Bounds on the number of transactional accesses?

Bounds on the number of false conflicts?

Page 48: Consistency Oblivious Programming

Thank You