Proposed Solutions

SLICC: Collective Caches (Multi-core – Spread in “Space”)

STREX: Stratified Execution (Single-Core – Spread in “Time”)
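The idea of spreading execution in “time” can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes every transaction in a team runs the same instruction blocks, an LRU L1-I with 64-byte lines (the line size is an assumption), and a hypothetical 128 KB footprint. The helper names `count_misses`, `conventional`, and `stratified` are invented for illustration.

```python
from collections import OrderedDict


def count_misses(schedule, cache_blocks):
    """Count misses of an LRU cache over a sequence of instruction-block ids."""
    cache = OrderedDict()
    misses = 0
    for block in schedule:
        if block in cache:
            cache.move_to_end(block)       # refresh LRU position on a hit
        else:
            misses += 1
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def conventional(num_txns, footprint_blocks):
    """Each transaction runs to completion before the next one starts."""
    return [b for _ in range(num_txns) for b in range(footprint_blocks)]


def stratified(num_txns, footprint_blocks, cache_blocks):
    """Cut the footprint into cache-sized phases; every transaction executes
    phase k before any transaction moves on to phase k + 1."""
    schedule = []
    for start in range(0, footprint_blocks, cache_blocks):
        phase = range(start, min(start + cache_blocks, footprint_blocks))
        for _ in range(num_txns):
            schedule.extend(phase)
    return schedule


CACHE_BLOCKS = 32 * 1024 // 64   # 32 KB L1-I, 64 B lines (line size assumed)
FOOTPRINT_BLOCKS = 2048          # hypothetical 128 KB instruction footprint
TXNS = 4                         # hypothetical team of similar transactions

conv_misses = count_misses(conventional(TXNS, FOOTPRINT_BLOCKS), CACHE_BLOCKS)
strex_misses = count_misses(
    stratified(TXNS, FOOTPRINT_BLOCKS, CACHE_BLOCKS), CACHE_BLOCKS
)
print(conv_misses, strex_misses)
```

In this toy model the conventional order misses on every block of every transaction, while the stratified order pays for each block once and lets the remaining transactions reuse it. Real transactions diverge in control flow, so the actual benefit would be smaller.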

Problem & Opportunity

Key Results

OLTP Execution: Stratified or Collective?

Islam Atta1, Pınar Tözün2, Xin Tong1, Anastasia Ailamaki2, Andreas Moshovos1

1University of Toronto, 2École Polytechnique Fédérale de Lausanne

A transaction's instruction footprint fits in the aggregate L1-I capacity of a CMP

[Figure: instruction footprint compared with L1-I size]

Email: [email protected]
Phone: 416-805-8790
Website: http://islamatta.com

Many concurrent transactions: instruction overlap (TPC-C, TPC-E)

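[Figure: Conventional vs. SLICC execution of Transaction A, Transaction B, and Transaction C on cores 0 to 3. Captions: “Divided we Fail”, “United we Succeed”.]

[Figure: Conventional vs. STREX execution of Transactions A, B, and C over time, showing cache thrashing overhead. Sequences shown: A A B A C B C and A A A A B B C C C.]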

Throughput

L1 Instruction Misses

Opportunity – CMP Integration

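[Figure: Payment transaction control flow. Operations: IT(CUST), R(DIST), R(CUST), U(CUST), U(DIST), U(WH), I(HIST), R(WH). Labelled sizes: 41.4KB, 40.5KB, 41.8KB, 39.1KB, 29.9KB, 28.7KB, 28.7KB, 47.4KB.]

[Figure: New Order transaction control flow. Operations: R(DIST), I(NORD), R(WH), U(DIST), R(CUST), R(ITEM), R(STO), U(STO), I(OL), I(ORD), with a Loop (OL_CNT). Labelled sizes: 41.5KB, 40.5KB, 40.5KB, 28.8KB, 39.6KB, 40.2KB, 65.3KB, 29.4KB, 41.5KB, 41.5KB.]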

[Figure: STREX execution by phase (Phase # 1 to 5) for transactions T1 and T2 over code segments A, B, and C, with T1 as the Leader Transaction.]

Legend: Baseline, STREX, SLICC, HYBRID

[Figure: Execution Cycles Breakdown (0% to 100%) into Busy, Other Stalls, and Instruction Stalls for TPC-C and TPC-E.]

[Figure: IPC (0 to 4) for TPC-C and TPC-E.]

[Figure: Example Transaction Control Flow for Transaction A, Transaction B, and Transaction C.]

Possible execution flows; Significant Overlap

L1 Data Misses

Operation Overlap

Intel Xeon X5660, 4-way Issue; IPC compared with Ideal

Core Cycles Wasted!

Instruction Stalls Dominate

[Figure: Cache Refill Count. Labelled values: 9, 3, 10, 4.]

[Figure: Relative Throughput (axis 0 to 7) for TPC-C and TPC-E on 2, 4, 8, and 16 cores.]

[Figure: D-MPKI (axis 0 to 40) for TPC-C and TPC-E on 2, 4, 8, and 16 cores.]

[Figure: I-MPKI (axis 0 to 40) for TPC-C and TPC-E on 2, 4, 8, and 16 cores.]


OLTP Micro-Architectural Evaluation

Instruction Caches are Thrashed

Why Instruction Stalls?

Opportunity – Inter-Transaction Behavior

Methodology

Simulator: x86 CMP

CPU: Out-of-Order, 2.5 GHz

L1-I/D: Private, 32 KB

L2: Unified, 1 MB per core

Memory: DDR3, 1.6 GHz

Storage Manager: Shore-MT

TPC-C: 10 warehouses, 1 GB

TPC-E: 1000 clients, 20 GB


HYBRID: STREX + SLICC

Dynamically Selects the Better Scheduler

Measure Dynamic Instruction Footprint, Compare with Runtime Aggregate Cache Capacity, then select STREX or SLICC
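The selection step can be sketched as a single comparison. This is a minimal sketch, not the authors' mechanism: it assumes SLICC is chosen when the measured footprint fits in the aggregate L1-I capacity of the cores available at runtime (the premise the poster states for spreading in “space”), and STREX otherwise. The function names, parameters, and example numbers are hypothetical.

```python
import math


def cores_needed(footprint_bytes, l1i_bytes):
    """Smallest number of private L1-I caches whose combined capacity
    holds the footprint."""
    return math.ceil(footprint_bytes / l1i_bytes)


def choose_scheduler(footprint_bytes, l1i_bytes, available_cores):
    """Assumed policy: SLICC if the footprint fits in the aggregate L1-I
    capacity of the available cores, otherwise STREX."""
    aggregate_bytes = l1i_bytes * available_cores
    return "SLICC" if footprint_bytes <= aggregate_bytes else "STREX"


KB = 1024
footprint = 128 * KB   # hypothetical measured dynamic instruction footprint
l1i = 32 * KB          # private L1-I size listed under Methodology

print(cores_needed(footprint, l1i))         # caches needed to hold the footprint
print(choose_scheduler(footprint, l1i, 4))  # enough aggregate capacity
print(choose_scheduler(footprint, l1i, 2))  # too little aggregate capacity
```

With these numbers four cores are needed to hold the footprint, so the assumed policy picks SLICC on four cores and falls back to STREX on two.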

Threads Migrate Chasing Locality
