

Proposed Solutions

SLICC: Collective Caches (multi-core, spread in “space”)

STREX: Stratified Execution (single core, spread in “time”)

Problem & Opportunity

Key Results

OLTP Execution: Stratified or Collective?

Islam Atta¹, Pınar Tözün², Xin Tong¹, Anastasia Ailamaki², Andreas Moshovos¹

¹University of Toronto, ²École Polytechnique Fédérale de Lausanne

A transaction's instruction footprint fits in the aggregate L1-I capacity of a CMP.

[Figure: instruction footprint versus L1-I size; spreading the footprint across each core's L1-I is marked "Better"]

Email: [email protected]
Phone: 416-805-8790
Website: http://islamatta.com

Many concurrent transactions

[Figure: instruction overlap, TPC-C and TPC-E]

[Diagram: conventional execution versus SLICC on cores 0 to 3. Conventional: "Divided we Fail". SLICC: "United we Succeed"]
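The "spread in space" idea can be sketched as a toy model, assuming (hypothetically) that each core has a small LRU cache holding whole code segments and that a thread moves to whichever core already caches its next segment. The poster does not describe SLICC's actual migration policy, so the placement rule here (on a miss everywhere, use a core with free space, otherwise refill on the current core) and the function name run_collective are illustrative assumptions.

```python
from collections import OrderedDict


def run_collective(segments, num_cores, capacity_per_core):
    """Toy 'spread in space' model: a thread migrates to the core caching its next segment.

    Returns (refills, migrations). The placement rule is a hypothetical
    illustration, not SLICC's real policy.
    """
    caches = [OrderedDict() for _ in range(num_cores)]  # one LRU cache per core
    current = 0
    refills = migrations = 0
    for seg in segments:
        holder = next((i for i, c in enumerate(caches) if seg in c), None)
        if holder is None:
            # Miss on every core: prefer a core with free space so the
            # aggregate capacity gets used; otherwise refill locally.
            free = next((i for i, c in enumerate(caches)
                         if len(c) < capacity_per_core), None)
            target = current if free is None else free
            refills += 1
            cache = caches[target]
            if len(cache) >= capacity_per_core:
                cache.popitem(last=False)  # evict the least recently used segment
            cache[seg] = True
        else:
            target = holder
            caches[target].move_to_end(seg)  # hit: refresh LRU position
        if target != current:
            migrations += 1
            current = target
    return refills, migrations


seq = list("AABACBC")  # hypothetical segment sequence
print(run_collective(seq, num_cores=1, capacity_per_core=1))  # (6, 0)
print(run_collective(seq, num_cores=3, capacity_per_core=1))  # (3, 5)
```

With one-segment caches, a single core refills 6 times, while three cores refill only 3 times at the cost of 5 migrations: together the caches hold the whole footprint, which is the "United we Succeed" intuition.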

[Diagram: execution of Transactions A, B and C over time, with segment sequences "A A B A C B C" and "A A A A B B C C C", showing cache thrashing overhead and STREX; labels: Throughput, L1 Instruction Misses]

Opportunity – CMP Integration

Payment: operations IT(CUST), R(DIST), R(CUST), U(CUST), U(DIST), U(WH), I(HIST), R(WH); per-operation instruction footprints 41.4 KB, 40.5 KB, 41.8 KB, 39.1 KB, 29.9 KB, 28.7 KB, 28.7 KB, 47.4 KB

New Order: operations R(DIST), I(NORD), R(WH), U(DIST), R(CUST), R(ITEM), R(STO), U(STO), I(OL), I(ORD), with Loop (OL_CNT); per-operation instruction footprints 41.5 KB, 40.5 KB, 40.5 KB, 28.8 KB, 39.6 KB, 40.2 KB, 65.3 KB, 29.4 KB, 41.5 KB, 41.5 KB
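The Payment and New Order footprints can be checked against the 32 KB private L1-I listed under Methodology. This sketch is illustrative only: it uses just the transcribed values (which value belongs to which operation is not recoverable from this text), and its naive total adds per-operation footprints, so code shared between operations, which the Operation Overlap panel suggests is significant, is counted more than once. The function name footprint_summary is hypothetical.

```python
import math

# Per-operation instruction footprints (KB) transcribed from the Payment and
# New Order figures. Only the values are used; their pairing with operations
# is not recoverable.
PAYMENT_KB = [41.4, 40.5, 41.8, 39.1, 29.9, 28.7, 28.7, 47.4]
NEW_ORDER_KB = [41.5, 40.5, 40.5, 28.8, 39.6, 40.2, 65.3, 29.4, 41.5, 41.5]

L1I_KB = 32  # private L1-I size listed under Methodology


def footprint_summary(footprints_kb, l1i_kb=L1I_KB):
    """Return (operations larger than one L1-I, naive total in KB, L1-I caches that total fills).

    The naive total adds per-operation footprints, so any code shared
    between operations is counted more than once.
    """
    oversized = sum(1 for f in footprints_kb if f > l1i_kb)
    total_kb = round(sum(footprints_kb), 1)
    caches_needed = math.ceil(total_kb / l1i_kb)  # ceiling division
    return oversized, total_kb, caches_needed


print(footprint_summary(PAYMENT_KB))    # (5, 297.5, 10)
print(footprint_summary(NEW_ORDER_KB))  # (8, 408.8, 13)
```

Five of Payment's eight operations and eight of New Order's ten each exceed a single 32 KB L1-I on their own, which is consistent with the "Instruction Caches are Thrashed" finding; the totals double-count shared code, so they overstate how many caches a whole transaction needs.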

[Diagram: STREX phases 1 to 5 with leader transactions T1, T1, T1, T1, T2, executing segments A, B and C]
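To make "spread in time" concrete, here is a toy single-core model, not the STREX mechanism itself: the leader/follower phases shown in the diagram are reduced to executing the same step of every transaction back to back. Assumptions (all hypothetical): a transaction is a list of code-segment labels, the instruction cache is LRU and holds a fixed number of whole segments, and a refill is counted on every miss. The names LRUCache, run_conventional and run_stratified are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache of whole code segments that counts refills (misses)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.refills = 0

    def access(self, segment):
        if segment in self.lines:
            self.lines.move_to_end(segment)  # hit: mark most recently used
            return True
        self.refills += 1  # miss: the segment must be fetched again
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used segment
        self.lines[segment] = True
        return False


def run_conventional(transactions, capacity):
    """Run each transaction to completion before starting the next."""
    cache = LRUCache(capacity)
    for txn in transactions:
        for segment in txn:
            cache.access(segment)
    return cache.refills


def run_stratified(transactions, capacity):
    """Toy 'spread in time': execute step i of every transaction before step i+1."""
    if not transactions:
        return 0
    cache = LRUCache(capacity)
    for step in range(max(len(t) for t in transactions)):
        for txn in transactions:
            if step < len(txn):
                cache.access(txn[step])
    return cache.refills


txns = [list("AABACBC") for _ in range(3)]  # hypothetical segment sequences
print(run_conventional(txns, capacity=1))  # 18
print(run_stratified(txns, capacity=1))    # 6
```

With a one-segment cache, running the three transactions one after another costs 18 refills, while the stratified order costs 6, because each segment fetched for the first transaction is reused by the other two before it is evicted.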

[Chart legend: STREX, SLICC, Baseline, HYBRID]

[Figure: execution cycles breakdown (Busy, Other Stalls, Instruction Stalls; 0% to 100%) for TPC-C and TPC-E]

[Figure: IPC (0 to 4) for TPC-C and TPC-E]

[Figure: example transaction control flow for Transactions A, B and C]

Possible execution flows; significant overlap.

[Labels: L1 Data Misses; Operation Overlap]

Intel Xeon X5660, 4-way issue (Ideal: peak IPC of 4)

Core Cycles Wasted!

Instruction Stalls Dominate
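The "Core Cycles Wasted!" callout compares achieved IPC with the ideal of a 4-way issue core, whose peak is 4 instructions per cycle. A minimal sketch of that comparison follows; the function names are illustrative and the IPC value in the example is hypothetical, not a measurement read off the chart.

```python
def issue_slot_utilization(ipc, issue_width=4):
    """Fraction of the core's peak issue bandwidth actually used."""
    if issue_width <= 0:
        raise ValueError("issue_width must be positive")
    return ipc / issue_width


def wasted_fraction(ipc, issue_width=4):
    """Fraction of issue slots left idle relative to the ideal IPC."""
    return 1.0 - issue_slot_utilization(ipc, issue_width)


hypothetical_ipc = 1.0  # illustrative value only
print(issue_slot_utilization(hypothetical_ipc))  # 0.25
print(wasted_fraction(hypothetical_ipc))         # 0.75
```

A hypothetical IPC of 1.0 on a 4-way core uses 25% of the peak issue bandwidth, leaving 75% of the issue slots idle.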

[Figures: cache refill count; relative throughput (y-axis 0 to 7), D-MPKI and I-MPKI (y-axis 0 to 40), each for 2-, 4-, 8- and 16-core CMPs on TPC-C and TPC-E]
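The result charts report misses per kilo-instruction (MPKI, misses per 1000 instructions) for the data (D-MPKI) and instruction (I-MPKI) caches, and throughput relative to a reference. Assuming the reference is the Baseline configuration named in the chart legend, both metrics reduce to the following; the function names are illustrative.

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction: cache misses per 1000 instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions


def relative_throughput(throughput, baseline_throughput):
    """Throughput normalized to the Baseline (assumed reference); 1.0 means no change."""
    return throughput / baseline_throughput


print(mpki(500, 100_000))             # 5.0
print(relative_throughput(3.0, 1.5))  # 2.0
```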

[Diagram: conventional execution of segments A, B and C on cores 0, 1 and 2]

OLTP Micro-Architectural Evaluation

Instruction Caches are Thrashed

Why Instruction Stalls?

Opportunity – Inter-Transaction Behavior

Methodology

Simulator: x86 CMP

CPU: Out-of-order, 2.5 GHz

L1-I/D: Private, 32 KB

L2: Unified, 1 MB per core

Memory: DDR3, 1.6 GHz

Storage Manager: Shore-MT

TPC-C: 10 warehouses, 1 GB

TPC-E: 1000 clients, 20 GB
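The simulated parameters above can be captured in a small configuration record. This sketch assumes the core counts used in the results (2, 4, 8 and 16) and derives the aggregate L1-I capacity, the quantity behind the CMP Integration opportunity, as core count times the 32 KB private L1-I; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    """Simulated CMP parameters from the Methodology list (names are illustrative)."""
    cores: int
    clock_ghz: float = 2.5       # out-of-order x86 cores
    l1i_kb: int = 32             # private L1-I per core
    l1d_kb: int = 32             # private L1-D per core
    l2_mb_per_core: float = 1.0  # unified L2, 1 MB per core

    @property
    def aggregate_l1i_kb(self) -> int:
        # Each core has its own L1-I, so total instruction-cache capacity
        # scales linearly with the number of cores.
        return self.cores * self.l1i_kb

    @property
    def total_l2_mb(self) -> float:
        return self.cores * self.l2_mb_per_core


for n in (2, 4, 8, 16):
    cfg = SimConfig(cores=n)
    print(n, cfg.aggregate_l1i_kb, cfg.total_l2_mb)
```

Aggregate L1-I capacity thus grows from 64 KB on 2 cores to 512 KB on 16 cores, while each individual core still sees only 32 KB.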

HYBRID: STREX + SLICC. Dynamically selects the better scheduler.

HYBRID measures the dynamic instruction footprint and the runtime aggregate cache capacity, compares the two, and selects either STREX or SLICC.
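The poster does not spell out the HYBRID decision rule, so this sketch assumes the natural reading of the flowchart: if the measured dynamic instruction footprint fits in the aggregate L1-I capacity of the cores available at runtime, collective caching (SLICC) can hold it; otherwise stratified execution (STREX) is used. The function name, the inclusive capacity comparison, and the example footprint are all hypothetical.

```python
def choose_scheduler(footprint_kb: float, available_cores: int, l1i_kb: int = 32) -> str:
    """Assumed HYBRID rule: SLICC if the footprint fits in the aggregate L1-I, else STREX."""
    if available_cores < 1:
        raise ValueError("need at least one core")
    aggregate_kb = available_cores * l1i_kb  # runtime aggregate L1-I capacity
    return "SLICC" if footprint_kb <= aggregate_kb else "STREX"


# Hypothetical 200 KB footprint on growing core counts:
for cores in (2, 4, 8, 16):
    print(cores, choose_scheduler(200, cores))  # STREX, STREX, SLICC, SLICC
```

Under this assumed rule, small core counts fall back to STREX, and once enough L1-I capacity is available across cores the selector switches to SLICC.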

Threads Migrate Chasing Locality