1 u niversity of m assachusetts, a mherst school of computer science p redator : predictive false...

36
1 UNIVERSITY OF MASSACHUSETTS, AMHERST School of Computer Science PREDATOR: Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu, Emery Berger* *University of Massachusetts Amherst Huawei US Research Center

Upload: sabina-suzanna-reeves

Post on 17-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

1 UNIVERSITY OF MASSACHUSETTS, AMHERST • School of Computer Science

PREDATOR: Predictive False Sharing Detection

Tongping Liu*, Chen Tian, Ziang Hu, Emery Berger*

*University of Massachusetts AmherstHuawei US Research Center

Page 2: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

2

Parallelism: Expectation is Awesome

Ru

nti

me (

s)

1 2 4 80

10

20

30

40

50

60

70

80

90

Number of threads

Expectation

Parallel Program

int count[8];int W; void increment(int S) { for(in=S; in<S+W; in++) for(j=0; j<1M; j++) count[in]++;}

int main(int THREADS) { W=8/THREADS; for(i=0; i<8; i+=W) spawn(increment,i);}

Page 3: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

3

1 2 4 80

20

40

60

80

100

120

140

Number of threads

False sharing slows the program by 13X

Ru

nti

me (

s)

Parallel Program Expectation

Reality

Parallelism: Reality is Awful

int count[8];int W; void increment(int S) { for(in=S; in<S+W; in++) for(j=0; j<1M; j++) count[in]++;}

int main(int THREADS) { W=8/THREADS; for(i=0; i<8; i+=W) spawn(increment,i);}

False sharing

Page 4: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

4

False Sharing in Real Applications

False sharing slows MySQL by 50%

Page 5: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

5

Cache Line

False Sharing vs. True Sharing

Page 6: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

6

Task 3Task 1

Task 2 Task 4

False Sharing

Task 1

TrueSharing

Task 2

False Sharing vs. True Sharing

Page 7: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

7

Resource Contention at Cache Line Level

Page 8: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

8

Thread 1

Main Memory

Core 1

Thread 2

Core 2

Cache

Cache

Invalidate

Cache line: basic unit of data transfer

False Sharing Causes Performance Problems

Page 9: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

9

Thread 1 Thread 2

Cache

Cache

Invalidate

Interleaved accesses cause cache invalidations

Main Memory

Core 1 Core 2

False Sharing Causes Performance Problems

Page 10: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

10

me = 1;you = 1; // globals

me = new Foo;you = new Bar; // heap

class X { int me; int you;}; // fields

array[me] = 12;array[you] = 13; // array indices

False Sharing is Everywhere

Page 11: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

11

False Sharing is Hard to Diagnose

Multiple experts worked together to diagnose MySQL scalability issue (1.5M LOC)

Page 12: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

12

Problems of Existing Tools

• No precise information/false positives–WIBA’09, VEE’11, EuroSys’13, SC’13

• Accurate & Precise– OOPSLA’11 ( Cannot detect read-write

FS)Shared problem: only detect observed false sharing

Page 13: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

13

Task 1

Task 2

Cache

Cache

Invalidate

Main Memory

Core 1 Core 2

False Sharing Causes Performance Problems

Find cache lines with many cache invalidations

Interleaved accesses

Cache invalidations

Performance problems

Detect false sharing causing performance problems

Page 14: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

14

Find Lines with Many Invalidations

. . . . . . .

……

Track cache invalidations on each cache line

Memory: Global, Heap

Page 15: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

15

Track Cache Invalidations

• Hardware-based approach– Needs hardware

support– No portability

• Simulation-based approach– Needs hardware info

such as cache hierarchy, cache capacity

– Very slow

• Conservative Assumptions– Each thread runs on a

different core with its private cache.

– Infinite cache capacity.

PREDATOR: based on memory access history of each cache line

Page 16: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

16

Track Cache Invalidations

r w r ww r w r

T1 T2

0

# of invalidations

12

Time

30 0 0 0T2 r T1 rT2 w

Each Entry: { Thread ID, Access Type}

T2 w 0 0T1 wT2 w 0 0T1 r

Page 17: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

17

PREDATOR Components

Compiler Instrumenta

tion

Runtime System

Instruments every memory read/write

access

Collects memory accesses and reports

false sharing

Page 18: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

18

Detect Problems Correctly & Precisely

• Correctly: –No false

alarms

Task 3Task 1

Task 2 Task 4

False Sharing

Task 1

TrueSharing

Task 2

Track memory accesses on each word

• Precisely– Global variables–Heap objects: pinpoint the line of memory

allocation

Page 19: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

19

PREDATOR’s Report

Page 20: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

20

Why do we need prediction?

Page 21: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

21

Necessity of False Sharing Prediction

Thread 1 Thread 2

Cache line 1 Cache line 2

Cache line 1 Cache line 2

False Sharin

g

Cache line 1

False Sharin

g

Page 22: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

22

Properties Affecting False Sharing Occurrence

32-bit platform 64-bit platformDifferent memory allocatorDifferent compiler or optimizationDifferent allocation order by changing the

code, e.g., printf

• Change of memory layout

• Run on hardware with different cache line size

Page 23: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

23

Example of False Sharing Sensitivity

Offset = 0

Offset = 8

Offset = 56

……

Memory

Colors represent threads

Cache line size = 64 bytes

Page 24: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

24

Offse

t=0

Offse

t=8

Offse

t=16

Offse

t=24

Offse

t=32

Offse

t=40

Offse

t=48

Offse

t=56

0

1

2

3

4

5

6

Ru

nti

me (

Secon

ds)

PREDATOR predicts false sharing

problems without occurrence

Example of False Sharing Sensitivity

Page 25: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

25

Prediction Based on Virtual Cache Lines

Thread 1 Thread 2

Cache line 1 Cache line 2

Virtual cache line 1 Virtual cache line 2

False Sharin

g

Virtual cache line 1

False Sharin

g

Real case

Prediction 1

Prediction 2

Page 26: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

26

d YX

(sz-d)/2 (sz-d)/2

Tracked virtual line

Non-tracked virtual lines

Track Invalidations on Virtual Cache Lines

d < the cache line size - sz(X, Y) from different threads && one of

them is write

Page 27: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

27

Benchmark Results

Benchmarks Unknown Problem

Without Prediction

With Prediction Improvement

Histogram ✔ ✔ ✔ 46%

Linear_regression ✔ 1207%

Reverse_index ✔ ✔ 0.09%

Word_count ✔ ✔ 0.14%

Streamcluster-1 ✔ ✔ ✔ 4.77%

Streamcluster-2 ✔ ✔ 7.52%

Page 28: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

28

Real Applications Results

• MySQL– Problem: False sharing occurs when

different threads update the shared bitmap simultaneously.

– Performance improves 180% after fixes.

• Boost library:– Problem: “there will be 16 spinlocks per

cache line”– Performance improves about 100%.

Page 29: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

29

Performance Overhead of PREDATOR

Phoenix

histogra

m

kmea

ns

linear_

regressi

on

matrix_

multiply pca

revers

e_index

string_m

atch

word_c

ount

PARSEC

blacks

choles

bodytrac

k

dedup

ferret

fluidanimate

strea

mcluste

r

swap

tions x2

64

RealApplica

tionsag

etBoost

Mem

cach

ed

MyS

QL

pbzip2

pfscan

AVERAGE

0

3

6

9

12

15

Execution Time Overhead

Original

PREDATOR-NP

PREDATOR

Nor

mal

ized

Runti

me

2326

5.6X

Page 30: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

30

Compiler Instrumenta

tionRuntime System

Thread 1

Thread 2

Cache

Cache

Invalidate

Main Memory

Core 1 Core 2

Precise report

Thread 1 Thread 2

Cache line 1 Cache line 2

Virtual cache line 1Virtual cache line 2

False Sharin

g

Virtual cache line 1

False Sharin

g

Real case

Prediction 1

Prediction 2

Page 31: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

31

Page 32: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

32

False Sharing is Hard to Diagnose

Multiple experts worked together to diagnose MySQL scalability issue (1.5M LOC)

Page 33: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

33

Detailed Prediction Algorithm

1. Find suspected cache lines

Page 34: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

34

Detailed Prediction Algorithm

1. Find suspected cache lines

2. Track detailed memory accesses

Page 35: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

35

Detailed Prediction Algorithm

1. Find suspected cache lines

2. Track detailed memory accesses

3. Predict based on hot accesses

YX

d d < sz && (X, Y) from different

threads, potential false sharing

Page 36: 1 U NIVERSITY OF M ASSACHUSETTS, A MHERST School of Computer Science P REDATOR : Predictive False Sharing Detection Tongping Liu*, Chen Tian, Ziang Hu,

36

4: Tracking Cache Invalidations on the Virtual Line

d YX

(sz-d)/2 (sz-d)/2

Tracked virtual line

Non-tracked virtual lines