Embracing Heterogeneity with Dynamic Core Boosting

Hyoun Kyu Cho and Scott Mahlke, University of Michigan, Electrical Engineering and Computer Science. May 20, 2014.

Page 1: Embracing Heterogeneity with Dynamic Core Boosting

University of Michigan, Electrical Engineering and Computer Science

Embracing Heterogeneity with Dynamic Core Boosting

Hyoun Kyu Cho and Scott Mahlke

University of Michigan

May 20, 2014

Page 2: Embracing Heterogeneity with Dynamic Core Boosting

Parallel Programming

[Figure: a workload divided among Core1, Core2, Core3, and Core4]

Page 3: Embracing Heterogeneity with Dynamic Core Boosting

Workload Imbalance Among Threads

• Asymmetric S/W
– Control flow divergence
– Non-deterministic memory latencies
– Synchronization operations

• Asymmetric H/W
– Heterogeneous multicores
– Core-to-core process variation

Page 4: Embracing Heterogeneity with Dynamic Core Boosting

Performance Impact of Asymmetric H/W

• Symmetric 8 Cores vs. 8 Cores w/ variations

Page 5: Embracing Heterogeneity with Dynamic Core Boosting

CPU Time Wasted for Synchronization

[Figure panels: Homogeneous, Heterogeneous]

Page 6: Embracing Heterogeneity with Dynamic Core Boosting

Thread Criticality due to Workload Imbalance

[Figure: two execution timelines of threads T1 to T5 against time, with idle periods before the barrier]

Page 7: Embracing Heterogeneity with Dynamic Core Boosting

Accelerating Critical Path w/ Core Boosting

[Figure: three execution timelines of threads T1 to T5 against time, with idle periods before the barrier]

Page 8: Embracing Heterogeneity with Dynamic Core Boosting

Modeling Workload Imbalance & Boosting

Page 9: Embracing Heterogeneity with Dynamic Core Boosting

Boosting Assignment

• Data parallel programs
• Pipeline parallel programs

[Figure labels: Worker (five boxes); Stage1, Stage2, Stage3, Stage4]

Page 10: Embracing Heterogeneity with Dynamic Core Boosting

Boosting Data Parallel Programs

• Greedy scheduling
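The slide names greedy scheduling without spelling out the rule. One plausible reading, sketched below, is to give the single boost at every step to the thread expected to reach the barrier last. The function names, the per-thread `remaining` work array, and the fixed time step `dt` are all assumptions made for illustration, not the authors' runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the unfinished thread with the largest
 * estimated time to the barrier (remaining work / speed), or -1 if every
 * thread has already arrived. Ties go to the lowest index. */
int pick_critical(const double *remaining, const double *speed, size_t n)
{
    int critical = -1;
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(speed[i] > 0.0);
        if (remaining[i] <= 0.0)
            continue;
        double eta = remaining[i] / speed[i];
        if (eta > worst) {
            worst = eta;
            critical = (int)i;
        }
    }
    return critical;
}

/* Hypothetical simulation: advance all threads in steps of dt, boosting only
 * the currently critical thread by boost_rate. Consumes `remaining` and
 * returns the number of steps until the last thread reaches the barrier. */
unsigned simulate_barrier(double *remaining, const double *speed, size_t n,
                          double boost_rate, double dt)
{
    assert(boost_rate >= 1.0 && dt > 0.0);
    unsigned steps = 0;
    for (;;) {
        int c = pick_critical(remaining, speed, n);
        if (c < 0)
            return steps;
        for (size_t i = 0; i < n; i++) {
            if (remaining[i] <= 0.0)
                continue;
            double s = speed[i] * ((int)i == c ? boost_rate : 1.0);
            remaining[i] -= s * dt;
        }
        steps++;
    }
}
```

Calling `simulate_barrier` once with `boost_rate` set to 1.0 and once with 1.5 on the same inputs shows how much earlier the barrier completes when the slowest thread is boosted.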

Page 11: Embracing Heterogeneity with Dynamic Core Boosting

Boosting Pipeline Parallel Programs

• Epoch-based scheduling
– Monitors CPU utilization with H/W performance counters
– Assigns boosting budget at the end of each epoch
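As a rough illustration of the end-of-epoch step, the sketch below splits a boosting budget across pipeline workers in proportion to how busy each was during the epoch that just ended. The proportional rule, the `busy_cycles` input (standing in for a hardware performance counter reading), and the function name are assumptions; the slide does not say how the budget is divided.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: divide `budget` boost slots for the next epoch in
 * proportion to each worker's busy cycles in the last epoch. Slots lost to
 * integer rounding go to the busiest worker. If no worker was busy, no
 * slots are handed out. */
void assign_boost_budget(const unsigned long *busy_cycles, size_t n,
                         unsigned budget, unsigned *slots_out)
{
    assert(n > 0);
    unsigned long long total = 0;
    size_t busiest = 0;
    for (size_t i = 0; i < n; i++) {
        total += busy_cycles[i];
        if (busy_cycles[i] > busy_cycles[busiest])
            busiest = i;
        slots_out[i] = 0;
    }
    if (total == 0)
        return;

    unsigned given = 0;
    for (size_t i = 0; i < n; i++) {
        slots_out[i] = (unsigned)((unsigned long long)budget * busy_cycles[i] / total);
        given += slots_out[i];
    }
    slots_out[busiest] += budget - given; /* hand out the rounding remainder */
}
```

A runtime built this way would call `assign_boost_budget` once per epoch and reset the counters afterwards; a policy that boosts only the bottleneck stage would be an equally valid reading of the slide.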

Page 12: Embracing Heterogeneity with Dynamic Core Boosting

Dynamic Core Boosting

Page 13: Embracing Heterogeneity with Dynamic Core Boosting

Progress Monitoring Example

    …
    pthread_barrier_wait(barrier);
    period = calc_period_LID_007(start, end);
    for (i = start; i < end; i++) {
        …
        compute(…);
        if (side_exit) {
            /* leaving the loop early: report the work as complete */
            SET_PROGRESS_TO(MAX_PROGRESS_007);
            break;
        }
        /* report one progress step every `period` iterations */
        if (((end - i) % period) == 0)
            PROGRESS_STEP_FORWARD;
    }
    pthread_barrier_wait(barrier);
    …
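To make the instrumentation pattern concrete, here is a single-threaded sketch with stand-in definitions for the pieces the slide leaves undefined. `MAX_PROGRESS`, `calc_period`, the clamp on the counter, and the `exit_at` parameter are all hypothetical; the real macros presumably write to a runtime or hardware progress counter rather than a local variable.

```c
#include <assert.h>

#define MAX_PROGRESS 8 /* hypothetical: progress steps reported per loop */

/* Hypothetical period: iterations per progress step, never less than 1. */
static int calc_period(int start, int end)
{
    int period = (end - start) / MAX_PROGRESS;
    return period > 0 ? period : 1;
}

/* Runs the instrumented loop and returns the progress value reported when
 * the thread would reach the barrier. `exit_at` is the iteration that takes
 * the side exit, or -1 for none. */
int run_instrumented_loop(int start, int end, int exit_at)
{
    assert(start <= end);
    int period = calc_period(start, end);
    int progress = 0; /* stand-in for the per-thread progress counter */

    for (int i = start; i < end; i++) {
        /* compute(...) would run here */
        if (i == exit_at) {
            progress = MAX_PROGRESS; /* SET_PROGRESS_TO(MAX_PROGRESS) */
            break;
        }
        if (((end - i) % period) == 0 && progress < MAX_PROGRESS)
            progress++; /* PROGRESS_STEP_FORWARD, clamped in this sketch */
    }
    return progress;
}
```

With 64 iterations the period is 8, so the loop reports eight steps; a side exit at any iteration jumps straight to `MAX_PROGRESS`, which keeps a thread that finishes early from looking like a laggard.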

Page 14: Embracing Heterogeneity with Dynamic Core Boosting

Evaluation Methodology

• Asymmetry emulation with Dynamic Binary Translation
– Slow down proportionally instead of accelerating
• 8 cores with frequency variation
• 1 core boosted, boosting rate = 1.5x
• Compares
– Heterogeneous
– Reactive
– DCB
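"Slow down proportionally instead of accelerating" can be read as follows: real hardware cannot be sped up, so the fastest emulated configuration runs at native speed and every other core is slowed by the ratio of the fastest speed to its own. The sketch below computes those factors; the function name and the relative-speed inputs are assumptions, not the authors' emulator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for each emulated core speed (relative units),
 * compute the slowdown factor to apply on real hardware so that the
 * fastest core runs natively (factor 1.0) and the rest keep their ratios. */
void emulation_slowdowns(const double *speed, size_t n, double *slowdown)
{
    assert(n > 0);
    double fastest = speed[0];
    for (size_t i = 1; i < n; i++) {
        if (speed[i] > fastest)
            fastest = speed[i];
    }
    for (size_t i = 0; i < n; i++) {
        assert(speed[i] > 0.0);
        slowdown[i] = fastest / speed[i];
    }
}
```

For example, with a core boosted to 1.5, a nominal core at 1.0, and a slow core at 0.75, the factors come out as 1.0, 1.5, and 2.0.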

Page 15: Embracing Heterogeneity with Dynamic Core Boosting

Performance Improvement

[Figure: normalized execution time (y-axis, 0.5 to 1.0) for Heterogeneous, Reactive, and DCB on blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the geometric mean]

Page 16: Embracing Heterogeneity with Dynamic Core Boosting

Synchronization Overheads

[Figure: relative CPU time (y-axis, 0% to 80%) for Heterogeneous, Reactive, and DCB on blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the geometric mean]

Page 17: Embracing Heterogeneity with Dynamic Core Boosting

Thread Arrival Time

Page 18: Embracing Heterogeneity with Dynamic Core Boosting

Conclusion

• DCB mitigates workload imbalance in performance-asymmetric CMPs
– Accelerating critical threads
– Coordinating compiler, runtime, and architecture for near-optimal assignment

• Overall, improves performance by 33%, outperforming a reactive boosting scheme by 10%

Page 19: Embracing Heterogeneity with Dynamic Core Boosting

Thank you!

Page 20: Embracing Heterogeneity with Dynamic Core Boosting

Core Boosting with Frequency Scaling

Transition time < 10 ns [Dreslinski '12]

Page 21: Embracing Heterogeneity with Dynamic Core Boosting

Asymmetry Emulation with DBT

Page 22: Embracing Heterogeneity with Dynamic Core Boosting

Evaluation Platform Accuracy

[Figure: relative error (y-axis, 0% to 12%) on blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the mean]