Hardware-based Job Queue Management for Manycore Architectures and OpenMP Environments

Junghee Lee, Chrysostomos Nicopoulos, Yongjae Lee, Hyung Gyu Lee and Jongman Kim

Presented by Junghee Lee




Introduction

• Manycore systems
  – Number of cores is increasing

• Challenges in scalability
  – Memory
  – Power consumption
  – Cache coherence protocol
  – Load balancing


Contents

• Introduction
• Background
  – Programming models
  – Motivation
• IsoNet
• Fault-tolerance
• Evaluation
• Conclusion


Programming Models

• Parallel programming models
  – MPI
  – OpenMP

• Fine-grained parallelism
  – Emerging applications: recognition, mining and synthesis
  – The execution time of each computation kernel is very short, but the kernels offer abundant parallelism
  – Multithreading incurs excessive overhead


Job Queuing

• Creates jobs instead of threads
  – One thread per core is created
  – Thread: a set of instructions and states of execution
  – Job: a set of data that is processed by a thread

• Job queue
  – Manages the list of jobs
  – Maintains load balance

[Figure: each CPU runs one thread; the threads take jobs from the job queue]
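As a rough software analogue of the job-queuing model (not the paper's implementation), the sketch below starts one long-lived thread per core and lets each thread pull short jobs from a single shared queue. It assumes Python threads stand in for cores and a lock-protected `deque` stands in for the job queue; `JobQueue` and `run` are hypothetical names. Every `pop` goes through the same lock, which is where conflicts arise as cores are added and jobs get shorter.

```python
import threading
from collections import deque


class JobQueue:
    """Hypothetical shared job queue: every access takes the same lock."""

    def __init__(self, jobs):
        self._jobs = deque(jobs)
        self._lock = threading.Lock()

    def pop(self):
        # Returns the next job, or None once the queue is empty.
        with self._lock:
            return self._jobs.popleft() if self._jobs else None


def run(jobs, num_cores, process):
    """Run all jobs with one worker thread per (simulated) core."""
    queue = JobQueue(jobs)
    results = [[] for _ in range(num_cores)]

    def worker(core_id):
        # The thread is created once; it then repeatedly pulls short jobs.
        while (job := queue.pop()) is not None:
            results[core_id].append(process(job))

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


per_core = run(range(100), num_cores=4, process=lambda x: x * x)
print(sum(len(r) for r in per_core))  # 100
```

How the 100 jobs split across the four threads varies from run to run; only the total is fixed.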


Conflicts in Job Queue

• The chance of conflicts increases as:
  – The number of cores increases
  – The time taken to update the job queue increases
  – The job queue is accessed more frequently (jobs are short)

• Previous approaches
  – Distributed queues
    • Load balance is maintained by job stealing
    • The chance of conflicts in each local queue is decreased
  – Hardware implementation
    • The time spent updating the queue is reduced
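A toy, single-threaded model of the distributed-queue approach follows. It assumes one queue per core, that a core with an empty queue steals one job from the most loaded other queue (real schedulers often pick a random victim instead), and that processing a job takes one step; `simulate_work_stealing` is a hypothetical name. The owner takes jobs from one end of its queue and thieves from the other, the usual way of keeping owner and thief apart.

```python
from collections import deque


def simulate_work_stealing(initial_jobs):
    """Toy step-by-step model of per-core job queues with job stealing.

    initial_jobs[i] is the number of jobs initially queued at core i. In every
    step each core processes one job: from its own queue if possible, otherwise
    one stolen from the most loaded other queue (a hypothetical victim policy).
    Returns (steps, jobs processed per core, number of steals).
    """
    queues = [deque(range(n)) for n in initial_jobs]
    processed = [0] * len(queues)
    steps = steals = 0
    while any(queues):
        steps += 1
        for core, own in enumerate(queues):
            if own:
                own.pop()  # the owner works at the tail of its own queue
            else:
                victim = max(range(len(queues)), key=lambda i: len(queues[i]))
                if not queues[victim]:
                    continue  # every queue drained during this step
                queues[victim].popleft()  # a thief takes from the head
                steals += 1
            processed[core] += 1
    return steps, processed, steals


print(simulate_work_stealing([8, 0, 0, 0]))  # (2, [2, 2, 2, 2], 6)
```

With all eight jobs starting on core 0, stealing spreads the work evenly, but six of the eight jobs had to be stolen, which is the overhead the profile of SMVM attributes to "Stealing job".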


Profile of SMVM

[Figure: ratio of SMVM execution time spent on conflicts, stealing jobs and processing jobs, for 4 to 256 cores]


Objectives

• Requirements of a load balancer
  – Scalability: conflict-free
  – Fault-tolerance
    • The probability of faults increases exponentially as technology scales

• Contributions of this paper
  – Lightweight micro-network for load balancing
  – Scalable even with more than a thousand cores
  – Comprehensive fault-tolerance support


Contents

• Introduction
• Background
• IsoNet
  – Architecture
  – Implementation
• Fault-tolerance
• Evaluation
• Conclusion


System View

[Figure: system view showing a grid of CPUs, each with a router (R), alongside a network of IsoNet nodes (I)]


Microarchitecture of IsoNet Node

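[Figure: IsoNet node microarchitecture with a Dual Clock Stack, a Max Selector and a Min Selector (comparators and multiplexers operating on JobCount and Job signals), and a Switch built from MUX/DEMUX units]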


How It Works

[Figure: example of IsoNet operation on a mesh of nodes]

Tree-based routing: for fault-tolerance
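The slides show IsoNet's operation only as a figure. The sketch below is a hedged software model, assuming (from the Max Selector and Min Selector in the node microarchitecture) that each cycle IsoNet picks the most- and least-loaded nodes through a tree of pairwise comparisons and moves one job between them. The flat list of job counts, the 2-job threshold and the names `tree_select`, `balance_step` and `balance` are illustrative assumptions, not the hardware's exact behaviour.

```python
import operator


def tree_select(counts, prefer):
    """Return the index of the winning job count via pairwise (tree) comparison.

    prefer(a, b) is True when count a should beat count b; the left input wins
    ties. Each pass of the loop corresponds to one level of comparator+MUX stages.
    """
    level = list(range(len(counts)))
    while len(level) > 1:
        winners = []
        for i in range(0, len(level), 2):
            if i + 1 == len(level):
                winners.append(level[i])  # unpaired entry passes through
            else:
                a, b = level[i], level[i + 1]
                winners.append(b if prefer(counts[b], counts[a]) else a)
        level = winners
    return level[0]


def balance_step(counts):
    """Move one job from the most- to the least-loaded node, if that helps."""
    src = tree_select(counts, operator.gt)  # role of the Max Selector
    dst = tree_select(counts, operator.lt)  # role of the Min Selector
    if counts[src] - counts[dst] < 2:
        return False  # a 1-job gap would only move the imbalance elsewhere
    counts[src] -= 1
    counts[dst] += 1
    return True


def balance(counts):
    """Run balance steps until no move helps; return the number of cycles used."""
    cycles = 0
    while balance_step(counts):
        cycles += 1
    return cycles


loads = [2, 1, 1, 0, 1, 1, 2, 0]  # hypothetical job counts per node
print(balance(loads), loads)  # 2 [1, 1, 1, 1, 1, 1, 1, 1]
```

The threshold of two matters: moving a job across a gap of one would leave the maximum gap unchanged, so the loop would never settle.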


Single Cycle Implementation

• Estimated critical path delay
  – 11.38 ns (87.8 MHz)
  – Estimated with the Elmore delay model

• A single-cycle implementation offers low hardware cost

[Figure: single-cycle path from a source node to a destination node through leaf, intermediate (Int.) and root nodes and their switches (Swt)]
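The Elmore delay model gives a first-order delay estimate for an RC network. For a chain of n wire segments with series resistances R_1..R_n and node capacitances C_1..C_n, the delay at the far end is the sum over i of C_i (R_1 + ... + R_i). The sketch below computes that sum and converts a critical-path delay into the highest clock a single-cycle design can run at, f = 1/t; for 11.38 ns this gives about 87.87 MHz. The ladder topology, the R and C values and the function names are hypothetical, since the slide does not give the circuit model.

```python
def elmore_delay_rc_ladder(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder (a chain of wire segments).

    Segment i adds series resistance resistances[i] followed by capacitance
    capacitances[i] to ground. Each capacitor is charged through every resistor
    between the driver and its node, so its term is C_i * (R_1 + ... + R_i).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one R and one C per segment")
    delay, upstream_r = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r
        delay += upstream_r * c
    return delay


def max_clock_mhz(critical_path_ns):
    """A single-cycle design must fit its whole critical path into one clock period."""
    return 1e3 / critical_path_ns


# Hypothetical wire: 4 segments of 100 ohm and 10 fF each.
print(f"{elmore_delay_rc_ladder([100.0] * 4, [10e-15] * 4) * 1e12:.1f} ps")  # 10.0 ps
print(f"{max_clock_mhz(11.38):.2f} MHz")  # 87.87 MHz
```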


Hardware Cost Estimation

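Component                 Gate count   Instances
DCStack                          204        1024
Selector, leaf                     0          64
Selector, 1 child                110         928
Selector, 2 children             256           2
Selector, 3 children             480          29
Selector, 4 children             682           1
Switch                           356        1024
Root                              59           1
Average gates per node        674.50

674.50 gates/node × 240 cores × 4 transistors/gate = 647.52 K transistors = 0.046% of the 1.4 B transistors in an NVIDIA GTX285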
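The arithmetic behind the hardware cost figures above can be checked in a few lines. The sketch multiplies each component's gate count by its instance count, divides by the 1024 nodes to get the average gates per node, and scales to a 240-core GPU. The factor of 4 is assumed to be transistors per gate; the dictionary layout and constant names are hypothetical.

```python
# (gate count, instances) per component of a 1024-node IsoNet, from the cost table.
COMPONENTS = {
    "DCStack": (204, 1024),
    "Selector, leaf": (0, 64),
    "Selector, 1 child": (110, 928),
    "Selector, 2 children": (256, 2),
    "Selector, 3 children": (480, 29),
    "Selector, 4 children": (682, 1),
    "Switch": (356, 1024),
    "Root": (59, 1),
}
NUM_NODES = 1024
TRANSISTORS_PER_GATE = 4  # assumption: the slide's factor of 4
GPU_CORES = 240  # NVIDIA GTX285
GPU_TRANSISTORS = 1.4e9

total_gates = sum(gates * inst for gates, inst in COMPONENTS.values())
avg = round(total_gates / NUM_NODES, 2)
transistors = avg * GPU_CORES * TRANSISTORS_PER_GATE
share_pct = 100 * transistors / GPU_TRANSISTORS
print(f"{avg:.2f} gates/node, {transistors / 1e3:.2f} K transistors, {share_pct:.3f}% of GTX285")
# 674.50 gates/node, 647.52 K transistors, 0.046% of GTX285
```

The selector instance counts (64 + 928 + 2 + 29 + 1) add up to the 1024 nodes, which is why dividing the total gate count (690,693) by 1024 reproduces the 674.50 average.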


Contents

• Introduction
• Background
• IsoNet
• Fault-tolerance
  – Transparent mode
  – Reconfiguration mode
• Evaluation
• Conclusion


Supporting Fault-Tolerance

• Transparent mode
  – For faulty CPUs
  – Bypasses the corresponding IsoNet node

• Reconfiguration mode
  – For faulty IsoNet nodes
  – Operation
    • When a fault is detected, all IsoNet nodes enter reconfiguration mode
    • The IsoNet topology is reconfigured so that the faulty node is excluded
    • A new root node is assigned if the root node fails
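The slides list the reconfiguration steps but not the algorithm. A minimal sketch, assuming the IsoNet nodes form a 2D mesh and that reconfiguration amounts to rebuilding a spanning tree (here by breadth-first search) over the healthy nodes, follows. The replacement-root rule (healthy node closest to the mesh centre) and the name `build_tree` are hypothetical and only illustrate the idea of excluding a faulty node and picking a new root.

```python
from collections import deque


def build_tree(width, height, faulty, root):
    """Rebuild a spanning tree over a width x height mesh, excluding faulty nodes.

    Returns (root, parent, level): parent maps every healthy node reachable from
    the root to its parent (the root maps to None), and level is its hop distance
    from the root. If the current root is faulty, a replacement is chosen with a
    hypothetical rule: the healthy node closest to the mesh centre.
    """
    healthy = [(x, y) for y in range(height) for x in range(width) if (x, y) not in faulty]
    if not healthy:
        raise ValueError("no healthy nodes left")
    if root in faulty:
        cx, cy = (width - 1) / 2, (height - 1) / 2
        root = min(healthy, key=lambda n: (abs(n[0] - cx) + abs(n[1] - cy), n))
    parent, level = {root: None}, {root: 0}
    frontier = deque([root])
    while frontier:  # breadth-first search over links between healthy nodes
        x, y = frontier.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_mesh = 0 <= nb[0] < width and 0 <= nb[1] < height
            if in_mesh and nb not in faulty and nb not in parent:
                parent[nb] = (x, y)
                level[nb] = level[(x, y)] + 1
                frontier.append(nb)
    return root, parent, level


# 3x3 mesh whose centre node (the current root) has failed.
new_root, parent, level = build_tree(3, 3, faulty={(1, 1)}, root=(1, 1))
print(new_root, level[(2, 1)])  # (0, 1) 4
```

Transparent mode is not modelled here: in that case the node itself stays healthy, so it would simply remain in the tree.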


Reconfiguration

[Figure: reconfiguration example on the node mesh; nodes are labeled with the numbers 0 to 3 and the root node candidate is marked]


Contents

• Introduction
• Background
• IsoNet
• Fault-tolerance
• Evaluation
  – Experimental setup
  – Results
• Conclusion


Experimental Setup

• Simulation framework
  – Wind River’s Simics full-system simulator
  – CMP with 4 to 64 x86-compatible cores
  – Fedora 12 with Linux kernel 2.6.33

• Benchmarks from recognition, mining and synthesis applications
  – GS: Gauss-Seidel
  – MMM: Dense Matrix-Matrix Multiply
  – SVA: Scaled Vector Addition
  – MVM: Dense Matrix-Vector Multiply
  – SMVM: Sparse Matrix-Vector Multiply
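To illustrate how such kernels break into many short jobs, here is a hedged sketch of SMVM over a matrix in compressed sparse row (CSR) format, split into row-block jobs. The CSR layout, the `rows_per_job` granularity and the names `make_jobs` and `smvm_job` are assumptions for illustration, not the benchmark's actual code. Each job writes only its own rows of `y`, so jobs are independent and a job queue can hand them to any core.

```python
def make_jobs(num_rows, rows_per_job):
    """Split the row range into fine-grained jobs (half-open row intervals)."""
    return [(start, min(start + rows_per_job, num_rows))
            for start in range(0, num_rows, rows_per_job)]


def smvm_job(job, indptr, indices, values, x, y):
    """Process one job: y[r] = sum over k of A[r, k] * x[k] for rows r in the job (CSR)."""
    start, end = job
    for r in range(start, end):
        acc = 0.0
        for k in range(indptr[r], indptr[r + 1]):
            acc += values[k] * x[indices[k]]
        y[r] = acc


# 4x4 example matrix in CSR form:
# [[10, 0, 0, 2],
#  [ 0, 3, 0, 0],
#  [ 0, 0, 0, 0],
#  [ 1, 0, 4, 5]]
indptr = [0, 2, 3, 3, 6]
indices = [0, 3, 1, 0, 2, 3]
values = [10.0, 2.0, 3.0, 1.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0] * 4
for job in make_jobs(4, rows_per_job=1):
    smvm_job(job, indptr, indices, values, x, y)
print(y)  # [18.0, 6.0, 0.0, 33.0]
```

Smaller `rows_per_job` values yield more, shorter jobs, which increases how often the job queue is accessed.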


Results

[Figure: MMM (6,473 instructions); execution time (10^7 cycles) and speedup versus number of cores (4 to 64) for Job stealing, Carbon and IsoNet, with Carbon speedup and IsoNet speedup curves]

[Figure: SMVM (2,872 instructions); execution time (10^7 cycles) and speedup versus number of cores (4 to 64) for the same configurations]


Beyond a Hundred Cores

• MMM (6,473 instructions)

[Figure: relative execution time of Carbon and IsoNet versus number of cores, from 4 to 1024]


Profile of IsoNet

[Figure: ratio of execution time spent on conflicts, stealing jobs and processing jobs with IsoNet, for 4 to 64 cores]


Conclusion

• Scalability is one of the key challenges in the manycore domain
• Scalable load balancing is critical to utilizing a large number of processing elements
• This paper proposes a novel hardware-based dynamic load distributor and balancer, called IsoNet
• IsoNet also provides comprehensive fault-tolerance support
• Experimental results from full-system simulation with real applications demonstrate that IsoNet scales better than alternative techniques


Questions?

Contact info

Junghee Lee
Electrical and Computer Engineering
Georgia Institute of Technology


Thank you!