opportunities for hardware multithreading in microprocessors and microcontrollers

Theo Ungerer Systems and Networking University of Augsburg [email protected] http://www.informatik.uni-augsburg.de/sik/ Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers


Page 1: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Theo Ungerer

Systems and Networking

University of Augsburg

[email protected]

http://www.informatik.uni-augsburg.de/sik/

Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Page 2: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Basic Principle of Multithreading

[Figure: four hardware thread contexts (thread 1 to thread 4). Each thread has its own register set (register sets 1 to 4) and its own PC and PSR (PC PSR 1 to 4); a thread pointer refers to one of the contexts.]
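The principle on this slide can be sketched in software. This is only an illustrative model under stated assumptions: the names `ThreadContext` and `MultithreadedCore`, the register-set size of 8, and the 4-byte PC increment are hypothetical and not taken from the slides. The point it shows is that a thread switch moves the thread pointer and saves or restores nothing.

```python
from dataclasses import dataclass, field

@dataclass
class ThreadContext:
    """Per-thread state that is replicated in hardware (illustrative model)."""
    pc: int = 0                      # program counter
    psr: int = 0                     # processor status register
    regs: list = field(default_factory=lambda: [0] * 8)  # register set (size assumed)

class MultithreadedCore:
    def __init__(self, n_threads=4):
        self.contexts = [ThreadContext() for _ in range(n_threads)]
        self.thread_pointer = 0      # index of the active context

    def switch_to(self, thread_id):
        # No state is saved or restored: only the pointer changes.
        self.thread_pointer = thread_id

    def step(self):
        # Stand-in for executing one instruction of the active thread.
        ctx = self.contexts[self.thread_pointer]
        ctx.pc += 4
        return ctx.pc

core = MultithreadedCore()
core.step()                 # thread 0 advances once
core.switch_to(2)
core.step()
core.step()                 # thread 2 advances twice
core.switch_to(0)
print([c.pc for c in core.contexts])  # → [4, 0, 8, 0]
```

Each context keeps its own PC across switches, which is why the switch itself costs no save/restore work in this model.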

Page 3: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Multithreading in High-Performance Processors

Multithreading in high-performance microprocessors:
• IBM RS64 IV (SStar)
• Sun UltraSPARC V
• Intel Xeon™

Hardware multithreading is the ability to pursue more than one thread within a processor pipeline.

Typical features: multiple register sets, fast context switching

Main objective: performance gain by latency hiding for multithreaded workloads

Page 4: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Outline of the Presentation

Motivation

State-of-the-art Multithreading
• Multithreading for throughput increase
• Multithreading for power reduction
• Multithreading for embedded real-time systems

Conclusions & Research Opportunities

Page 5: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Today's Multiple-issue Processors

Utilization of instruction-level parallelism
• by a long instruction pipeline and
• by the superscalar or the VLIW/EPIC technique.

Page 6: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

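Problem: Low Resource Utilization by Sequential Programs

[Figure: issue slots plotted against processor cycles, showing the losses by empty issue slots. Vertical loss (= 4): cycles in which all four issue slots stay empty (two such cycles shown). Horizontal loss: cycles in which only some slots stay empty (examples with 2, 1, and 3 empty slots).]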
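The two kinds of loss can be counted from a per-cycle trace. The following is a minimal sketch, assuming a 4-issue processor and a hypothetical trace chosen to reproduce the labels of the figure (two fully empty cycles, and cycles with 2, 1, and 3 empty slots); the function name `slot_losses` is illustrative.

```python
def slot_losses(cycles, width):
    """cycles: number of issue slots used in each cycle; width: issue bandwidth."""
    # Vertical loss: every slot of a cycle is empty.
    vertical = sum(width for used in cycles if used == 0)
    # Horizontal loss: some, but not all, slots of a cycle are empty.
    horizontal = sum(width - used for used in cycles if 0 < used < width)
    return vertical, horizontal

# Hypothetical 4-issue trace (slots used per cycle).
used_per_cycle = [2, 0, 3, 0, 1, 4]
vertical, horizontal = slot_losses(used_per_cycle, width=4)
utilization = sum(used_per_cycle) / (4 * len(used_per_cycle))
print(vertical, horizontal, round(utilization, 2))  # → 8 6 0.42
```

In this trace 14 of 24 issue slots stay empty, which is the kind of under-utilization that multithreading tries to fill with instructions of other threads.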

Page 7: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Outline of the Presentation

Motivation

State-of-the-art Multithreading
• Multithreading for throughput increase
• Multithreading for power reduction
• Multithreading for embedded real-time systems

Conclusions & Research Opportunities

Page 8: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Multithreading

Two basic multithreading techniques
• Interleaved Multithreading
• Block Multithreading

Simultaneous multithreading (SMT)
• combines wide-issue superscalar with multithreading,
• issues instructions from several threads simultaneously.

Page 9: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Basic Multithreading Techniques

[Figure: issue slots over time (processor cycles) for (a) a single thread, (b) interleaved MT with its context switches, and (c) block MT with its context switch. The numbers (1) to (4) in the slots identify the issuing thread.]
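The difference between the two basic techniques can be sketched as the sequence of thread numbers issued per cycle. This is a simplified model under assumptions that are not in the slides: interleaved MT is modeled as strict round robin, and block MT switches after a fixed, hypothetical number of cycles per thread that stands in for a long-latency event.

```python
def interleaved(n_threads, n_cycles):
    """Interleaved MT: a different thread issues in every cycle (round robin)."""
    return [cycle % n_threads + 1 for cycle in range(n_cycles)]

def block(stall_after, n_threads, n_cycles):
    """Block MT: a thread runs until a long-latency event, then the next one takes over.

    stall_after[t] is the number of cycles thread t+1 runs before such an event
    (assumed fixed here for illustration).
    """
    trace, thread, run = [], 0, 0
    for _ in range(n_cycles):
        trace.append(thread + 1)
        run += 1
        if run == stall_after[thread]:
            thread, run = (thread + 1) % n_threads, 0
    return trace

print(interleaved(4, 8))            # → [1, 2, 3, 4, 1, 2, 3, 4]
print(block([3, 2, 1, 2], 4, 8))    # → [1, 1, 1, 2, 2, 3, 4, 4]
```

Interleaved MT changes context in every cycle, whereas block MT changes it only when the running thread would otherwise stall.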

Page 10: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

SMT vs. CMP

[Figure: issue slots over time (processor cycles) for (a) SMT and (b) CMP. The numbers (1) to (4) in the slots identify the issuing thread.]
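A small sketch of how slot sharing differs between the two organizations. It assumes, for illustration only, that both have the same total issue width (one 4-issue SMT core versus two 2-issue CMP cores) and that `ready` gives the number of issuable instructions per thread in one cycle; neither assumption comes from the slides.

```python
def smt_issue(ready, width):
    """One wide core: slots are filled from the ready instructions of all threads."""
    issued, free = [], width
    for r in ready:
        take = min(r, free)
        issued.append(take)
        free -= take
    return issued

def cmp_issue(ready, width_per_core):
    """One narrow core per thread: a thread cannot use another core's slots."""
    return [min(r, width_per_core) for r in ready]

ready = [3, 0]   # hypothetical cycle: thread 1 has 3 ready instructions, thread 2 none
print(smt_issue(ready, width=4))            # → [3, 0]
print(cmp_issue(ready, width_per_core=2))   # → [2, 0]
```

When one thread stalls, SMT lets the other thread use the idle slots, while in CMP they remain empty.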

Page 11: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Characteristics of Multithreading

Latency Utilization
• The latencies that arise in the computation of a single instruction stream are filled by computations of another thread.

Throughput of multithreaded workloads is increased

Power Reduction
• Using less speculation

Rapid Context Switching
• appropriate for real-time applications

Page 12: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Outline of the Presentation

Motivation

State-of-the-art Multithreading
• Multithreading for throughput increase
• Multithreading for power reduction
• Multithreading for embedded real-time systems

Conclusions & Research Opportunities

Page 13: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Multithreading for Throughput Increase

Lots of research results with simulated SMT since 1995

Some of our own research results:
• Performance estimation of an SMT multimedia processor
• Transistor count and chip-space estimation of the models are taken into account.

Page 14: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Relevant Attributes for Rating Microprocessors

• Performance
• Resource Requirement
• Clock Speed
• Power Consumption

Two tools
• Performance estimation tool
• Transistor count and chip-space estimation tool

Page 15: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Transistor Count and Chip-space Estimator

Vision:
• The resources of the baseline model should be adjusted such that it covers the same chip space or the same transistor count as the new microarchitecture models.

We use an analytical method for memory-based structures like register files or internal queues and an empirical method for logic blocks like control logic and functional units.

The half-feature size serves as the measure of length of a basic cell.

Estimator tool is available (also for SimpleScalar) at:

http://www.informatik.uni-augsburg.de/lehrstuehle/info3/research/complexity/

Page 16: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

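Execution-based Simulator: Baseline SMT Multimedia Processor Model

[Figure: block diagram of the baseline model. Pipeline stage labels: IF, ID, RI, RT, WB. Execution units: branch, simple integer, complex integer, global load/store, local load/store, thread control. Further blocks: I-cache, BTAC, D-cache, local memory, I/O, rename registers, and a memory interface (to memory).]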

Page 17: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Results of Performance and Hardware Cost Estimation

Demonstrated by two sets of models:

"Maximum" processor models with an abundance of resources

Small processor models

Workload is an MPEG-2 decoder made multithreaded

Page 18: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Simulation Parameters

Fixed parameters:
• 1024-entry BTAC, gshare branch predictor (2 K 2-bit counters, 8-bit history, misprediction penalty 5 cycles)
• 4-way set-associative D- and I-caches with 32-byte cache lines
• 32 KB local on-chip RAM
• 64-bit system bus, 4 MB main memory

Varied parameters:
• 8-12 execution units
• 256- and 32-entry reservation stations
• 10 to 4 result buses
• different D-cache sizes, D- and I-caches of 4 MB and 64 KB

Parameters varied with the number of threads:
• 32 32-bit general-purpose registers and 40 rename registers (per thread)
• 32- and 16-entry issue and retirement buffers (per thread)
• Fetch and decode bandwidth is scaled with issue bandwidth and number of threads: 1x1 – 8x8
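For readers unfamiliar with gshare, here is a minimal sketch sized like the predictor in the fixed parameters (2 K 2-bit counters, 8-bit global history, 5-cycle misprediction penalty). The indexing details are assumptions, since the slides do not give them: the word-aligned PC is XORed with the history in the low index bits, and counters start in the weakly-not-taken state. The class name `Gshare` is illustrative.

```python
class Gshare:
    def __init__(self, n_counters=2048, history_bits=8):
        self.counters = [1] * n_counters        # 2-bit counters, start weakly not-taken (assumed)
        self.index_mask = n_counters - 1
        self.history_mask = (1 << history_bits) - 1
        self.history = 0                        # global branch history register

    def _index(self, pc):
        # Word-aligned PC XORed with the global history (alignment is an assumption).
        return ((pc >> 2) ^ self.history) & self.index_mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # True = predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask

bp = Gshare()
mispredictions = 0
for _ in range(100):                 # a branch at 0x400 that is always taken
    if not bp.predict(0x400):
        mispredictions += 1
    bp.update(0x400, True)
penalty_cycles = mispredictions * 5  # 5-cycle misprediction penalty from the slide
print(mispredictions, penalty_cycles)
```

The warm-up mispredictions occur while the history register fills, because each new history value selects a counter that has not been trained yet.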

Page 19: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Performance vs. Hardware Cost Estimation: Maximum Processor Models

4 MB I- and D-caches, 6 integer/mm units, 2 local load/store units

[Figure: IPC (axis 0 to 7) by issue bandwidth and number of threads. Data labels of the chart:]

Threads   Issue 1   Issue 2   Issue 4   Issue 6   Issue 8
1         0.93      1.43      1.65      1.68      1.68
2         1         1.96      3.07      3.26      3.28
4         1         1.99      3.89      5.23      5.58
6         1         1.99      3.91      5.57      6.38
8         1         1.99      3.91      5.57      6.39

Page 20: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Transistor Count and Chip Space Estimation of Maximum Processor Models

[Figure: two charts by number of threads (1, 2, 4, 6, 8) and issue bandwidth (4, 8): transistor count in K transistors (axis 275000 to 300000) and size in Mλ² (axis 80000 to 150000). Data labels of the charts:]

          K transistors         Size in Mλ²
Threads   Issue 8   Issue 4     Issue 8   Issue 4
1         283360    283038      102947    102357
2         285178    284608      105463    104283
4         288983    287794      112461    108944
6         293124    291150      123618    115208
8         297505    294584      140366    123314

Page 21: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Small Processor Models

• 64 KB I- and D-caches
• 3 integer/mm units
• 1 local load/store unit
• 32-entry reservation stations
• 16-entry issue and retirement buffers
• 4 result buses
• 2x4 fetch and decode bandwidth, fixed

[Figure: IPC (axis 0 to 4) by issue bandwidth and number of threads. Data labels of the chart:]

Threads   Issue 1   Issue 2   Issue 4   Issue 6   Issue 8
1         0.88      1.17      1.23      1.23      1.23
2         0.98      1.72      2.1       2.17      2.17
4         0.99      1.89      2.91      3.2       3.2
6         0.97      1.91      3.08      3.51      3.51
8         0.98      1.9       3.09      3.6       3.6

Page 22: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Transistor Count and Chip Space Estimation of Small Processor Models

[Figure: two charts by number of threads (1, 2, 4, 6, 8) and issue bandwidth (4, 8): transistor count in K transistors (axis 8000 to 13000) and space in Mλ² (axis 4000 to 9000). Data labels of the charts:]

          K transistors         Space in Mλ²
Threads   Issue 8   Issue 4     Issue 8   Issue 4
1         10342     10173       5509      5154
2         10659     10465       6010      5591
4         11287     11042       7003      6459
6         11914     11619       7996      7327
8         12534     12190       8980      8188

Page 23: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Results

4-threaded 8-issue SMT over a single-threaded 8-issue:

                 Speedup   Transistor increase   Chip-space increase
maximum model:   3         2%                    9%
small model:     1.5       9%                    27%

Commercial Multithreaded Processors:
• Tera, MAJC, Alpha 21464, IBM Blue Gene, Sun UltraSPARC V
• Network processors (Intel IXP, IBM PowerNP, Vitesse IQ2x00, Lextra, ...)
• IBM RS64 IV: two-threaded block MT, reported 5% overhead
• Intel Xeon™ (hyperthreading): two-threaded SMT, reported 5% overhead
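The transistor and chip-space percentages can be recomputed from the data labels on the two estimation charts. The sketch below assumes that the labels used here belong to the 8-issue models with 1 and 4 threads; that assignment is a reading of the flattened chart labels, not something the slides state explicitly, and the helper name `increase_percent` is illustrative.

```python
# (K transistors, chip space) for 8-issue models, as read from the chart labels
# of the estimation slides; the assignment to thread counts is an assumption.
models = {
    "maximum": {"1 thread": (283360, 102947), "4 threads": (288983, 112461)},
    "small":   {"1 thread": (10342, 5509),    "4 threads": (11287, 7003)},
}

def increase_percent(base, new):
    """Relative increase of new over base, in percent."""
    return 100.0 * (new - base) / base

increases = {}
for name, m in models.items():
    (t1, s1), (t4, s4) = m["1 thread"], m["4 threads"]
    increases[name] = (round(increase_percent(t1, t4)), round(increase_percent(s1, s4)))
print(increases)   # → {'maximum': (2, 9), 'small': (9, 27)}
```

Under this reading the computed values agree with the 2% / 9% and 9% / 27% increases stated on this slide.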

Page 24: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Outline of the Presentation

Motivation

State-of-the-art Multithreading
• Multithreading for throughput increase
• Multithreading for power reduction
• Multithreading for embedded real-time systems

Conclusions & Research Opportunities

Page 25: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

SMT for Reduction of Power Consumption

Observation: Mispredictions cost energy

Today's superscalars: ~60% of the fetched and ~30% of the executed instructions are squashed

Idea: fill issue slots with less speculative instructions of other threads

Simulations by Seng et al. (2000) show that ~22% less energy is consumed when using a power-aware scheduler
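One way to picture the idea is a selection rule that prefers the thread whose next instructions depend on the fewest unresolved branches. This is a hypothetical illustration only; it is not the scheduler evaluated by Seng et al., and the metric `unresolved_branches` and the function `pick_thread` are assumed names.

```python
def pick_thread(unresolved_branches):
    """Return the index of the thread whose next instructions are least speculative.

    unresolved_branches[t] = branches of thread t still in flight (hypothetical metric).
    """
    return min(range(len(unresolved_branches)), key=lambda t: unresolved_branches[t])

choice = pick_thread([3, 0, 2, 1])
print(choice)   # → 1
```

Instructions behind fewer unresolved branches are less likely to be squashed, so less energy is spent on work that is thrown away.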

Page 26: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Outline of the Presentation

Motivation

State-of-the-art Multithreading
• Multithreading for throughput increase
• Multithreading for power reduction
• Multithreading for embedded real-time systems

Conclusions & Research Opportunities

Page 27: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Multithreading in Embedded Real-time Systems: The Komodo Approach

Observation: multithreading allows a context switching overhead of zero cycles

Idea: harness multithreading for embedded real-time systems

Komodo Project: Real-time Java Based on a Multithreaded Java-microcontroller

http://www.informatik.uni-augsburg.de/lehrstuehle/info3/research/komodo/indexEng.html

Page 28: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Real-time Requirements

• run-time predictability
• isolation of the threads
• programmability
• real-time scheduling support
• fast context switching

Hard real-time: a deadline may never be missed

Soft real-time: a deadline may occasionally be missed

Page 29: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Komodo Solutions

• Extremely fast context switching by hardware multithreading
• Real-time scheduling in hardware
• Based on a Java processor core
• Predictability of all instruction executions by a careful hardware design

Page 30: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Komodo Microcontroller Pipeline

[Figure: pipeline with the stages instruction fetch, instruction decode, operand fetch, memory access, and execute. The instruction fetch stage holds PC1 to PC4 and instruction windows IW1 to IW4. Further blocks: priority manager, µROM, stack register sets 1 to 4, and a memory interface with address, instruction, and data paths.]

Page 31: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Komodo Microcontroller Design

[Figure: block diagram of the microcontroller. Blocks: microcontroller kernel, signal unit, capture/compare, timer/counter, parallel I/O, serial interfaces, I²C interface, CAN-Bus interface, external I/O bus, memory bus.]

Page 32: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Hardware Real-time Scheduling

Real-time scheduler is realized in hardware (by the priority manager)

Scheduling decision every clock cycle

Four different scheduling algorithms implemented:
• Fixed Priority Preemptive (FPP)
• Earliest Deadline First (EDF)
• Least Laxity First (LLF)
• Guaranteed Percentage (GP)
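The per-cycle decision of the first three policies can be sketched in software. This is a simplified model, not the Komodo priority manager: the `Thread` fields, the convention that a larger number means higher priority, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    priority: int      # larger value = more important (convention assumed)
    deadline: int      # absolute deadline in cycles
    remaining: int     # cycles of work still to do

def pick(threads, policy, now):
    """Select the thread to issue from in the current cycle."""
    ready = [t for t in threads if t.remaining > 0]
    if not ready:
        return None
    if policy == "FPP":
        return max(ready, key=lambda t: t.priority)
    if policy == "EDF":
        return min(ready, key=lambda t: t.deadline)
    if policy == "LLF":
        # Laxity = time left until the deadline minus the work still to do.
        return min(ready, key=lambda t: (t.deadline - now) - t.remaining)
    raise ValueError(policy)

threads = [Thread("A", 3, 100, 10), Thread("B", 1, 40, 5), Thread("C", 2, 60, 50)]
for policy in ("FPP", "EDF", "LLF"):
    print(policy, pick(threads, policy, now=0).name)
# prints:
# FPP A
# EDF B
# LLF C
```

The same thread set yields a different choice under each policy, which is why a decision in every clock cycle only pays off when a context switch costs no cycles.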

Page 33: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

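Guaranteed Percentage Scheme

[Figure: timelines for three events with guaranteed shares, event A (20%), event B (40%), and event C (30%), each with a start and a deadline. The schedule on a conventional processor is compared with the schedule on a multithreaded processor; the labels mark context switch, surplus, and violation.]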
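A minimal sketch of the guaranteed-percentage idea, using the shares from the figure (20%, 40%, 30%). The interval length of 100 cycles and the rule for choosing among threads with remaining budget are assumptions for illustration; the slides do not specify them.

```python
def gp_schedule(shares, interval=100):
    """shares: {thread: guaranteed percentage}.

    Returns the thread issued in each cycle of one interval; None marks surplus
    cycles. The interval length is an assumption.
    """
    assert sum(shares.values()) <= 100
    budget = {t: interval * p // 100 for t, p in shares.items()}
    schedule = []
    for _ in range(interval):
        # Serve the thread with the most budget left; switching costs no cycles.
        t = max(budget, key=budget.get)
        if budget[t] == 0:
            schedule.append(None)
        else:
            budget[t] -= 1
            schedule.append(t)
    return schedule

s = gp_schedule({"A": 20, "B": 40, "C": 30})
print({t: s.count(t) for t in ("A", "B", "C", None)})
# → {'A': 20, 'B': 40, 'C': 30, None: 10}
```

Each thread receives exactly its share within the interval regardless of what the others do, and the unassigned 10% remains as surplus.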

Page 34: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Simulation Results

[Figure: bar chart of the gain (axis 0.00 to 3.00) for the scheduling schemes FPP, EDF, GP, and LLF, with a thread mix (IC, PID, and FFT) applied. Series: baseline; multithreading without latency hiding; multithreading with latency hiding.]

Page 35: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Technical Data of the Komodo Prototype

Implementation of Komodo core pipeline on a Xilinx XCV800 with 800k gates

data bit width:            32 bit
address space:             19 bit
number of threads:         4
instruction window size:   8 bytes
stack size:                128 entries
external frequency:        33 MHz
internal frequency:        8.25 MHz
CLBs:                      9 200
number of gates:           133 000

ASIC synthesis of whole microcontroller (0.18 µm technology): 340 MHz, 3 mm² chip

Page 36: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Chip-Space of Komodo Core Pipeline

[Figure: stacked bar chart of the gate count (axis 0 to 450000) for 1, 2, 4, 8, and 16 threads, broken down by unit: OF/MEM, BMIU, SMU, WBU, EXE, MRU, IWDU, IFU.]

Page 37: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Reducing Power Consumption Using Real-time Scheduling in Hardware

Current work:

Idea: Use information about the thread states and configurations available within the priority manager for a "fine-grained" adaptation of power consumption and performance.

Frequency and voltage adjustments in short time intervals are done by hardware

Page 38: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

State of the Komodo Project

• Software simulator
• FPGA prototype
• Real-time Java system
• ASIC
• Middleware for distributed embedded systems

[Figure: layered system structure. Labelled blocks: hardware, multithreaded Java core, signal unit, priority manager, I/O unit, data transfer buffer, JVM, trap routines, garbage collection, class loader, I/O modules, standard classes API, extended API, OSA+ middleware.]

Page 39: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Conclusions on Multithreading in Real-time Environments

Multithreaded processor cores:
• Performance gain due to fast context switching (for hard real-time) and latency hiding (for soft and non-real-time)
• More efficient event handling by ISTs
• Helper threads possible (garbage collection, debugging)

Real-time scheduling in hardware:
• Software overhead for real-time scheduling removed
• More efficient power saving mechanisms possible
• Better predictability by isolation of threads (GP scheduling)

Page 40: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Conclusions & Research Opportunities

Multithreading proves advantageous:
• Latency hiding: speed-ups of 2-3 for SMT, lots of research done, next generation of microprocessors
• Power reduction: 22% savings reported, not much research up to now
• Fast context switching utilized by microcontroller for real-time systems, not much research up to now

Research opportunities:
• Scheduling in SMT, network processors and multithreaded real-time systems
• Thread speculation: how to speed up single-threaded programs?
• Multithreading and power consumption
• Multithreading in other communities: microcontrollers, SoCs
• System software based on helper threads

Page 41: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Acknowledgements

SMT Multimedia research group
• Uli Sigmund and Heiko Oehring

Complexity estimation group
• Marc Steinhaus, Reiner Kolla, Josep L. Larriba-Pey, Mateo Valero

Komodo project group
• Jochen Kreuzinger, Matthias Pfeffer, Sascha Uhrig, Uwe Brinkschulte, Florentin Picioroaga, Etienne Schneider

Page 42: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Microprocessors: Technology Forecast up to 2012

SIA (Semiconductor Industry Association) forecast, 1997:

Page 43: Opportunities for Hardware Multithreading in Microprocessors and Microcontrollers

Research Directions?

Increase performance of a single thread of control by
• more instruction-level speculation
  - Better branch prediction
  - Trace cache and next trace prediction
  - Data dependence and value prediction

Increase throughput of a workload of multiple threads
• Utilize thread-level and instruction-level parallelism
  - Chip multiprocessors
  - Multithreading (hardware thread = thread or process)

Thread speculation