estimation of design parameters - uni-paderborn.de

44
Estimation of Design Parameters Prof. Dr. Christian Plessl SS 2017 High-Performance IT Systems group Paderborn University Version 1.6.1 – 2017-07-24

Upload: others

Post on 05-May-2022

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Estimation of Design Parameters - uni-paderborn.de

Estimation of Design Parameters

Prof. Dr. Christian Plessl

SS 2017

High-Performance IT Systems groupPaderborn University

Version 1.6.1 – 2017-07-24

Page 2: Estimation of Design Parameters - uni-paderborn.de

2

Motivation• estimation is to determine design parameters without

implementing the system– supports design decisions– enables design space exploration– forms the basis for system optimizations

analysis

simulation

emulation / rapid prototyping

implementation

estimation

measurement

0

4

3

1 2

Page 3: Estimation of Design Parameters - uni-paderborn.de

3

Overview

• parameters of estimation methods

• estimation of hardware metrics

• estimation of software metrics

Page 4: Estimation of Design Parameters - uni-paderborn.de

4

Parameters of Estimation Methods

model

coarse

fine

accuracy fidelity effort (time)

high

low

high

low

high

low

Page 5: Estimation of Design Parameters - uni-paderborn.de

5

Accuracy

• definition: Let E(D) be an estimated and M(D) an exact (measured) metric of an implementation D. The accuracy Aof the estimation is given by:

𝐴 = 1 −𝐸 𝐷 −𝑀(𝐷)

𝑀(𝐷)

Page 6: Estimation of Design Parameters - uni-paderborn.de

6

Fidelity

• definition: Let D = {D1, D2, …, Dn} be a set of implemen-tations. The fidelity F of an estimation method is given by:

𝐹 = 100 ⋅2

𝑛 𝑛 − 1 ⋅/ / 𝜇1,3

4

35167

4

157

𝜇1,3=1 𝑖𝑓

((𝐸 𝐷1 > 𝐸 𝐷3 ∧ 𝑀 𝐷1 > 𝑀 𝐷3 ) ∨

𝐸 𝐷1 < 𝐸 𝐷3 ∧ 𝑀 𝐷1 < 𝑀 𝐷3 ∨

(𝐸 𝐷1 = 𝐸 𝐷3 ∧ 𝑀 𝐷1 = 𝑀 𝐷3 )

0 𝑒𝑙𝑠𝑒

Page 7: Estimation of Design Parameters - uni-paderborn.de

7

Fidelity – Example

A B C

metric

implementation A B C

metric

estimatedmeasured

fidelity = 100 % fidelity = 33.3 %

✎ Exercise: Parameters of estimation methods

Page 8: Estimation of Design Parameters - uni-paderborn.de

8

Metrics• prime metrics

– performance§ hardware: clock period, latency, execution time, throughput§ software: execution time, worst-case execution time, throughput§ communication: bit rate, communication time, throughput

– power consumption, energy requirements– cost

• secondary metrics (softer and more difficult to quantify)– reliability – testability– time-to-market– ...

Page 9: Estimation of Design Parameters - uni-paderborn.de

9

Overview

• parameters of estimation methods

• estimation of hardware metrics

• estimation of software metrics

Page 10: Estimation of Design Parameters - uni-paderborn.de

10

Hardware - Performance

finite statemachine

controller

registers

ALU ALU

datapath

• clock period T– depends on technology,

resources

• latency L– given by the number

of clock steps

• execution timeTex = T * L

• throughputR = 1 / Tex

Page 11: Estimation of Design Parameters - uni-paderborn.de

11

Example (1)

+

+

+

+

*

*

150

150

80

80 80

80

clock period T = 380 ns

execution time Tex = 380 ns

resources: 2 MUL, 4 ADD

latency L = 1

performance estimation: e.g. with ASAP or list scheduling

Page 12: Estimation of Design Parameters - uni-paderborn.de

12

Example (2)

+

+

+

+

*

*

150

150

80

80

80

80

clock period T = 150 ns

execution time Tex = 600 ns

resources: 1 MUL, 1 ADD

multi-cycle implementationnot pipelined!resources are sharedbetween stages

latency L = 4

performance estimation: e.g. with method of maximum operator delay

Page 13: Estimation of Design Parameters - uni-paderborn.de

13

Pipelining

T Tex R = 5/Tex

pipelining with P

stages of equal length:

R = P / Tex

+

+

+

+

*

*

150

150

80

80

80

80

resources: 2 MUL, 4 ADD

performance estimation: e.g. with method of clock slack minimization

Page 14: Estimation of Design Parameters - uni-paderborn.de

14

Estimation of the Clock Period

• functional units (operators) vk with delays del(vk)

– method of the maximum operator delay

– clock slack minimization method§ search in the interval [Tmin … Tmax ] for the clock period

T that maximizes the utilization (minimizes clock slack)

– ILP search

𝑇 = maxE(𝑑𝑒𝑙 𝑣E )

Page 15: Estimation of Design Parameters - uni-paderborn.de

15

Clock Slack

busy unitslack

time (ns)50 100 150

T=163

ADD

SUB

MUL

𝑠𝑙𝑎𝑐𝑘 𝑇, 𝑣E =( 𝑑𝑒𝑙(𝑣E)/𝑇 ⋅ 𝑇 − 𝑑𝑒𝑙(𝑣E)

Page 16: Estimation of Design Parameters - uni-paderborn.de

16

Clock Slack Minimization

• let occ(vk) be the number of operations of type k, and |VT|the number of different operation types. Then, the average clock slack is given by:

• the utilization is given by:

✎ Exercise: Clock slack minimization

𝑎𝑣𝑔𝑠𝑙𝑎𝑐𝑘 𝑇 =∑ (𝑜𝑐𝑐 𝑣E ⋅ 𝑠𝑙𝑎𝑐𝑘 𝑇, 𝑣E )OPE57

∑ 𝑜𝑐𝑐(𝑣E)OPE57

𝑢𝑡𝑖𝑙 𝑇 = 1 −𝑎𝑣𝑔𝑠𝑙𝑎𝑐𝑘(𝑇)

𝑇

Page 17: Estimation of Design Parameters - uni-paderborn.de

17

Estimating Maximum Frequency for FSMD

• model of a finite state machine with datapath

memory

DR AR

R1 R2 R3state register

next state logic

control logic

controllerdelay

datapathdelay

memorydelay

Page 18: Estimation of Design Parameters - uni-paderborn.de

Estimating Hardware Properties from FSMD Model

• hardware effort– data path: register, functional units, logic, wiring– controller: state register, control logic, next state logic– cost expressed in silicon area

§ mm2, λ2

§ ASIC: number of transistors, number of gates§ FPGA: number of logic blocks

• speed– maximal operating speed is limited by critical path (longest combinational delay)– compare critical path of

§ FSM controller (next state logic)§ data-path§ memory subsystem

• production cost– package, number of I/O pins– semiconductor technology (lithography, materials, processing, testing, production

volume)

18

Page 19: Estimation of Design Parameters - uni-paderborn.de

19

Estimating Power Consumption of CMOS Hardware

X X X

Vdd

Gnd

Cload

X

(per transition)

Cload …. load capacitanceVdd …… supply voltageVt …….. threshold voltagek ……… constant

a ……… activity factorf ………. clock frequency t ……… delay (t = 1/f)τ = k ⋅Cload ⋅

VddVdd −Vt( )

2

P = Pstatic +Pdynamic

Pdynamic = Pshort +Pload

Eavg =12⋅Cload ⋅Vdd

2

Pavg =12⋅Cload ⋅α ⋅ f ⋅Vdd

2

Page 20: Estimation of Design Parameters - uni-paderborn.de

Hardware - Power / Energy (1)

• power consumption p(t) [W]

– average power consumption P [W]

– energy E [Ws]

P = 1T

p(t)dt0

T

E = p(t)dt0

T

p(t)

t

P

E

0 T

E = P ⋅T

20

Page 21: Estimation of Design Parameters - uni-paderborn.de

21

Hardware - Power / Energy (2)• power dissipation

– P [W]– important for dimensioning of packaging, power supply, cooling

• energy (power-delay product) – E = Pavg * Texe [Ws] – important for mobile devices (battery life time)– metrics for systems that are operated at a fixed rate

– power normalized to clock period: [µW/MHz] – for processors also:

§ [µW/MIPS] or [MIPS/µW]§ [µW/SPEC] or [SPEC/µW]

Page 22: Estimation of Design Parameters - uni-paderborn.de

22

Hardware - Power / Energy (3)• energy-delay product

– EDP = E * Texe [Ws2] – metrics for systems that are operated at maximum rate

– for processors also: § [MIPS2/µW]§ [SPEC2/µW]

15 30

P[mW]

T [µs]

2

40.9 pWs2

1.8 pWs2

Page 23: Estimation of Design Parameters - uni-paderborn.de

23

“Mobile” Processors Processor (Vendor)

Techn. [µ] VDD [V] Clock [MHz] Power

[mW] MIPS MIPS/W MIPS2/mW

StrongARM (Intel) 0.35 2 230 360 268 744 200

ARM710 (VLSI) 0.8 3.3 25 120 30 250 8

ARM940T (VLSI) 0.35 3.3 150 675 (e)160 (e)237 (e)38

MMC2001 (Motorola) 0.35 2 34 80 31 387 12

TR4102 (LSI) 0.25 1.8 80 40 (e)90 (e)2250 (e)203

SH7708 (Hitachi) 0.5 3.3 25 95 25 263 7

SH7750 (Hitachi) 0.25 1.8 200 1600 300 188 56

MIPS ratings for dhrystone

benchmark(e) … estimated from: Jean-Michel Puiatti,

Instruction-Level Parallelism for Embedded Processors,PhD Thesis, EPFL, 1999.

Page 24: Estimation of Design Parameters - uni-paderborn.de

24

VLIW Processors

Processor(Vendor)

Techn.[µ] VDD [V] Clock [MHz] Power

[mW] MIPS MIPS/W MIPS2/mW

‘C6201 (Texas Instr.) 0.25 2.5 200 4600 1600 348 557

SC140(Mot. /Lucent) 0.13 1.5 300 500 3000 6000 18000

TM1000 (Philips) 0.35 3.3 100 4000 2500 625 1563

Merced (HP/Intel) 0.18 ? 800 (e)70000 6400 (e)91 (e)585

MIPS ratings are peak numbers(e) … estimated from: Jean-Michel Puiatti,Instruction-Level Parallelism for Embedded Processors,PhD Thesis, EPFL, 1999.

Page 25: Estimation of Design Parameters - uni-paderborn.de

25

Overview

• parameters of estimation methods

• estimation of hardware metrics

• estimation of software metrics

Page 26: Estimation of Design Parameters - uni-paderborn.de

26

• execution time T

T = Ic x CPI x t = (Ic x CPI) / f

§ Ic … instruction count for a given program

§ CPI … cycles per instruction (averaged value)

§ t … clock period, f … clock frequency

• example Ic = 2000, CPI = 0.4, f = 400 MHz àT = 2 µs

Software - Performance

Page 27: Estimation of Design Parameters - uni-paderborn.de

27

Example (1)

• MIPS rate (million instructions per second)

MIPS = Ic / (T x 106) = f / (CPI x 106)

example:• processor with clock frequency of 500 MHz• 3 instruction classes A, B, C with CPIA = 1, CPIB = 2, CPIC = 3• 2 different compilers generate (for the same program) following

instruction mixes (x 109):

A B CCompiler 1 5 1 1Compiler 2 10 1 1

Page 28: Estimation of Design Parameters - uni-paderborn.de

28

Example (2)

• clock cycles program 1: (5 x 1 + 1 x 2 + 1 x 3) x 109 = 10 x 109

program 2: (10 x 1 + 1 x 2 + 1 x 3) x 109 = 15 x 109

• execution timesprogram 1: (10 x 109) / (500 x 106) = 20 secprogram 2: (15 x 109) / (500 x 106) = 30 sec

• MIPS ratesprogram 1: ((5 + 1 + 1) x 109) / (20 x 106) = 350 MIPSprogram 2: ((10 + 1 + 1) x 109) / (30 x 106) = 400 MIPS

program 2 has a higher MIPS rate, program 1 runs faster !

Page 29: Estimation of Design Parameters - uni-paderborn.de

29

Software - Performance Metrics (1)

• MIPS (million instructions per second)

• MFLOPS (million floating-point operations per second)

• MACS (million multiply & accumulates per second)

– important for DSPs

• MOPS (million operations per second)

– counts all possible operations: ALUs, address calculations, DMA, …

for all : – parallel operations considered

– sustained (!) peak performance

Page 30: Estimation of Design Parameters - uni-paderborn.de

30

Software - Performance Metrics (2)• execution time

– profiling: compilation and many test runs → statistical statements– estimation on the basis of source- / intermediate- / target code

• worst-case execution time (WCET)– important for real-time systems with hard timing constraints– estimation requires

program path analysis• which sequence of instructions is executed in the worst-case (longest runtime)?

microarchitectural modeling• taking processor specifics into account: instruction timing, pipelining, caches

Page 31: Estimation of Design Parameters - uni-paderborn.de

31

WCET• measurement works only if

– the worst-case input can be determined, or– exhaustive measurement is possible

• the estimated WCET– is always higher than the actual WCET; a good estimation method

closely approximates the actual WCET

execution time

dist

ribut

ion

of e

xecu

tion

times

Page 32: Estimation of Design Parameters - uni-paderborn.de

32

Modern Processor Features• modern processors increase performance by using

– caches– pipelining– branch prediction– ...

• these features make WCET computation difficult because the execution times of single instructions vary widely

– best case: no cache misses, operands ready, needed resources free, branches correctly predicted

– worst case: all memory accesses lead to cache misses, resources are blocked, operands not ready, wrong predictions

– difference between worst and best case can be up to several hundreds of clock cycles

Page 33: Estimation of Design Parameters - uni-paderborn.de

33

WCET Computation - Approach

microarchitectural analysisworst-case program path analysis

commercialized as: aiT Tool http://www.absint.com/ait/

Page 34: Estimation of Design Parameters - uni-paderborn.de

34

Program Path Analysis• which sequence of instructions is executed in the worst-case

(gives the longest runtime)?

• problem: the number of possible program paths grows exponentially with the program length

• model – fixed number of cycles for each basic block (derived by static analysis)– loops must be bounded

• approach– transform structure of control flow graph (CFG) into an integer linear

program (ILP)– provide as many additional constraints as possible– solution to ILP gives bound on the WCET

Page 35: Estimation of Design Parameters - uni-paderborn.de

35

Example

/* k >= 0 */s = k;WHILE (k < 10) {

IF (ok)j++;

ELSE {j = 0;ok = true;

}k ++;

}r = j;

s = k;

WHILE (k<10)

if (ok)

j++; j = 0;ok = true;

k++;

r = j;

B1

B2

B3

B4 B5

B6

B7

Page 36: Estimation of Design Parameters - uni-paderborn.de

36

Calculation of the WCET• definition: A program consists of N basic blocks, where

each basic block Bi has a worst-case execution time ci and is executed for exactly xi times. Then, the WCET is given by:

– the ci values can be determined (by static analysis), because the sequence of executed instructions is known (definition of basic block)

– how to determine xi ?§ structural constraints given by the program structure§ additional constraints provided by the programmer (eg. bounds for loop

counters) based on the knowledge of the program context

𝑊𝐶𝐸𝑇 =/𝑐1 ⋅ 𝑥1

V

157

Page 37: Estimation of Design Parameters - uni-paderborn.de

37

Structural Constraintss = k;

WHILE (k<10)

if (ok)

j++; j = 0;ok = true;

k++;

r = j;

B1

B2

B3

B4 B5

B6

B7

d1

d2

d3

d8

d9

d4 d5

d6 d7

d10

flow equations:

d1 = d2 = x1

d2 + d8 = d3 + d9 = x2

d3 = d4 + d5 = x3

d4 = d6 = x4

d5 = d7 = x5

d6 + d7 = d8 = x6

d9 = d10 = x7

Page 38: Estimation of Design Parameters - uni-paderborn.de

38

Additional Constraints

loop is executed for at most 10 times:

x3 <= 10 · x1

B5 is executed for at most one time:

x5 <= 1 · x1

s = k;

WHILE (k<10)

if (ok)

j++; j = 0;ok = true;

k++;

r = j;

B1

B2

B3

B4 B5

B6

B7

d1

d2

d3

d8

d9

d4 d5

d6 d7

d10

Page 39: Estimation of Design Parameters - uni-paderborn.de

39

WCET - ILP• ILP with structural and additional constraints

structuralconstraints

program is executed once

✎ Exercise: WCET estimation

𝑊𝐶𝐸𝑇 = max{/𝑐1 ⋅ 𝑥1|𝑑7 = 1 ∧V

157

/ 𝑑3 = / 𝑑E = 𝑥1, 𝑖 = 1…𝑁 ∧�

E∈^_`(ab)

3∈14(ab)𝑎𝑑𝑑𝑖𝑡𝑖𝑜𝑛𝑎𝑙𝑐𝑜𝑛𝑠𝑡𝑟𝑎𝑖𝑛𝑡𝑠 }

Page 40: Estimation of Design Parameters - uni-paderborn.de

40

Software - Cost• processor• size of the program memory

– instr_size(j) is the memory requirement for the generic instruction j

• size of the data memory– a program contains a set D of declarations

– data_size(d) is the memory requirement for the declaration d

𝑝𝑟𝑜𝑔𝑠𝑖𝑧𝑒a1= ∑ 𝑖𝑛𝑠𝑡_𝑠𝑖𝑧𝑒(𝑗)�3∈ab

𝑑𝑎𝑡𝑎_𝑠𝑖𝑧𝑒= ∑ 𝑑𝑎𝑡𝑎_𝑠𝑖𝑧𝑒(𝑑)�g∈h

Page 41: Estimation of Design Parameters - uni-paderborn.de

41

Software – Power / Energy (1)• simulation on the instruction level

– assumptions:§ each instruction needs a certain energy§ each pair of instructions needs a certain energy

– simple model, requires only an instruction set simulator– for classical DSP architectures accuracy > 90% achieved– extensions: different energy values for loads/stores

§ depending on the sources / destinations (memory hierarchy) § depending on internal / external memory accesses

– measuring the energy values:

processor

A

+-

while (1) {test_code()

}

Page 42: Estimation of Design Parameters - uni-paderborn.de

42

Software - Power / Energy (2)

• simulation on the architecture level– capacity models for all processor building blocks: ALUs, registers,

controllers, cache, …– activities of the blocks are simulated

– complex model, requires cycle-accurate processor simulator– higher accuracy than simulation on the instruction level

Page 43: Estimation of Design Parameters - uni-paderborn.de

Changes

• 2017-07-24 (v1.6.1)– add information about suitable estimation methods to slides 10-13

• 2017-07-14 (v1.6.0)– updated for SS 2017

• 2016-07-10 (v1.5.0)– updated for SS2016– fixed numerous equations that were broken by PowerPoint update

• 2015-07-02 (v1.4.0)– updated for SS2015

43

Page 44: Estimation of Design Parameters - uni-paderborn.de

Changes

• 2014-07-17 (v1.3.1)– cosmetics

• 2014-07-07 (v1.3.0)– improve explanations on slide 12, 17

• 2013-07-18 (v1.2.1)– removed one incorrect example for performance estimation of HW (former

slide 13)• 2013-06-23 (v1.2.0)

– updated for SS2013• 2012-06-15 (v1.1.0)

– updated for SS2012• 2011-07-07 (v1.0.1)

– cosmetic changes

• 2011-06-30 (v1.0.0)– initial version

44