The Return of Synthetic Benchmarks

Ajay M. Joshi (UT Austin), Lieven Eeckhout (Ghent University), Lizy K. John (UT Austin)
Laboratory of Computer Architecture, Department of Electrical & Computer Engineering, The University of Texas at Austin
January 28, 2008



Page 2: Outline

• The Need for Synthetic Benchmarks
• BenchMaker Framework for Benchmark Synthesis
• Workload Characteristics Used in Synthesis
• Synthetic Benchmark Construction
• Evaluation of BenchMaker
• Applications
• Summary

Page 3: Benchmark Spectrum

From toy benchmarks to complete applications:
• Toy Benchmarks, e.g. Heap sort
• Microbenchmarks, e.g. STREAM
• Kernel Codes, e.g. Livermore Loops
• Application Suites, e.g. SPEC CPU
• Complete Application Code

Synthetic Benchmarks, e.g. Dhrystone, Whetstone

Toward the toy end: Less Development Effort, More Scalable, More Maintainable, Less Representative
Toward the complete-application end: More Development Effort, Less Scalable, Less Maintainable, More Representative

Page 4: Focus on Simulation Time Reduction

Trends: Benchmark Explosion, Benchmark Run Length, Microprocessor Complexity

• Benchmark Subsetting [Eeckhout et al., PACT’02] [Vandierendonck et al., CAECW’04] [Phansalkar et al., ISPASS’05] [Eeckhout et al., IISWC’05]
• Statistical Sampling [Conte et al., ICCD’96] [Wunderlich et al., ISCA’03]
• Representative Sampling [Sherwood et al., ASPLOS’02]
• Reduced Input Set [KleinOsowski, CAN’04]
• Statistical Simulation & Synthetic Workloads [Oskin et al., ISCA’00] [Eeckhout et al., ISPASS’00] [Nussbaum et al., PACT’01] [Bell et al., ICS’05]
• Analytical Modeling [Noonburg et al., MICRO’94] [Karkhanis et al., ISCA’04]
• Speedup Simulation [Schnarr et al., ASPLOS’98] [Loh et al., SIGMETRICS’01]

Page 5: Motivation: Benchmarking Challenges

• Using Real-World Applications as Benchmarks
• Proprietary Nature of Real-World Applications
• Single-Point Performance Characterization
• Application Benchmarks are Rigid
• Applications Evolve Faster than Benchmarks
• Benchmark Suites are Costly to Develop, Maintain, and Upgrade
• Studying Commercial Workload Performance
• Early Design Stage Power/Performance Studies

Usefulness of Synthetic Benchmarks Beyond Simulation Time Reduction

Page 6: Resurgence of Synthetic Benchmarks…

IEEE Computer, August 2003

Page 7: Outline (BenchMaker Framework for Benchmark Synthesis)

Page 8: Workload Synthesis: Central Idea

Application behavior space: Instruction Level Parallelism, Program Locality, Instruction Mix, Control Flow Behavior

‘Knobs’ for Changing Program Characteristics (just 40 workload characteristics) → Workload Synthesis Algorithm → Synthetic Benchmark → Compile and Execute on an Execution Driven Simulator, Real Hardware, or RTL

Example synthetic benchmark code:

    ADD   R1, R2, R3
    LD    R4, R1, R6
    MUL   R3, R6, R7
    ADD   R3, R2, R5
    DIV   R10, R2, R1
    SUB   R3, R5, R6
    STORE R3, R10, R20
    ADD   R1, R2, R3
    LD    R4, R1, R6
    MUL   R3, R6, R7
    ADD   R3, R2, R5
    DIV   R10, R2, R1
    SUB   R3, R5, R1
    BEQ   R3, R6, LOOP
    SUB   R3, R5, R6
    STORE R3, R10, R20
    DIV   R10, R2, R1
    ………….

Page 9: Modeling Real-World Applications

Real-World Proprietary Workload → Workload Profiler (Binary Instrumentation OR Simulation) → Workload Profile → Workload Synthesizer → Synthetic Benchmark (Clone) → Experiment Environment (Real Hardware or Execution-Driven Simulator)

Workload Profile = Workload Attributes + Distribution of Attribute Values

• Microarchitecture-Independent Workload Profiling
• Modeling Workload Attributes into Synthetic Workload

Page 10: Outline (Workload Characteristics Used in Synthesis)

Page 11: Workload Characteristics as ‘Knobs’

Category                 Num.  Characteristic
instruction mix          10    percentage of integer short latency, integer long latency,
                               floating-point short latency, floating-point long latency,
                               integer load, integer store, floating-point load,
                               floating-point store, and branch instructions
instruction-level        8     register dependency distance: percentage of register
parallelism                    dependencies with a distance of 1 instruction, up to 2, 4,
                               6, 8, 16, or 32 instructions, or greater than 32 instructions
data locality            1     data footprint
                         10    distribution of local stride values
instruction locality     1     instruction footprint
branch predictability    10    distribution of branch transition rate
Total                    40
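The knob counts above add up to 40 (10 + 8 + 1 + 10 + 1 + 10). A minimal Python sketch of a profile container that holds them, assuming every distribution is stored as non-overlapping fractions that sum to 1; the class and field names are hypothetical and not taken from BenchMaker.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, List


@dataclass
class WorkloadProfile:
    """Hypothetical container for the 40 knobs in the table above."""
    instruction_mix: List[float]         # 10 fractions (instruction mix row)
    dependency_distance: List[float]     # 8 fractions, assumed non-overlapping bins
    data_footprint: int                  # 1 value (data locality)
    stride_distribution: List[float]     # 10 fractions (data locality)
    instruction_footprint: int           # 1 value (instruction locality)
    branch_transition_rate: List[float]  # 10 fractions (branch predictability)

    # Expected length of each distribution, per the Num. column.
    EXPECTED: ClassVar[Dict[str, int]] = {
        "instruction_mix": 10,
        "dependency_distance": 8,
        "stride_distribution": 10,
        "branch_transition_rate": 10,
    }

    def validate(self) -> None:
        """Check lengths and that each distribution sums to 1."""
        for name, n in self.EXPECTED.items():
            values = getattr(self, name)
            if len(values) != n:
                raise ValueError(f"{name}: expected {n} values, got {len(values)}")
            if abs(sum(values) - 1.0) > 1e-6:
                raise ValueError(f"{name}: fractions must sum to 1")

    def num_knobs(self) -> int:
        # One knob per distribution entry, plus the two footprints.
        return sum(len(getattr(self, n)) for n in self.EXPECTED) + 2
```

The slide names nine instruction-mix categories against a count of 10, so the sketch only checks the vector length and does not label individual entries.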

Page 12: Capturing the Essence of Workloads

Attributes to capture inherent workload behavior
– Data Locality: Dominant strides of static Load/Store instructions
– Control Flow Predictability: Branch transition rate

Modeling Locality & Control Flow Predictability
– Data locality of integer, scientific, and embedded workloads is effectively modeled using circular streams
– Replicating the transition rate of static branches

Page 13: Modeling Data Access Pattern

• Identify streams of data references
• A stream? A sequence of memory addresses in an arithmetic progression
  – Elements of arrays A, B, and C form 3 streams in:
        for (ii = 0; ii < N; ii++)
            A[ii] = B[ii] + C[ii];
  – Streams: 200, 204, 208, …; 320, 324, 328, …; 404, 408, 412, …
  – Issuing sequence: 320, 404, 200, 324, 408, 204, …
• Streams are interleaved and may contain noise: 4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28, …
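To make the stream example concrete, this Python sketch replays the loop. It assumes 4-byte elements, A at 200, B at 320, C at 404, and a per-iteration order of load B, load C, store A, which is the order that reproduces the slide's issuing sequence. Tagging each address with the array that produced it is what lets the interleaved sequence be split back into arithmetic progressions; `issue_sequence`, `split_streams`, and `is_arithmetic` are hypothetical helpers.

```python
from collections import defaultdict


def issue_sequence(n, base_a=200, base_b=320, base_c=404, elem_size=4):
    """Tagged addresses issued by A[ii] = B[ii] + C[ii] for ii in 0..n-1.

    Assumed order per iteration: load B[ii], load C[ii], store A[ii].
    """
    seq = []
    for ii in range(n):
        seq.append(("B", base_b + elem_size * ii))
        seq.append(("C", base_c + elem_size * ii))
        seq.append(("A", base_a + elem_size * ii))
    return seq


def split_streams(seq):
    """Group tagged addresses back into one address list per array."""
    streams = defaultdict(list)
    for tag, addr in seq:
        streams[tag].append(addr)
    return dict(streams)


def is_arithmetic(addrs):
    """True if all consecutive differences are equal (a single stride)."""
    return len({b - a for a, b in zip(addrs, addrs[1:])}) <= 1
```

For example, `[a for _, a in issue_sequence(2)]` yields `[320, 404, 200, 324, 408, 204]`. In a real, untagged trace the streams arrive interleaved and noisy, as in the 4, 8, 12, … example, which motivates using the issuing instruction as the tag.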

Page 14: Extracting Streams

Reference pattern of static Load/Store instructions
– PC-correlated spatial locality: dependence on the address referenced by a nearby Ld/St
  (programs with pointer-chasing code)
– PC-correlated temporal locality: dependence on the previous address generated by the same Ld/St
  (programs with multidimensional arrays)

Could static Load/Store instructions be natural sources of streams?

Profile every static Load/Store instruction
– Number of different strides with which it accesses data
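Profiling every static Load/Store instruction can be sketched as a per-PC stride histogram: the stride of a reference is its address minus the previous address generated by the same static instruction. This Python sketch assumes the trace is a list of `(pc, address)` pairs in program order, for example from binary instrumentation; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def stride_profile(trace):
    """Per-static-instruction stride histogram.

    trace: iterable of (pc, address) pairs, one per dynamic load/store.
    """
    last_addr = {}
    hist = defaultdict(Counter)
    for pc, addr in trace:
        if pc in last_addr:  # the first reference of a PC has no stride yet
            hist[pc][addr - last_addr[pc]] += 1
        last_addr[pc] = addr
    return hist


def summarize(hist):
    """For each PC: (number of distinct strides, dominant stride, its share)."""
    out = {}
    for pc, counts in hist.items():
        stride, n = counts.most_common(1)[0]
        out[pc] = (len(counts), stride, n / sum(counts.values()))
    return out
```

Even when two instructions' addresses are interleaved in the trace, each PC recovers its own stride pattern; a PC with a single distinct stride is a clean stream, while a low dominant-stride share hints at irregular accesses.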

Page 15: Modeling Instruction Level Parallelism

Dependency Distance

    ADD R1, R3, R4
    MUL R5, R3, R2
    ADD R5, R3, R6
    LD  R4, (R1)
    SUB R8, R2, R1

Read After Write dependency distance (ADD writes R1, LD reads it) = 3

Measure the distribution of dependency distances: up to 1, up to 2, up to 4, up to 8, up to 16, up to 32, >32
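A Python sketch of measuring the distribution, assuming instructions are given as `(destination, [sources])` tuples in program order. It uses the bucket bounds from this slide; the knobs table also lists a bound of 6, so the bounds are a parameter. The helper names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

BOUNDS = (1, 2, 4, 8, 16, 32)  # from this slide; the knobs table adds 6

# The example sequence on this slide.
EXAMPLE = [
    ("R1", ["R3", "R4"]),  # ADD R1, R3, R4
    ("R5", ["R3", "R2"]),  # MUL R5, R3, R2
    ("R5", ["R3", "R6"]),  # ADD R5, R3, R6
    ("R4", ["R1"]),        # LD  R4, (R1)
    ("R8", ["R2", "R1"]),  # SUB R8, R2, R1
]


def raw_distances(instrs):
    """RAW distance of every source operand that has an earlier producer."""
    last_writer = {}
    dists = []
    for i, (dest, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_writer:
                dists.append(i - last_writer[reg])
        if dest is not None:  # record the write after reading sources
            last_writer[dest] = i
    return dists


def bucketize(dists, bounds=BOUNDS):
    """Count each distance in the smallest 'up to b' bucket, else '>last'."""
    hist = Counter()
    for d in dists:
        k = bisect_left(bounds, d)
        hist[f"<={bounds[k]}" if k < len(bounds) else f">{bounds[-1]}"] += 1
    return hist
```

On the slide's example this finds the distance of 3 (LD reading R1) plus a distance of 4 (SUB also reading R1); write-after-read reuse of R4 is not counted, since only read-after-write dependencies matter here.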

Page 16: Modeling Control Flow Predictability

Capture the behavior of easy- and difficult-to-predict branches with an inherent program feature:

Transition Rate [Haungs et al., HPCA’00] = # of taken/not-taken transitions / # of times executed

• Branches with low transition rate (easier to predict): TTTTTTTTTN, NNNNNNNNNT
• Branches with high transition rate (easier to predict): TNTNTNTNTN
• Branches with moderate transition rate (tougher to predict)
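The transition-rate formula can be computed directly from a branch's outcome history. This Python sketch follows the definition as stated on the slide (transitions divided by executions) and assumes the history is a string of `T`/`N` characters; `transition_rate` is a hypothetical name.

```python
def transition_rate(outcomes: str) -> float:
    """Number of taken/not-taken transitions divided by number of executions.

    outcomes: one 'T' or 'N' per dynamic execution of a static branch.
    """
    if not outcomes:
        return 0.0
    transitions = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    return transitions / len(outcomes)
```

With this definition, TTTTTTTTTN and NNNNNNNNNT both give 0.1 and TNTNTNTNTN gives 0.9. Because the denominator counts executions rather than adjacent pairs, a strictly alternating branch only approaches 1.0 as its history grows.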

Page 17: Outline (Synthetic Benchmark Construction)

Page 18: Workload Synthesis (1)

Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities

Flow graph of basic blocks A, B, C, and D, each ending in a branch (BR), with transition probabilities 0.8/0.2, 1.0, 1.0, and 0.9/0.1

Synthetic Clone Generation: 1 Big Loop
    A B D A B D A C D A B D
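One way to obtain a block sequence like A B D A B D A C D A B D is a random walk over the flow graph. This Python sketch assumes edges read off the slide's figure: A goes to B with probability 0.8 and to C with 0.2, B and C go to D, and D returns to A with 0.9 (the remaining 0.1 is treated as leaving the region, after which the walk restarts). The edge assignment, `SFG`, and `walk` are assumptions, not BenchMaker's documented procedure.

```python
import random

# Hypothetical flow graph; None marks the assumed exit edge out of D.
SFG = {
    "A": [("B", 0.8), ("C", 0.2)],
    "B": [("D", 1.0)],
    "C": [("D", 1.0)],
    "D": [("A", 0.9), (None, 0.1)],
}


def walk(sfg, start, num_blocks, seed=0):
    """Sample num_blocks basic blocks by following edge probabilities.

    Restarts at `start` when an exit edge is taken. The resulting sequence
    would be laid out once as the body of the single big loop.
    """
    rng = random.Random(seed)
    seq, block = [], start
    while len(seq) < num_blocks:
        seq.append(block)
        succs, probs = zip(*sfg[block])
        block = rng.choices(succs, weights=probs, k=1)[0]
        if block is None:
            block = start
    return seq
```

The slide's sequence is consistent with these edges: every A is followed by B or C, every B or C by D, and every D by A. Over a long walk, roughly 80% of the A blocks are followed by B.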

Page 19: Workload Synthesis (2)

Same workload profile, flow graph, and big loop (A B D A B D A C D A B D) as Workload Synthesis (1), adding:
• Memory Access Model (Strides)
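The memory access model can be pictured with the circular streams mentioned under Capturing the Essence of Workloads: each static load/store in the clone walks a fixed region with its dominant stride and wraps around, so the loop can run for any number of iterations. This Python sketch assumes the region size would be derived from the data-footprint knob; `CircularStream` is a hypothetical class.

```python
class CircularStream:
    """Hypothetical address generator for one static load/store.

    Walks `footprint` bytes starting at `base` with a fixed `stride`,
    wrapping around at the end of the region.
    """

    def __init__(self, base: int, stride: int, footprint: int):
        if footprint <= 0:
            raise ValueError("footprint must be positive")
        self.base, self.stride, self.footprint = base, stride, footprint
        self.offset = 0

    def next_address(self) -> int:
        addr = self.base + self.offset
        # Python's % keeps the offset in [0, footprint) for negative strides too.
        self.offset = (self.offset + self.stride) % self.footprint
        return addr
```

A stride of 0 keeps hitting the same address (pure temporal reuse), while larger strides or footprints spread references over more cache lines. That is why the stride and footprint knobs can be varied independently in the applications section.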

Page 20: Workload Synthesis (3)

Same workload profile, flow graph, and big loop as Workload Synthesis (1), adding:
• Memory Access Model (Strides)
• Branching Model – Based on Transition Rate
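A branching model based on transition rate needs an outcome pattern whose rate matches the profiled one. This Python sketch builds a `T`/`N` string of `n` executions with `round(rate * n)` transitions, capped at `n - 1`, and spreads the flips evenly. It uses the slide's transitions-per-execution definition. How BenchMaker actually encodes the branch condition in the clone is not stated on the slides; `pattern_for_rate` is hypothetical.

```python
def pattern_for_rate(rate: float, n: int, first: str = "T") -> str:
    """Outcome string of n executions whose transition rate
    (transitions / n) is as close to `rate` as n allows."""
    if not 0.0 <= rate <= 1.0 or n < 1:
        raise ValueError("rate must be in [0, 1] and n >= 1")
    k = min(round(rate * n), n - 1)  # at most n-1 adjacent pairs can differ
    out, cur = [first], first
    for i in range(1, n):
        # Flip whenever floor(i*k/(n-1)) steps up: exactly k flips in total.
        if (i * k) // (n - 1) > ((i - 1) * k) // (n - 1):
            cur = "N" if cur == "T" else "T"
        out.append(cur)
    return "".join(out)
```

With `n = 10`, a rate of 0.1 reproduces the slide's TTTTTTTTTN (or NNNNNNNNNT with `first="N"`), and 0.9 reproduces TNTNTNTNTN. Moderate rates produce the harder-to-predict patterns in between.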

Page 21: Workload Synthesis (4)

Same workload profile, flow graph, and big loop as Workload Synthesis (1), adding:
• Memory Access Model (Strides)
• Branching Model – Based on Transition Rate
• Register Assignment
• C code with asm & volatile constructs
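Register assignment and code emission can be sketched together. The first function picks source registers so each instruction reads the value produced exactly its sampled dependency distance earlier, rotating destinations through a register pool. The second wraps the result in one big loop of GCC-style `__asm__ __volatile__` statements, where `volatile` keeps the compiler from optimizing the statements away. Everything here is hypothetical: the register names, the `r0` fallback, and the instruction text are schematic and would not assemble for a particular ISA (the evaluation targets the Alpha 21264).

```python
def assign_registers(ops, distances, pool_size=16):
    """Return (op, dest, src) so that src was last written `distance`
    instructions earlier. Destinations rotate through `pool_size`
    registers, so any distance <= pool_size is honored exactly."""
    if len(ops) != len(distances):
        raise ValueError("need one distance per instruction")
    if distances and max(distances) > pool_size:
        raise ValueError("pool_size must cover the largest distance")
    regs = [f"r{k}" for k in range(1, pool_size + 1)]
    body = []
    for i, (op, d) in enumerate(zip(ops, distances)):
        dest = regs[i % pool_size]
        # Hypothetical "r0" is never written in the body; it stands in when
        # the producer would lie in the previous loop iteration.
        src = regs[(i - d) % pool_size] if 1 <= d <= i else "r0"
        body.append((op, dest, src))
    return body


def emit_c(body, iterations=1_000_000):
    """Wrap the body in a single big loop of volatile inline-asm statements."""
    lines = ["int main(void) {",
             f"    for (long it = 0; it < {iterations}L; it++) {{"]
    for op, dest, src in body:
        lines.append(f'        __asm__ __volatile__ ("{op} {dest}, {src}, {src}");')
    lines += ["    }", "    return 0;", "}"]
    return "\n".join(lines)
```

Round-robin destinations are a simple way to keep a register from being rewritten before its consumer reads it. A real generator would also have to respect the target's calling convention and reserved registers, and handle dependencies that cross loop iterations.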

Page 22: Outline (Evaluation of BenchMaker)

Page 23: Evaluation of BenchMaker

• SPEC CPU2000, SPECjbb2005, and DBT2 workloads
• Validated Sim-Alpha performance model of the Alpha 21264

Benchmark   Input          SimPoint(s)
SPEC CPU2000 Integer
bzip2       graphic        553
crafty      ref            774
eon         rushmeier      403
gcc         166.i          389
gzip        graphic        389
mcf         ref            553
perlbmk     perfect-ref    5
twolf       ref            1066
vortex      lendian1       271
vpr         route          476
gcc         expr           8, 24, 47, 51, 56, 73, 87, 99
SPEC CPU95 Integer
gcc         expr           0, 3, 5, 6, 7, 8, 9, 10, 12

Page 24: Performance Correlation

[Figure: Instructions-Per-Cycle of the original benchmark vs. the synthetic benchmark for bzip2, crafty, gcc, gzip, mcf, perlbmk, twolf, vortex, vpr, dbt2, dbms, and SPECjbb2005]

Trade Accuracy for Flexibility – Average Error of 11%
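The slides report only the resulting average errors (11% for IPC, 13% for energy per instruction), not the formula. A common choice, assumed in this Python sketch, is the mean absolute relative error of the synthetic clone's metric against the original benchmark's metric, taken over all benchmarks. `average_error` is a hypothetical name.

```python
def average_error(original, synthetic):
    """Mean absolute relative error of synthetic vs. original values
    (e.g., IPC or energy per instruction, one value per benchmark)."""
    if len(original) != len(synthetic) or not original:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(abs(s - o) / o for o, s in zip(original, synthetic)) / len(original)
```

Relative rather than absolute error keeps low-IPC benchmarks such as memory-bound ones from being drowned out by high-IPC ones when averaging.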

Page 25: Energy/Power Correlation

[Figure: Energy-Per-Instruction of the original benchmark vs. the synthetic benchmark for bzip2, crafty, gcc, gzip, mcf, perlbmk, twolf, vortex, vpr, dbt2, dbms, and SPECjbb2005]

Average Error of 13%

Page 26: Outline (Applications)

Page 27: Altering Individual Program Characteristics

[Figure: Instructions-Per-Cycle vs. percentage of references with stride value 0, at 0, 10, 20, 30, 40, 50, 60, 66, 70, 80, 90, and 100]
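Sweeping one knob, such as the share of references with stride 0, means changing that entry of the stride distribution while the rest stays a valid distribution. This Python sketch assumes the other strides keep their relative weights and are rescaled to fill the remaining share; the slides do not say how the remaining strides were adjusted, and `set_stride0_fraction` is hypothetical.

```python
def set_stride0_fraction(stride_dist: dict, frac: float) -> dict:
    """Copy of a {stride: fraction} distribution in which stride 0 has
    share `frac` and the other strides keep their relative weights,
    rescaled to sum to 1 - frac."""
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be in [0, 1]")
    rest = {s: p for s, p in stride_dist.items() if s != 0}
    total = sum(rest.values())
    if total == 0 and frac < 1.0:
        raise ValueError("no nonzero strides to absorb the remaining share")
    out = {s: p * (1.0 - frac) / total for s, p in rest.items()} if total else {}
    out[0] = frac
    return out
```

A sweep like the one plotted here would then be `[set_stride0_fraction(dist, f / 100) for f in range(0, 101, 10)]`, with one synthetic clone generated per resulting profile.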

Page 28: Interaction of Program Characteristics

[Figure: L1 D-cache miss rate vs. percentage of references with stride value 0 (0 to 100), for data footprints of 300K, 600K, and 900K]

Page 29: Modeling Impact of Benchmark Drift

[Figure: Instructions-Per-Cycle vs. factor by which code size is increased (1 to 8)]

• Increase in Data Footprint from SPEC CPU95 to SPEC CPU2000 for gcc (Model with 7% accuracy)
• Increase in Code Footprint (hypothetical)

Page 30: Summary

• Synthetic Benchmarks to Address Benchmarking Challenges
• Constructing Synthetic Benchmarks from Hardware-Independent Characteristics
• Applications of Synthetic Benchmarks
  – Altering Program Characteristics
  – Studying Interaction of Program Characteristics
  – Modeling Benchmark Drift

Page 31: Questions?
