accurate & efficient regression modeling for...

48
Motivation & Background Model Derivation Model Evaluation Conclusion Accurate & Efficient Regression Modeling for Microarchitectural Performance & Power Prediction Benjamin C. Lee, David M. Brooks {bclee,dbrooks}@eecs.harvard.edu Division of Engineering and Applied Sciences Harvard University 24 October 2006 Benjamin C. Lee, David M. Brooks 1 :: ASPLOS-XII

Upload: others

Post on 11-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Accurate & Efficient Regression Modeling forMicroarchitectural Performance & Power

Prediction

Benjamin C. Lee, David M. Brooks

{bclee,dbrooks}@eecs.harvard.eduDivision of Engineering and Applied Sciences

Harvard University

24 October 2006

Benjamin C. Lee, David M. Brooks 1 :: ASPLOS-XII

Page 2: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

OutlineMotivation & Background

Simulation ChallengesSimulation ParadigmRegression Theory

Model DerivationExperimental MethodologyDerivation OverviewModel Specification

Model EvaluationPerformancePower

Conclusion

Benjamin C. Lee, David M. Brooks 2 :: ASPLOS-XII

Page 3: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

OutlineMotivation & Background

Simulation ChallengesSimulation ParadigmRegression Theory

Model DerivationExperimental MethodologyDerivation OverviewModel Specification

Model EvaluationPerformancePower

Conclusion

Benjamin C. Lee, David M. Brooks 3 :: ASPLOS-XII

Page 4: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Microarchitectural Design Space

Increasing diversity of interesting, viable designsExamples :: Power 4, Pentium 4, UltraSPARC T1Tractably quantify trends across comprehensive design space

Benjamin C. Lee, David M. Brooks 4 :: ASPLOS-XII

Page 5: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Microarchitectural Simulation Challenges

Cycle-Accurate SimulationAccurately identifies trends in design spaceTracks instructions’ progress through microprocessorEstimates performance, power, temperature, . . .

Simulation CostsLong simulation times (minutes,hours per design)Number of potential simulations scale exponentially (mp)

p :: parameter countm :: parameter resolution

Benjamin C. Lee, David M. Brooks 5 :: ASPLOS-XII

Page 6: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Microarchitectural Sampling

Temporal SamplingSample from instruction traces in time domainReduce simulation costs via size of inputsSynthetic traces from profiled workloads 1

Sampled traces from phase analysis 2

Spatial SamplingSample from design spaceReduce simulation costs via number of simulations

1Eeckhout [ISPASS’00]2Sherwood [ASPLOS’02], Wunderlich [ISCA’03]

Benjamin C. Lee, David M. Brooks 6 :: ASPLOS-XII

Page 7: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Simulation Paradigm

Comprehensively understand design spaceSpecify large, high-resolution design spaceConsider all design parameter simultaneously

Selectively simulate modest number of designsSample points randomly from design space for simulationDecouple resolution of design space and simulation

Efficiently leverage simulation data with inferenceReveal trends, trade-offs from sparse samplingEnable predictions for metrics of interest

Benjamin C. Lee, David M. Brooks 7 :: ASPLOS-XII

Page 8: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Regression Theory

Statistical InferenceModels approximate solutions to intractable problemsRequires initial data to train, formulate modelLeverages correlations from initial data for prediction

Regression ModelsLow formulation costs (1K samples from 1B designs)Accurate inference (4− 7% median error)Efficient computation (100’s of predictions per second)

Benjamin C. Lee, David M. Brooks 8 :: ASPLOS-XII

Page 9: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Model Formulation

Notationn observations . {simulated design samples}Response :: ~y = y1, . . . , yn . {e.g., performance, power }Predictor :: ~xi = xi,1, . . . , xi,p . {e.g., depth, cache}

Regression Coefficients :: ~β = β0, . . . , βp

Random Error :: ~e = e1, . . . , en where ei ∼ N(0, σ2)Transformations :: f ,~g = g1, . . . , gp

Model

f (y) = β0 +p∑

j=1

βjgj(xj) + e

Benjamin C. Lee, David M. Brooks 9 :: ASPLOS-XII

Page 10: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Predictor Interaction

Modeling InteractionSuppose effects of predictors x1, x2 cannot be separatedConstruct predictor x3 = x1x2

y = β0 + β1x1 + β2x2 + β3x1x2 + ei

ExampleLet x1 be pipeline depth, x2 be L2 cache sizePerformance impact of pipelining affected by cache size

Speedup =Depth

1 + Stalls/Inst

Benjamin C. Lee, David M. Brooks 10 :: ASPLOS-XII

Page 11: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Predictor Non-Linearity I

Restricted Cubic SplinesDivide predictor domain into intervals separated by knotsPiecewise cubic polynomials joined at knotsHigher order polynomials provide better fits 3

3Stone [SS’86]Benjamin C. Lee, David M. Brooks 11 :: ASPLOS-XII

Page 12: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Predictor Non-Linearity II

Location of KnotsLocation of knots less important than number of knots 4

Place knots at fixed predictor quantiles

Number of KnotsFlexibility, risk of over-fitting increases with knot count5 knots or fewer are often sufficient4 knots balances flexibility, risk of over-fitting

4Harrell [Springer’01]Benjamin C. Lee, David M. Brooks 12 :: ASPLOS-XII

Page 13: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Simulation ChallengesSimulation ParadigmRegression Theory

Prediction

Expected Responseβ are known from least squaresxi,1, . . . , xi,p are known for a given query iExpected response is weighted sum of predictor values

E[y]

= E[β0 +

p∑j=1

βjxj

]+ E

[e]

= β0 +p∑

j=1

βjxj

Benjamin C. Lee, David M. Brooks 13 :: ASPLOS-XII

Page 14: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

OutlineMotivation & Background

Simulation ChallengesSimulation ParadigmRegression Theory

Model DerivationExperimental MethodologyDerivation OverviewModel Specification

Model EvaluationPerformancePower

Conclusion

Benjamin C. Lee, David M. Brooks 14 :: ASPLOS-XII

Page 15: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

Tools and Benchmarks

Simulation FrameworkTurandot :: a cycle-accurate trace driven simulatorPowerTimer :: power models derived from circuit analysesBaseline simulator models POWER4/POWER5 architecture

BenchmarksSPEC2kCPU :: compute-intensive benchmarksSPECjbb :: Java server benchmark

Statistical FrameworkR :: software environment for statistical computingHmisc and Design packages5

5Harrell [Springer,’01]Benjamin C. Lee, David M. Brooks 15 :: ASPLOS-XII

Page 16: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

Spatial Sampling

Design SpaceSi :: set of values for parameter xi, i∈[1, p]S =

∏pi=1 Si :: design space

B :: set of benchmarks|S| ≈ 109 and |B| = 22

Sampling Uniformly at Random (UAR)Sample n = 4, 000 designs and benchmarks for simulationDecouple resolution of design space and simulationUnbiased observations from full range of parameter values

Benjamin C. Lee, David M. Brooks 16 :: ASPLOS-XII

Page 17: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

Predictors :: MicroarchitectureSet Parameters Measure Range |Si|

S1 Depth depth FO4 9::3::36 10S2 Width width insn b/w 4,8,16 3

L/S reorder queue entries 15::15::45store queue entries 14::14::42functional units count 1,2,4

S3 Physical general purpose (GP) count 40::10::130 10Registers floating-point (FP) count 40::8::112

special purpose (SP) count 42::6::96S4 Reservation branch entries 6::1::15 10

Stations fixed-point/memory entries 10::2::28floating-point entries 5::1::14

S5 I-L1 Cache i-L1 cache size log2(entries) 7::1::11 5S6 D-L1 Cache d-L1 cache size log2(entries) 6::1::10 5S7 L2 Cache L2 cache size log2(entries) 11::1::15 5

L2 cache latency cycles 6::2::14S8 Control Latency branch latency cycles 1,2 2S9 FX Latency ALU latency cycles 1::1::5 5

FX-multiply latency cycles 4::1::8FX-divide latency cycles 35::5::55

S10 FP Latency FPU latency cycles 5::1::9 5FP-divide latency cycles 25::5::45

S11 L/S Latency Load/Store latency cycles 3::1::7 5S12 Memory Latency Main memory latency cycles 70::5::115 10

Benjamin C. Lee, David M. Brooks 17 :: ASPLOS-XII

Page 18: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

Predictors :: Application-Specific

Application CharacteristicsCollect program characteristics on baseline architectureInstruction throughputCache access patternsBranch patternsSources of pipeline stalls

Application EffectsSignificant interactions with microarchitectural predictorsExample :: Impact of d-L1 cache affected by access rates

Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII

Page 19: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

Derivation Overview

Hierarchical ClusteringPerformance Associations and Correlations

qualitative scatterplots, quantitative ρ2

Model Specificationpredictor interaction, non-linearity

Assessing FitR2 statistic

Residual Analysisnormality (quantile-quantile), randomness (scatterplots)

Significance Testinghypothesis testing, F-statistic, p-values

Benjamin C. Lee, David M. Brooks 19 :: ASPLOS-XII

Page 20: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

Derivation Overview

Hierarchical ClusteringPerformance Associations and Correlations

qualitative scatterplots, quantitative ρ2

Model Specificationpredictor interaction, non-linearity

Assessing FitR2 statistic

Residual Analysisnormality (quantile-quantile), randomness (scatterplots)

Significance Testinghypothesis testing, F-statistic, p-values

Benjamin C. Lee, David M. Brooks 20 :: ASPLOS-XII

Page 21: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

Performance Correlations

Benjamin C. Lee, David M. Brooks 21 :: ASPLOS-XII

Page 22: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Experimental MethodologyDerivation OverviewModel Specification

Model Specification

InteractionsPipeline width/depth interact with

instruction bandwidth structures (queues, register file)cache hierarchy

Cache hierarchy sizes interact withadjacent levels in hierarchyapplication-specific access rates

Baseline performance interacts with resource sizings

Restricted Cubic SplinesWeaker relationships (latencies, caches, queues) :: 3 knotsStronger relationships (depth, registers) :: 4 knotsBaseline performance :: 5 knots

Benjamin C. Lee, David M. Brooks 22 :: ASPLOS-XII

Page 23: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

PerformancePower

OutlineMotivation & Background

Simulation ChallengesSimulation ParadigmRegression Theory

Model DerivationExperimental MethodologyDerivation OverviewModel Specification

Model EvaluationPerformancePower

Conclusion

Benjamin C. Lee, David M. Brooks 23 :: ASPLOS-XII

Page 24: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

PerformancePower

Validation Approach

FrameworkFormulate models with n < 4, 000 samplesObtain 100 additional random samples for validationQuantify percentage error, 100 ∗ |yi − yi|/yi

Model VariantsBaseline (B) :: Non-transformed responseStabilized (S) :: Square-root of responseRegional (S+R) :: Per query with similar samplesApplication (S+A) :: Per benchmark with similar samples

Benjamin C. Lee, David M. Brooks 24 :: ASPLOS-XII

Page 25: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

PerformancePower

Regional Sampling

Benjamin C. Lee, David M. Brooks 25 :: ASPLOS-XII

Page 26: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

PerformancePower

Performance Prediction

Benjamin C. Lee, David M. Brooks 26 :: ASPLOS-XII

Page 27: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

PerformancePower

Performance Sensitivity :: S+A

Benjamin C. Lee, David M. Brooks 27 :: ASPLOS-XII

Page 28: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

PerformancePower

Power Prediction

Benjamin C. Lee, David M. Brooks 28 :: ASPLOS-XII

Page 29: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

PerformancePower

Power Sensitivity :: S+R Region

Benjamin C. Lee, David M. Brooks 29 :: ASPLOS-XII

Page 30: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

OutlineMotivation & Background

Simulation ChallengesSimulation ParadigmRegression Theory

Model DerivationExperimental MethodologyDerivation OverviewModel Specification

Model EvaluationPerformancePower

Conclusion

Benjamin C. Lee, David M. Brooks 30 :: ASPLOS-XII

Page 31: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

Motivation & BackgroundModel DerivationModel Evaluation

Conclusion

Conclusion

Simulation ParadigmComprehensively understand design spaceSelectively simulate modest number of designsEfficiently leverage simulation data with inference

Model Evaluation7.4%, 4.3% median errors for performance, powerS+A, S+R more accurate for performance, power

Future DirectionsDemonstrate for comprehensive design studies 6

Expand design space and benchmark suiteExtend to CMP’s and interconnect modeling

6Lee [HPCA’07] :: www.deas.harvard.edu/∼bcleeBenjamin C. Lee, David M. Brooks 31 :: ASPLOS-XII

Page 32: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Appendixwww.deas.harvard.edu/∼bclee

B.C. Lee and D.M. Brooks.Illustrative design space studies with microarchitectural regression modelsHPCA-13: International Symposium on High Performance Computer Architecture, Feb 2007.

B.C. Lee and D.M. Brooks.Accurate, efficient regression modeling for microarchitectural performance, power prediction.ASPLOS-XII: International Conference on Architectural Support for Programming Languagesand Operating Systems, Oct 2006.

B.C. Lee and D.M. Brooks.Statistically rigorous regression modeling for the microprocessor design space.MoBS-2: Workshop on Modeling, Benchmarking, and Simulation, June 2006.

B.C. Lee and D.M. Brooks.Regression modeling strategies for microarchitectural performance and power prediction.Harvard University Technical Report TR-08-06, March 2006.

Benjamin C. Lee, David M. Brooks 32 :: ASPLOS-XII

Page 33: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

References IY. Li, B.C. Lee, D. Brooks, Z. Hu, K. Skadron.CMP design space exploration subject to physical constraints.HPCA-12: International Symposium on High-Performance Computer Architecture, Feb 2006.

L. Eeckhout, S. Nussbaum, J. Smith, and K. DeBosschere.Statistical simulation: Adding efficiency to the computer designer’s toolbox.IEEE Micro, Sept/Oct 2003.

R. Liu and K. Asanovic.Accelerating architectural exploration using canonical instruction segments.In International Symposium on Performance Analysis of Systems and Software, Austin, Texas,March 2006.

T. Sherwood, E. Perelman, G. Hamerly, and B. Calder.Automatically characterizing large scale program behavior.ASPLOX-X: Architectural Support for Programming Languages and Operating Systems,October 2002.

B.C. Lee and D.M. Brooks.Effects of pipeline complexity on SMT/CMP power-performance efficiency.ISCA-32: Workshop on Complexity Effective Design, June 2005.

Benjamin C. Lee, David M. Brooks 33 :: ASPLOS-XII

Page 34: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

References IIC. Stone.Comment: Generalized additive models.Statistical Science, 1986.

F. Harrell.Regression modeling strategies.Springer, New York, NY, 2001.

J. Yi, D. Lilja, and D. Hawkins.Improving computer architecture simulation methodology by adding statistical rigor.IEEE Computer, Nov 2005.

P. Joseph, K. Vaswani, and M. J. Thazhuthaveetil.Construction and use of linear regression models for processor performance analysis.In Proceedings of the 12th Symposium on High Performance Computer Architecture, Austin,Texas, February 2006.

S. Nussbaum and J. Smith.Modeling superscalar processors via statistical simulation.In PACT2001: International Conference on Parallel Architectures and Compilation Techniques,Barcelona, Sept 2001.

Benjamin C. Lee, David M. Brooks 34 :: ASPLOS-XII

Page 35: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Controlling Simulation Costs

Hybrid SimulationDecouples simulation of microprocessor structuresLeverages fast, specialized simulators for particular units 7

Trace Sampling/CompressionReduces redundant simulationSimulate unique, representative instruction segments 8

Synthetic WorkloadsReduces size of simulator inputsProfiles workload to construct smaller, synthetic traces 9

7Li, Lee, Brooks, Hu, Skadron [HPCA’06]8Liu, Asanovic [ISPASS’06], Sherwood, et al., [ASPLOS’02]9Eeckhout, Nussbaum, Smith, DeBosschre [IEEE Micro’03]

Benjamin C. Lee, David M. Brooks 35 :: ASPLOS-XII

Page 36: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Variable Clustering

stal

l_ca

stst

all_

resv

dl1m

iss_

rate

base

_bip

sdl

2mis

s_ra

test

all_

dmis

sqst

all_

stor

eqbr

_mis

_rat

est

all_

reor

derq

fix_l

atct

l_la

tls

_lat

fpu_

lat

dept

hm

em_l

atw

idth

bips

phys

_reg

l2ca

che_

size

resv

il1m

iss_

rate

il2m

iss_

rate stal

l_in

fligh

tst

all_

rena

me

br_r

ate

br_s

tall

icac

he_s

ize

dcac

he_s

ize

1.0

0.8

0.6

0.4

0.2

0.0

Spe

arm

an ρ

2

Benjamin C. Lee, David M. Brooks 36 :: ASPLOS-XII

Page 37: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Performance Associations

bips

0.6 0.7 0.8 0.9

12131170 815 802

131813201362

11971179 802 822

11251247 818 810

4000

N

4 8

16

[ 9,18) [18,27) [27,33) [33,36]

[ 40, 70) [ 70,100) [100,120) [120,130]

[ 6, 9) [ 9,12)

[12,14) [14,15]

depth

width

phys_reg

resv

Overall

Pipeline

N=4000bips

0.74 0.78 0.82

799 828 807 805 761

774 794 773 837 822

825 776 823 780 796

4000

N

11 12 13 14 15

7 8 9

10 11

6 7 8 9

10

l2cache_size

icache_size

dcache_size

Overall

Memory Hierarchy

N=4000bips

0.5 0.8

1110 9191071 900

1103 9111103 883

1856 349 928 867

4000

N

0.02

[5.08e−06,3.91e−05) [3.91e−05,7.15e−04) [7.15e−04,3.85e−03) [3.85e−03,3.71e−02]

[0.02,0.09) [0.09,0.12) [0.12,0.18) [0.18,0.55]

[0.00,0.02)

[0.04,0.11) [0.11,0.24]

il1miss_rate

dl1miss_rate

dl2miss_rate

Overall

Cache

N=4000

Benjamin C. Lee, David M. Brooks 37 :: ASPLOS-XII

Page 38: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Assessing FitMultiple Correlation Statistic

R2 is fraction of response variance captured by predictorsLarge R2 suggests better fit to observed dataR2 → 1 suggests over-fitting (less likely if p < n/20)

R2 = 1−∑n

i=1(yi − yi)2∑ni=1(yi − 1

n

∑ni=1 yi)2

Residual Distribution AssumptionsResiduals are normally distributed, ei ∼ N(0, σ2)No correlation between residuals and response, predictorsValidate by scatterplots and quantile-quantile plots

ei = yi − β0 −p∑

j=0

βjxij

Benjamin C. Lee, David M. Brooks 38 :: ASPLOS-XII

Page 39: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Predictor Non-Linearity I

Polynomial TransformationsUndesirable peaks and valleysDiffering trends across regions

Linear SplinesPiecewise linear regions separated by knotsInadequate for complex, highly curved relationships

Restricted Cubic SplinesHigher order polynomials provide better fitsContinuous at knotsLinear constraint on tails

Benjamin C. Lee, David M. Brooks 39 :: ASPLOS-XII

Page 40: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Predictor Non-Linearity II

Location of KnotsLocation of knots less important than number of knotsPlace knots at fixed predictor quantiles

Number of KnotsFlexibility, risk of over-fitting increases with knot count5 knots or fewer are often sufficient 10

4 knots is a good compromise between flexibility, over-fittingFewer knots required for small data sets

10Stone [SS’86]Benjamin C. Lee, David M. Brooks 40 :: ASPLOS-XII

Page 41: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Significance Testing I

ApproachGiven two nested models, hypothesis H0 states additionalpredictors in larger model have no response associationTest H0 with F-statistics and p-values

ExamplePredictor interaction requires comparing nested modelsConsider a model y = β0 + β1x1 + β2x2 + β3x1x2.Test significance of x1 with null hypothesis H0 : β1 = β3 = 0

Benjamin C. Lee, David M. Brooks 41 :: ASPLOS-XII

Page 42: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Significance Testing IIF-Statistic

Compare two nested models using their R2 and F-statisticR2 is fraction of response variance captured by predictors

R2 = 1−∑n

i=1(yi − yi)2∑ni=1(yi − 1

n

∑ni=1 yi)2

F-statistic of two nested models follows F distribution

Fk,n−p−1 =R2 − R2

∗k

× n− p− 11− R2

P-ValuesProbability F-statistic greater than or equal to observedvalue would occur under H0

Small p-values cast doubt on H0

Benjamin C. Lee, David M. Brooks 42 :: ASPLOS-XII

Page 43: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Treatment of Missing Data

Missing Completely at Random (MCAR)Treat unobserved design points as missing dataSampling UAR ensures observations are MCARData is missing for reasons unrelated to characteristics orresponses of the configuration

Informative MissingData is more likely missing if their responses aresystematically higher or lower“Missingness” is non-ignorable and must also be modeledSampling UAR avoids such modeling complications

Benjamin C. Lee, David M. Brooks 43 :: ASPLOS-XII

Page 44: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Performance Associations I

bips

0.6 0.7 0.8 0.9

12131170 815 802

131813201362

11971179 802 822

11251247 818 810

4000

N

4 8

16

[ 9,18) [18,27) [27,33) [33,36]

[ 40, 70) [ 70,100) [100,120) [120,130]

[ 6, 9) [ 9,12)

[12,14) [14,15]

depth

width

phys_reg

resv

Overall

Pipeline

N=4000bips

0.70 0.80 0.90

10101017 977 996

1505 728 953 814

2399 800 432 180 189

1036 973

1197 794

1298 707

1046 949

4000

N

4

1 2 3 4 5

1 2

5

[ 34, 56) [ 56, 77)

[ 77,124) [124,307]

[1, 4)

[5, 8) [8,19]

[3, 5) [5,13]

[ 2, 5)

[ 6,10) [10,24]

mem_lat

ls_lat

ctl_lat

fix_lat

fpu_lat

Overall

Latency

N=4000bips

0.74 0.78 0.82

799 828 807 805 761

774 794 773 837 822

825 776 823 780 796

4000

N

11 12 13 14 15

7 8 9

10 11

6 7 8 9

10

l2cache_size

icache_size

dcache_size

Overall

Memory Hierarchy

N=4000

Benjamin C. Lee, David M. Brooks 44 :: ASPLOS-XII

Page 45: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Performance Associations II

bips

0.6 0.8

1076

1060

934

930

1085

1055

960

900

4000

N

[0.01,0.05)

[0.05,0.10)

[0.10,0.16)

[0.16,0.29]

[0.005,0.034)

[0.034,0.072)

[0.072,0.093)

[0.093,0.165]

br_rate

br_mis_rate

Overall

Branch

N=4000bips

0.6 0.8

10751104 910 911

1114 933

1053 900

1135 876

1129 860

4000

N

[ 6225, 108733) [108733, 242619) [242619, 597291) [597291,2821539]

[ 193084,2.60e+06) [ 2599382,7.22e+06) [ 7222608,4.69e+07) [46879987,1.04e+08]

[ 949, 6037) [ 6037, 18667)

[ 18667, 254857) [254857,14883942]

stall_inflight

stall_dmissq

stall_cast

Overall

Stalls

N=4000bips

0.5 0.7 0.9

1093 933

1045 929

1096 907

1077 920

1127 922

1070 881

10991075 941 885

4000

N

[ 0, 9959) [ 9959, 97381)

[ 97381, 299112) [299112,1373099]

[ 0, 350) [ 350, 24448)

[ 24448,144305) [144305,923543]

[ 1814764,5.98e+06) [ 5984258,1.01e+07)

[10134921,2.42e+07) [24224471,1.13e+08]

[ 16188, 2013914) [ 2013914, 4131316)

[ 4131316,22315283) [22315283,45296951]

stall_storeq

stall_reorderq

stall_resv

stall_rename

Overall

Stalls

N=4000

Benjamin C. Lee, David M. Brooks 45 :: ASPLOS-XII

Page 46: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Performance Associations III

bips

0.5 0.8

1110 9191071 900

1103 9111103 883

1856 349 928 867

4000

N

0.02

[5.08e−06,3.91e−05) [3.91e−05,7.15e−04) [7.15e−04,3.85e−03) [3.85e−03,3.71e−02]

[0.02,0.09) [0.09,0.12) [0.12,0.18) [0.18,0.55]

[0.00,0.02)

[0.04,0.11) [0.11,0.24]

il1miss_rate

dl1miss_rate

dl2miss_rate

Overall

Cache

N=4000bips

0.5 0.7 0.9

1056

1122

913

909

4000

N

[0.309,0.92)

[0.920,1.21)

[1.208,1.35)

[1.347,1.58]

base_bips

Overall

Baseline Performance

N=4000

Benjamin C. Lee, David M. Brooks 46 :: ASPLOS-XII

Page 47: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Significance Tests

Microarchitectural PredictorsMajority of F-tests imply significance (p-values < 2.2E − 16)Several predictors were less significant

Control latency (p-value = 0.1247)Reservation station size (p-value = 0.1239)L1 instruction cache size (p-value = 0.02941)

Application-Specific PredictorsMajority of F-tests imply significance (p-values < 2.2E − 16)Pipeline stalls classified by structure are less significant

Completion and reorder queue stalls (p-values > 0.4)

Benjamin C. Lee, David M. Brooks 47 :: ASPLOS-XII

Page 48: Accurate & Efficient Regression Modeling for ...people.duke.edu/~bcl15/documents/lee2006-asplos-slides.pdf · Benjamin C. Lee, David M. Brooks 18 :: ASPLOS-XII. Motivation & Background

AppendixAppendixReferencesExtra Slides

Related Work

Statistical Significance RankingYi :: Plackett-Burman, effect rankingsJoseph :: Stepwise regression, coefficient rankingsBound parameter values to improve tractabilityRequire simulation for estimation

Synthetic WorkloadsEeckhout :: Profile workloads to obtain synthetic tracesNussbaum :: Superscalar and SMP simulationObtain distribution of instructions and data dependenciesRequire simulation with smaller traces for estimation

Benjamin C. Lee, David M. Brooks 48 :: ASPLOS-XII