workload characteristics and representative workloads

24
Workload Characteristics and Representative Workloads David Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA [email protected]

Upload: tierra

Post on 22-Jan-2016

42 views

Category:

Documents


0 download

DESCRIPTION

Workload Characteristics and Representative Workloads. David Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA [email protected]. Overview. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Workload Characteristics and Representative Workloads

Workload Characteristicsand Representative

WorkloadsDavid Kaeli

Department of Electrical and Computer Engineering

Northeastern UniversityBoston, MA

[email protected]

Page 2: Workload Characteristics and Representative Workloads

Overview

• When we want to collect profiles to be used in the design of a next-generation computing system, we need to be very careful that we capture a representative sample in our profile

• Workload characteristics allow us to better understand the content of the samples which we collect

• We need to select programs to study which represent the class of applications we will eventually run on the target system

Page 3: Workload Characteristics and Representative Workloads

Examples of Workload Characteristics• Instruction mix

• Static vs. dynamic instruction count• Working sets• Control flow behavior• Working set behavior• Inter-reference gap model (temporal

locality)• Database size• Address and value predictability• Application/library/OS breakdown• Heap vs. stack allocation

Page 4: Workload Characteristics and Representative Workloads

Benchmarks• Real or synthetic programs/applications used

to exercise a hardware system, and generally representing a range of behaviors found in a particular class of applications

• Benchmarks classes include:– Toy benchmarks: tower of hanoi, qsort, fibo– Synthetic benchmarks: dhrystone, whetstone– Embedded: EEMBC, UTDSP – CPU benchmarks: SPECint/fp– Internet benchmark: SPECjAppServ, SPECjvm– Commerical benchmark: TPC, SAP, SPECjAppServ– Supercomputing: Perfect Club, Splash, Livermore

Loops

Page 5: Workload Characteristics and Representative Workloads

SPEC Benchmarks Presentation

Page 6: Workload Characteristics and Representative Workloads

Is there a way to reduce the runtime of SPEC while maintaining representativeness?• MinneSPEC (U. of Minn) – Using gprof

statistics about the runtime of SPEC and various Simplescalar simulation results (I-mix, cache misses, etc), we can capture statistically similar, though significantly shorter, runs of the programs

• Provides three input sets will run in:– A few minutes– A few hours– A few days

• IEEE Computer Arch Letters paper

Page 7: Workload Characteristics and Representative Workloads

How many programs do we need?

• Does the set of application capture enough variation to be “representative” of the entire class of workload?

• Should we consider using multiple benchmark suites to factor out similarities in programming styles?

• Can we utilize workload characteristics to identify the particular programs of interest?

Page 8: Workload Characteristics and Representative Workloads

Example of capturing program behavior: “Quantifying Behavioral Differences Between C and C++ Programs” Calder, Grunwald and Zorn, 1995

• C++ is a programming language growing in popularity

• We need to design tomorrow’s computer architectures based on tomorrow’s software paradigms

• How do workload characteristics changes as we move to new programming paradigms?

Page 9: Workload Characteristics and Representative Workloads

Example of capturing program behavior: “Quantifying Behavioral Differences Between C and C++ Programs” Calder, Grunwald and Zorn, 1995

• First problem – Find a set of representative programs from both FO and OO domains– Difficult for OO in 1995

• Differences between programming models– OO relies heavily on messages and methods– Data locality will change due to the bundling together

of data structures in Objects– Size of functions will be reduced in OO Polymorphism

allows for indirect function invocation and runtime method selection

– OO programs will manage a larger number of dynamically allocated objects

Page 10: Workload Characteristics and Representative Workloads

Address and Value Profiling

• Lipasti observed that profiled instructions tend to repeat their behavior

• Many addresses are nearly constant• Many values do not change between

instruction execution• Can we use profiles to better

understand some of these behaviors, and the until this knowledge to optimize execution?

Page 11: Workload Characteristics and Representative Workloads

Address Profiling

• If an address remains unchanged, can we issue loads and store early (similar to prefetching)?

• Do we even have to issue the load or store if we have not modified memory?

• What are the implications if indirect addressing is used?

• Can we detect patterns (i.e., strides) in the address values?

• Can we do anything smart when we detect pointer chasing??

Page 12: Workload Characteristics and Representative Workloads

Data Value Profiling

• When we see that particular data values do not change, how can we take advantage of this?

• Lipasti noticed that a large percentage of store instructions overwrite memory with the value already stored there

• Can we avoid computing new results if we notice that our input operand have not changed?

• What can we do if we a particular operand only takes on a small set of values?

Page 13: Workload Characteristics and Representative Workloads

Parameter Value Profiling

• Profile the parameter values passed to functions

• If these parameters are predictable, we can exploit this fact during compilation

• We can study this on an individual function basis or a call site basis

• Compiler optimizations such as code specialization and function cloning can be used

Page 14: Workload Characteristics and Representative Workloads

Parameter Value Profiling

• We have profiled a set of MS Windows NT 4.0 desktop applications– Word97– Foxpro 6.0– SQLserver 7.0– VC++ 6.0 – Excel97– Powerpoint97– Access97

• We measured the value predictability of parameter values for all non-pointer based parameters

Page 15: Workload Characteristics and Representative Workloads

Parameter Value Profiling

• We look for the predictability of parameters using:– Invariance 1 – probability that the most frequent value is

passed– Invariance 4 – probability that one of the 4 most frequent

values is passed

• Parameter values are more predictable on a call site basis than on a function basis (e.g., for Word97, 8% of the functions pass highly predictable parms, where as when computed on individual call sites, over 16% of the call sites pass highly predictable parms)

• Highly predictable means that on over 90% of the calls the same value is observed

• We will discuss how to clone and specialize procedures when we discuss profile guided data transformations

Page 16: Workload Characteristics and Representative Workloads

How can we reduce the runtime of a single benchmark and still maintain accuracy?• Simpoint – attempt to collect a set of

trace samples that best represents the whole execution of the program– Identifies phase behavior in programs– Considers a metric that captures the

differences between two samples– Computes the difference between these two

intervals– Selects the interval that is closest to all

other intervals

Page 17: Workload Characteristics and Representative Workloads

Simpoint (Calder ASPLOS 2002)• Utilize basic block frequencies to build

basic block vectors (bbf0, bbf1….bbfn-1)• Each frequency is weighted by its length • Entire vector normalized by dividing by

total number of basic blocks executed• Take fixed-length samples (100M instructions)• Compare BBVs using:

– Euclidean Distance: • ED(a, b) = sqrt(sum(i->1,n) (ai-bi)2)

– Manhattan Distance:• MD(a, b) = sum(i->1,n)(|ai-bi|)

Page 18: Workload Characteristics and Representative Workloads

Simpoint (Calder ASPLOS 2002)

• Manhattan Distance is used to build a similarity matrix

• N x N matrix, where N is the number of sampling intervals in the program

• Element SM(x, y) is the Manhattan Distance between two 100M element BBV at sample offsets x and y

• Plot the Similarity Matrix as an upper triangle

Page 19: Workload Characteristics and Representative Workloads

Simpoint (Calder ASPLOS 2002)

• Basic Block Vectors can adequately capture the necessary representative characteristics of a program

• Distance metrics can help to identify the most representative samples

• Cluster analysis (k-means) can improve representativeness by selecting multiple samples

Page 20: Workload Characteristics and Representative Workloads

Simpoint (Calder ISPASS 2004)

• Newer work on Simpoint considers using register def-use behavior on an interval basis

• Also, tracking of loop behavior and procedure calls frequencies provides similar accuracy as using basis block vectors

Page 21: Workload Characteristics and Representative Workloads

Simpoint (Calder ASPLOS 2002)• Algorithm overview

– Profile program by dividing into fixed sized intervals (e.g., 1M, 10M, 100M insts)

– Collect frequency vectors (e.g., BBVs, def-use, etc.) – compute normalized frequencies

– Run k-means clustering algorithm to divide the set of intervals into k partitions/sets, for values of k from 1 to K

– Compute a “goodness-of-fit” of the data for each value of k

– Select the clustering the reduces small k and provides a reasonable “goodness-of-fit” result

– The result is a selection of representative simulation points that best “fit” the entire application execution

Page 22: Workload Characteristics and Representative Workloads

Simpoint Paper

Page 23: Workload Characteristics and Representative Workloads

Improvements to Simpoints (KDD05, ISPASS06)

• Utilize a Mixture of Multinomials instead of K-means

• Assumes data is generated by a mixture of K-component density functions

• We utilize Expectation-Maximization (EM) to find a local maximum likelihood for the parameters of the density function – iterate on E and M steps until convergence

• The number of clusters is selected using the Bayesian Information Criteria (BIC) approach to judge “goodness of fit”

“A multinomial clustering model for fast simulation of computer architecture designs”, K. Sanghai et al., Proc. of KDD 2005, Chicago, IL., pp. 808-813.

Page 24: Workload Characteristics and Representative Workloads

Summary• Before you begin studying a new

architecture, have a clear understanding of the target workloads for this system

• Perform a significant amount of workload characterization before you begin profiling work

• Benchmarks are very useful tools, but must be used properly to obtain meaningful results

• Value profiling is a rich area for future research

• Simpoints can be used to reduce the runtime of simulation and still maintain simulation fidelity