A Mathematical Model for Balancing Co-Phase Effects in
Simulated Multithreaded Systems
Joshua L. Kihm, Tipp Moseley, and Dan Connors
University of Colorado at Boulder
Exploiting Phase Behavior for Efficient Architecture Simulation
• Program behavior patterns, or phases, can be exploited for efficient simulation [SimPoint: Sherwood et al., PACT ’01]
– Capture repeating phases to eliminate simulation time or to direct detailed simulation
• Industry trends are toward multithreaded processors
– In a multithreaded system, execution is characterized by a combination of phases between co-resident threads, called a co-phase [Van Biesbrouck et al., ISPASS ’04]
– Phase exploitation is more difficult for simulation and design of multithreaded systems, since the individual phases interact in unique ways
Program Execution
Terminology
[Figure: timeline of 21 periods; each period is labeled with its phase (A, B, or C) and with the interval number of that occurrence of the phase]
PERIOD – A segment of program execution of a given length
(one OS scheduling period in this work)
PHASE – A set of periods with similar behavior
INTERVAL – A set of consecutive periods with the same phase (one occurrence of a phase)
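A minimal sketch of how these three terms relate, assuming each period has already been assigned a phase label. The function name `split_into_intervals` and the sample label sequence are hypothetical and are not taken from the slides.

```python
from itertools import groupby


def split_into_intervals(period_phases):
    """Group consecutive periods that share a phase label into intervals.

    Returns a list of (phase, first_period, n_periods) tuples, where
    first_period is the 1-based index of the interval's first period.
    """
    intervals = []
    first_period = 1
    for phase, run in groupby(period_phases):
        n_periods = len(list(run))
        intervals.append((phase, first_period, n_periods))
        first_period += n_periods
    return intervals


# Hypothetical phase labels for eight periods (illustrative only).
labels = list("AAABBAAC")
for phase, first, length in split_into_intervals(labels):
    print(f"phase {phase}: periods {first}-{first + length - 1}")
```

Phase A appears in two separate intervals here, which is the distinction the terminology draws: a phase is a behavior class, while an interval is one occurrence of it.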
Effects of Co-phases
• 181.mcf and 186.crafty
• Relative progress of threads is determined by individual thread behavior and inter-thread interference
• As Co-Phase changes, so does interference and performance
• Data from Pentium-4 (Northwood) system illustrates co-phase effects and transitions
• [Graph format from Van Biesbrouck]
What if we start here?
Or here?
Or here?
Problem Statement
• Variation in the offset between threads changes which co-phases are encountered and their relative importance
– >15% standard deviation in IPC for some combinations
• Offset is caused by:
– Start times
– OS scheduling
– Simulation error
• The average performance must be determined in order to reflect real system performance where the relative position of threads will be randomized
Example Analysis of Pentium-4 Data
Total ST runtime
HT performs below ST!
Best performance at high offset
Motivation (Methodology)
• Tested on implemented hardware
– Intel Pentium-4 Northwood with Hyperthreading
• Used 5 SPEC CPU 2000 benchmarks
– 188.ammp, 179.art, 186.crafty, 252.eon, 181.mcf
– Long-running benchmarks
• Offsets of –100 s, –90 s, –80 s, … +80 s, +90 s, +100 s (21 tests per pairing)
Performance Variance Due to Offset
• Percent standard deviation
• Variation is high for many metrics
• Self-pairings have high variation
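A minimal sketch of the percent standard deviation metric, assuming it means the standard deviation of a metric across the offset runs, expressed as a percentage of the mean. The slides do not say whether the population or sample form was used; the population form is an assumption here, and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def percent_std_dev(values):
    """Standard deviation as a percentage of the mean.

    Uses the population standard deviation (an assumption).
    """
    return 100.0 * pstdev(values) / mean(values)


# Hypothetical IPC values from runs at different offsets (not measured data).
ipc_by_offset = [0.80, 1.00, 1.20]
print(f"{percent_std_dev(ipc_by_offset):.1f}% standard deviation")
```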
Co-Phase Variance
• Difference in portion of time spent in each co-phase
• Co-phase mix changes with offset
Conceptual Model
• The time spent in each co-phase interval will determine overall performance
• The amount of time in the co-phase interval depends on each thread’s:
– Performance in the co-phase
– Length of the interval
– Number of operations already completed in the current interval
Determining the Time in Co-Phase Interval
• Interval length and co-phase performance are constant, but need to be determined ahead of time
* Assumption of phase-based simulation
• The number of operations already completed is a function of previous performance and the co-phase profile
Determining the Time in a Co-Phase Interval
[Figure: time in interval (y-axis) versus offset (x-axis), with annotations:]
– Interval is not encountered
– Part of the interval occurs (monotonic)
– Interval i runs in its entirety
– Similar case for thread Y
– Overall case is the minimum
– Thread X changes phase first: next interval is (i+1, j)
– Thread Y changes phase first: next interval is (i, j+1)
– Area under the curve is proportional to the average length of the interval
Mathematical Model
[Equation: time in co-phase interval, with two cases and labeled terms:]
– Case: thread X finishes first
– Case: thread Y finishes first
– Performance in co-phase
– Number of operations in interval
– Number of operations yet to complete in interval
– Number of operations already completed in interval
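A minimal sketch of how the labeled quantities could combine, assuming the time in a co-phase interval is the smaller of each thread's remaining operations divided by its performance in that co-phase. This is an illustration of the conceptual model, not the authors' exact equation; the function name, argument names, and example numbers are hypothetical.

```python
def time_in_cophase_interval(ops_x, done_x, perf_x, ops_y, done_y, perf_y):
    """Return (elapsed, first, done_x_after, done_y_after).

    ops_*  : number of operations in the thread's current phase interval
    done_* : number of operations already completed in that interval
    perf_* : operations per unit time in this co-phase (assumed constant)
    """
    # Time each thread would need to finish its own interval.
    time_x = (ops_x - done_x) / perf_x
    time_y = (ops_y - done_y) / perf_y

    # The co-phase ends as soon as either thread changes phase.
    elapsed = min(time_x, time_y)
    if time_x < time_y:
        first = "X"
    elif time_y < time_x:
        first = "Y"
    else:
        first = "both"

    # Progress of both threads at the moment the co-phase ends.
    return elapsed, first, done_x + perf_x * elapsed, done_y + perf_y * elapsed


# Hypothetical numbers: X has 800 operations left at 4 per time unit,
# Y has 900 operations left at 3 per time unit.
print(time_in_cophase_interval(1000, 200, 4.0, 900, 0, 3.0))
```

In this example thread X finishes first, so the next co-phase would be (i+1, j), and thread Y carries its completed operations into that co-phase as its starting progress.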
Start-up Intervals
• Interval lengths depend on previous intervals (the total number of retired operations), all the way back to the start of execution of the thread
– Some model is needed to simulate the difference in the number of operations between threads
• Model based on single-threaded behavior
* Assume that single-threaded phase behavior is indicative of average co-phase behavior
Deriving Start-up Intervals
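[Figure: time in start-up interval (y-axis) versus offset (x-axis), with annotations:]
– Interval is never entered
– Interval length equals offset (slope M = 1)
– Length of phase interval (length of phase i)
– Start-up intervals (i-1, 0), (i, 0), (i+1, 0)
– Interval doesn’t occur; co-phase (i, 1); next start-up interval (i+1, 0)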
Mathematical Model (Start-up Interval)
[Equation: time in start-up interval, with labeled terms and cases:]
– Length of intervals up to “i”
– Length of intervals up to and including “i”
– Interval not encountered
– Partial completion
– Full completion
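A minimal sketch of the three start-up cases, assuming the leading thread runs alone for a time equal to the offset and that its single-threaded interval lengths are known in time units. The function name and the example lengths are hypothetical, and interval indices are 0-based here.

```python
def startup_interval_time(offset, interval_lengths, i):
    """Time the leading thread spends alone in interval i before the
    second thread starts, for a non-negative offset.

    interval_lengths : single-threaded length of each interval, in time units
    i                : 0-based index of the interval of interest
    """
    up_to = sum(interval_lengths[:i])           # length of intervals up to i
    up_to_incl = up_to + interval_lengths[i]    # ... up to and including i

    if offset <= up_to:
        return 0.0                              # interval not encountered
    if offset >= up_to_incl:
        return float(interval_lengths[i])       # full completion
    return float(offset - up_to)                # partial completion, slope 1


# Hypothetical interval lengths (seconds) and a 45-second offset.
lengths = [30, 50, 20]
print([startup_interval_time(45, lengths, i) for i in range(len(lengths))])
```

Summed over all intervals, the start-up times equal the offset whenever the offset does not exceed the thread's total single-threaded run time.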
Example Analysis of Pentium-4 Data
Run time of crafty
Run time of art
Total ST runtime
Art Phase 2 causes heavy interference
HT performs below ST!
Best performance at high offset
Extensions to More Threads
• One thread is the “reference”
– Arbitrarily chosen
– Number of variables grows linearly
• Concepts and equations easily extend to more threads