Post on 20-Dec-2015
1
Should We Dump Flop/s?
David H Bailey
Lawrence Berkeley National Laboratory, USA
This talk is available at: http://crd.lbl.gov/~dhbailey/dhbtalks/flops.pdf
2
Using Flop/s As A Metric for Performance
Advantages:
Its usage is traditional and well understood in the HPC community -- data is available for several decades of progress.
The flop count for a given algorithm or application is fairly well defined, although care has to be taken to avoid abuse -- i.e., the flop count should be based on the best practical serial algorithm.
Disadvantages:
A focus on flop/s at the expense of other system parameters can lead to system designs that are poorly balanced for real workloads.
Using a measured flop count (e.g., from a hardware performance monitor) may lead to perverse outcomes, such as inefficient algorithms that exhibit artificially high flop/s rates.
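The difference between a nominal flop count (taken from the best practical serial algorithm) and a measured one can be illustrated with a small sketch. The example uses polynomial evaluation. The function names, the per-term operation counts, and the wall-clock times are all illustrative assumptions, not figures from the talk.

```python
def horner_flops(degree):
    # Horner's rule: one multiply and one add per step, `degree` steps.
    return 2 * degree


def naive_flops(degree):
    # Naive evaluation rebuilds x**k from scratch for every term k >= 1:
    # (k - 1) multiplies for the power, 1 multiply by the coefficient, 1 add.
    return sum((k - 1) + 1 + 1 for k in range(1, degree + 1))


def rate(flop_count, seconds):
    # Flop/s for a given operation count and elapsed time.
    return flop_count / seconds


degree = 100

# Hypothetical timings (assumed for illustration): the naive code is 5x slower.
horner_time = 1.0e-6
naive_time = 5.0e-6

# "Measured" rates use the flops each code actually executed.
measured_horner_rate = rate(horner_flops(degree), horner_time)
measured_naive_rate = rate(naive_flops(degree), naive_time)

# The nominal rate charges both codes the flop count of the best serial algorithm.
nominal_naive_rate = rate(horner_flops(degree), naive_time)

print("measured flop/s, Horner:", measured_horner_rate)
print("measured flop/s, naive: ", measured_naive_rate)
print("nominal flop/s, naive:  ", nominal_naive_rate)
```

Under these assumed timings the naive code reports the higher measured flop/s even though it takes five times longer to produce the same answer. Once both codes are charged the flop count of the best practical serial algorithm, the ranking matches time to solution.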
3
4
Using Mop/s as a Performance Metric
Advantages:
A focus on memory operations per second in comparing systems may result in systems better suited for many real-world scientific computations.
Disadvantages:
There is NO objective system-independent way to assess the mop count for a given algorithm or architecture.
A focus on mop/s at the expense of other system parameters can lead to system designs that are poorly balanced for real workloads.
Using measured memory operation counts (e.g., from a hardware performance monitor) may lead to perverse outcomes, such as grossly cache-inefficient algorithms that exhibit artificially high mop/s rates.
5
How Do We Define Mop Count for a Given Application?
The mop count is inextricably tied to the architecture.
Mop count can vary by a factor of 100 depending on how much cache is available.
Unit stride, constant-stride and random stride data are handled very differently from system to system.
Naive schemes to count mops for a given algorithm or implementation (e.g., number of flops performed x 3) reduce to using an inflated flop count as the metric.
One possibility: Using Erich Strohmaier’s APEX-map as the basis for the mop count -- it measures the distribution of the distance of one memory operation to the next.
But using APEX-map to perform these measurements is very expensive, and the resulting figure is highly one-dimensional.
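How strongly the mop count depends on cache size and stride can be seen in a toy model. The sketch below counts line fetches from main memory for the same loop nest under different cache sizes and access orders. It assumes an idealized fully associative LRU cache with 8-word lines; the function names and all parameters are hypothetical, and this is not APEX-map or any real hardware counter.

```python
from collections import OrderedDict


def memory_ops_to_dram(addresses, cache_lines, line_words=8):
    """Count line fetches from main memory for a fully associative LRU cache."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0
    for addr in addresses:
        line = addr // line_words
        if line in cache:
            cache.move_to_end(line)  # hit: mark the line as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def sweep(n_words, passes, stride=1):
    """Word-address trace: `passes` full traversals of an array, strided."""
    for _ in range(passes):
        for start in range(stride):
            for i in range(start, n_words, stride):
                yield i


# Every trace below touches the same 4096 words ten times (40960 word accesses).
for label, stride, cache_lines in [
    ("unit stride, array fits in cache", 1, 1024),
    ("unit stride, small cache", 1, 64),
    ("stride 8, small cache", 8, 64),
]:
    count = memory_ops_to_dram(sweep(4096, 10, stride), cache_lines)
    print(f"{label}: {count} line fetches")
```

In this model the program performs the same work in all three cases, yet the number of memory operations reaching main memory is 512, 5120 and 40960 respectively. Which of these figures should count as "the" mop count is exactly the ambiguity described above.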
6
Bottom Line: Don’t Dump Flop/s
There is NO intrinsic memory operation count for a given algorithm or architecture.
Mop/s, if anything, has significantly more potential for abuse than flop/s.
Perhaps in the future someone can devise an architecture-independent metric to assess the “work done” in a large scientific application.
Until then, flop/s is the best we have.