Should We Dump Flop/s? David H Bailey, Lawrence Berkeley National Laboratory, USA. This talk is available at: http://crd.lbl.gov/~dhbailey/dhbtalks/flops.pdf

Post on 20-Dec-2015


Should We Dump Flop/s?

David H Bailey
Lawrence Berkeley National Laboratory, USA

This talk is available at: http://crd.lbl.gov/~dhbailey/dhbtalks/flops.pdf


Using Flop/s As A Metric for Performance

Advantages:
- Its usage is traditional and well-understood in the HPC community -- data is available for several decades of progress.
- The flop count for a given algorithm or application is fairly well defined, although care has to be taken to avoid abuse -- i.e., we should base the flop count on the best practical serial algorithm.

Disadvantages:
- A focus on flop/s at the expense of other system parameters can lead to system designs that are poorly balanced for real workloads.
- Using measured flop count (i.e., by a hardware performance monitor) may lead to perverse outcomes, such as inefficient algorithms that exhibit artificially high flop/s rates.
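As a rough illustration of the "perverse outcome" above, the sketch below compares two ways of computing the same length-n discrete Fourier transform. The operation counts are the conventional estimates (about 5 n log2 n for an FFT, about 8 n^2 for a direct DFT). The run times are hypothetical values chosen only for illustration, not measurements, and the helper name `gflops` is made up for this example.

```python
import math

def gflops(flop_count, seconds):
    """Return a rate in Gflop/s for a given flop count and run time."""
    return flop_count / seconds / 1e9

n = 2 ** 20

# Conventional operation-count estimates for a length-n complex transform.
fft_flops = 5 * n * math.log2(n)  # best practical serial algorithm
dft_flops = 8 * n * n             # direct O(n^2) evaluation

# Hypothetical run times in seconds (assumed, not measured).
fft_seconds = 0.05
dft_seconds = 900.0

# Rates a hardware counter would report: flops actually executed / time.
measured_fft = gflops(fft_flops, fft_seconds)
measured_dft = gflops(dft_flops, dft_seconds)

# Rate credited when the flop count is based on the best practical algorithm.
credited_dft = gflops(fft_flops, dft_seconds)

print(f"FFT, measured: {measured_fft:.3f} Gflop/s")
print(f"DFT, measured: {measured_dft:.3f} Gflop/s")
print(f"DFT, credited: {credited_dft:.6f} Gflop/s")
```

Under these assumed timings the direct DFT reports the higher measured rate even though it takes far longer to produce the same answer. Crediting it only with the FFT flop count removes the distortion.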


Using Mop/s as a Performance Metric

Advantages:
- A focus on memory operations per second in comparing systems may result in systems better suited for many real-world scientific computations.

Disadvantages:
- There is NO objective system-independent way to assess the mop count for a given algorithm or architecture.
- A focus on mop/s at the expense of other system parameters can lead to system designs that are poorly balanced for real workloads.
- Using measured memory operation counts (i.e., by a hardware performance monitor) may lead to perverse outcomes, such as grossly cache-inefficient algorithms that exhibit artificially high mop/s rates.


How Do We Define Mop Count for a Given Application?

- The mop count is inextricably tied to the architecture.
- Mop count can vary by a factor of 100 depending on how much cache is available.
- Unit-stride, constant-stride and random-stride data are handled very differently from system to system.
- Naive schemes to count mops for a given algorithm or implementation (e.g., number of flops performed x 3) reduce to using an inflated flop count as the metric.
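The dependence on cache size and stride can be sketched with a toy model. The code below counts main-memory line transfers for an address trace under a fully associative LRU cache. The cache model, the line size of 8 words, and all function names are assumptions made for illustration; they do not describe any particular machine or measurement tool.

```python
from collections import OrderedDict

def count_mops(addresses, cache_lines, line_words=8):
    """Count main-memory line transfers for a trace of word addresses,
    using a fully associative LRU cache (a toy model)."""
    cache = OrderedDict()
    transfers = 0
    for addr in addresses:
        line = addr // line_words
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            transfers += 1                 # miss: fetch the line from memory
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return transfers

def row_walk(n):
    """Unit-stride trace over a row-major n x n array."""
    return list(range(n * n))

def column_walk(n):
    """Stride-n trace: the same array read column by column."""
    return [i * n + j for j in range(n) for i in range(n)]

n = 64
unit_small = count_mops(row_walk(n), cache_lines=16)
column_small = count_mops(column_walk(n), cache_lines=16)
column_large = count_mops(column_walk(n), cache_lines=1024)

print(unit_small, column_small, column_large)
```

In this model the column walk costs 4096 transfers with the 16-line cache but only 512 once the whole array fits in cache, which is the same cost as the unit-stride walk. The algorithm and its flop count are identical in all three cases; only the modeled memory system and the access order changed.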

One possibility: Using Erich Strohmaier’s APEX-map as the basis for the mop count -- it measures the distribution of the distance of one memory operation to the next.

But using APEX-map to perform these measurements is very expensive, and the resulting figure is highly one-dimensional.
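To make "the distribution of the distance of one memory operation to the next" concrete, the sketch below builds a histogram of distances between consecutive addresses in a trace. This is not APEX-map itself, only a minimal illustration of the kind of quantity involved; the function name and the three synthetic traces are hypothetical.

```python
from collections import Counter
import random

def distance_histogram(addresses):
    """Histogram of the absolute distance between consecutive accesses."""
    return Counter(abs(b - a) for a, b in zip(addresses, addresses[1:]))

# Three synthetic traces of 1000 word addresses each (assumed examples).
unit_stride = list(range(1000))
constant_stride = list(range(0, 8000, 8))
rng = random.Random(0)  # fixed seed so the trace is reproducible
random_access = [rng.randrange(8000) for _ in range(1000)]

unit_hist = distance_histogram(unit_stride)
constant_hist = distance_histogram(constant_stride)
random_hist = distance_histogram(random_access)

print(unit_hist)         # every distance is 1
print(constant_hist)     # every distance is 8
print(len(random_hist))  # many distinct distances
```

The unit-stride and constant-stride traces each collapse to a single spike, while the random trace spreads over many distances. Reducing such a distribution to one mop count necessarily discards most of this information.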


Bottom Line: Don’t Dump Flop/s

- There is NO intrinsic memory operation count for a given algorithm or architecture.
- Mop/s, if anything, has significantly more potential for abuse than flop/s.
- Perhaps in the future someone can devise an architecture-independent metric to assess the “work done” in a large scientific application.
- Until then, flop/s is the best we have.