can we continue to build supercomputers out of ...sandia is a multiprogram laboratory operated by...

71
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energyʼs National Nuclear Security Administration under contract DE-AC04-94AL85000. Richard C. Murphy Sandia National Laboratories DOE Institute for Advanced Architecture [email protected] March 11, 2009 Can We Continue to Build Supercomputers Out of Processors Optimized for Laptops?

Upload: others

Post on 17-Mar-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,for the United States Department of Energyʼs National Nuclear Security Administration

under contract DE-AC04-94AL85000.

Richard C. MurphySandia National Laboratories

DOE Institute for Advanced [email protected]

March 11, 2009

Can We Continue to Build Supercomputers Out of Processors Optimized for Laptops?

Page 2: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

The Memory Wall• Historically:

– Processor cycle time decreased FASTER than memory access access latency

• Technologically, this may not hold as cycle times go flat

• Definitely holds as memory hierarchies become more complex– Multicore exacerbates this

problem• Enhancements to “compute” not as effective as decreasing memory latency for problems that don’t fit in cache!

0

0.5

1

1.5

2

2.5

Low Latency

Memory Enhancements

Half LatencyFU

DFS IPC

Processor Enhancements

BranchBase

IPC

Without PrefetchingWith Prefetching

Page 3: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Motivating Example

It is singular how soon we lose the impression of what ceases to be constantly before us.

- Lord Byron

Page 4: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Consider THE Textbook 5-Stage Pipeline

EX MEMIF REG WB

• The Stages Are:– IF: Instruction Fetch– REG: Register File Read– EX: Execute– MEM: Memory Access– WB: Write Back (to register

file)• Real pipelines are more complex– We will assume each stage

takes one clock cycle

Page 5: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Add An L1 Cache

EX MEMIF REG WB

L1Inst.

L1Data

Page 6: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

An L2 Cache

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Page 7: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

And Main Memory

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main MemoryNow we have a computer (circa 1995)

This is looks a lot like the basic building block for today’s machines, but they have

- Many of these- Deeper pipelines- More complex memory hierarchies

Page 8: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

But latency’s complicated...

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory O(100ns) - 200 clocks at 2GHz

O(10ns) - 20 clocks

O(2ns) - 4 clocks

O(<<1ns) - 1 clock

Page 9: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Example Code - Sparse Matrix Vector Multipllyvoid matvec(int nnz, int n, double A_values[], int A_indices[], int A_offsets[], double x[], double y[]) {// Computes y = A*x: A is a sparse square matrix of dimension n w/ nnz nonzeros,// x is a vector of known values and y (on exit) contains the result of A*x.// nnz - Number of nonzero entries in sparse matrix A.// n – Dimension of A, x and y.// A_values – Nonzero matrix values stored contiguously row by row.// A_indices – Column indices with matrix entries stored in A_values.// A_offsets – Offsets of each row into A_values and A_indices.  // x – Input vector// y – Output vector.

 int jstart = 0; int jstop = A_offset[0];  for (int i=0; i<n; ++i) {    jstart = jstop;    jstop = A_offset[i+1];    double sum = 0.0;    for (int j=jstart; j<jstop; ++j)         sum+= A_values[j]*x[A_indices[j]];    y[i] = sum;    }

 return;}

Thanks to Mike Heroux, SNL for making this a real example

Page 10: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

void matvec(int nnz, int n, double A_values[], int A_indices[], int A_offsets[], double x[], double y[]) {// Computes y = A*x: A is a sparse square matrix of dimension n w/ nnz nonzeros,// x is a vector of known values and y (on exit) contains the result of A*x.// nnz - Number of nonzero entries in sparse matrix A.// n – Dimension of A, x and y.// A_values – Nonzero matrix values stored contiguously row by row.// A_indices – Column indices with matrix entries stored in A_values.// A_offsets – Offsets of each row into A_values and A_indices.  // x – Input vector// y – Output vector.

 int jstart = 0; int jstop = A_offset[0];  for (int i=0; i<n; ++i) {    jstart = jstop;    jstop = A_offset[i+1];    double sum = 0.0;    for (int j=jstart; j<jstop; ++j)         sum+= A_values[j]*x[A_indices[j]];    y[i] = sum;    }

 return;}

Thanks to Mike Heroux, SNL for making this a real example

Consider just the inner loop

Example Code - Sparse Matrix Vector Multiplly

Page 11: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

• Assume 32-bit address space• Registers

– Base Address of A_values in $A_values

– Base Address of A_indices in $A_indices

– Base Address of x in $x• Basic Values

– sum in $sum– x in $x

• Temporary values labeled $t0...$tn• Temporary Addresses labeled $a0...$an

Pseudo-Assemblysum += A_values[j]*x[A_indices[j]];

; compute j*4 for pointer mathslli $j_tmp, $j, 4 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 4 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumadd $sum, $sum, $t3

Page 12: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

• Real Floating Point Apps don’t do many FLOPS– 2 FLOPs (mulf, addf)– 5 integer ops to compute addresses

(slli, add, add, slli, add)– 3 memory instructions (loads)

• More if you have to write back sum before the end of the loop

• Real apps at Sandia– Usually < 10% FP– 40-50% mem – 30-40% integer– 10% branch

Observations; compute j*4 for pointer mathslli $j_tmp, $j, 4 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 4 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Sandia FP SPEC FP Sandia Int SPEC INT0

10

20

30

40

50

60

70

80

90

100Mean Instruction Mix

Perc

ent

Integer ALU FP Branch Load Store

Page 13: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Observations (continued)• Arun Rodrigues points out that we’ve talked a lot about floating point “resilience”, and the potential ability to live with less accuracy, but your state bits are equally likely to be in error:– Inaccurate integer address calculations would be bad (40% of

instructions)– Inaccurate branching would be worse (10% of instructions)

• if(my_fp_calculation_was_in_error)...– I don’t even know what an inaccurate load or store would be (40%

of instructions), but it would be bad– Inaccurate instructions could really cause havoc (100% of

instructions)

• As a computer architect, the idea of giving the wrong answer to improve performance makes me feel dirty

Page 14: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

• If the stride one accesses are large enough, they likely miss both caches to memory– A_values[j]– A_indices[j]

• X[A_indices[j]] introduces a data-dependent memory reference– A_indices[j] is required before the

X value can be loaded– Exhibits some temporal and

spatial locality

Observations (continued); compute j*4 for pointer mathslli $j_tmp, $j, 4 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 4 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Page 15: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 1

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Page 16: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 2

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Page 17: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 3

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Page 18: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 4

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Page 19: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 5

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Page 20: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 6

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load $t0, 0($a0) will MISS to memory!

Page 21: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 7

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (1)

Page 22: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 8

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (2)

load $t1, 0($a1) will MISS to memory!

Page 23: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 9

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

slli depends on A_indices[j]

load (3)load (1) /slli pending

Page 24: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 9

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

add depends on A_indices[j]

load (4)load (2) /slli / add pending

Page 25: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 10

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load depends on A_indices[j]

load (3) /slli / add / load pending

load (1)

Page 26: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 11

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

multf depends on both loads!

load (4) /slli / add / load pending

Next Iteration(with perfect branch prediction)

load (2)

Page 27: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 12

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

addf depends on the multf

load (3)

load (1) /slli / add / load / multf / addf pending

Next Iteration(with perfect branch prediction)

Page 28: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 13

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

addf depends on the multf

load (4)

load (2) /slli / add / load / multf / addf pending

Next Iteration(with perfect branch prediction)

Page 29: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 29

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (1)

load (18) /slli / add / load / multf / addf pending

Finally, the first load misses to memory

Page 30: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 30

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (2)

load (19) /slli / add / load / multf / addf pending

Finally, the first load misses to memory

Page 31: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 31

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (3)load (1) /slli / add / load / multf / addf pending

Finally, the first load misses to memory

Page 32: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 229

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (197) /slli / add / load / multf / addf pending

The first load returns to finish executing

Page 33: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 230

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (198) /slli / add / load / multf / addf pending

The first load returns to finish executing

Page 34: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 231

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (199) /slli / add / load / multf / addf pending

The first load returns to finish executing

Page 35: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 232

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

addloadmultfaddf pending

The second load completes

Page 36: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 233

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

loadmultfaddf pending

Finally, the first load is retired!

Page 37: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 234

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

multfaddf pending

The second load retires...

Page 38: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 235

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

multfaddf pending

Now we issue the data dependent load

Page 39: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 236

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (1)multfaddf pending

And wait...

Page 40: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 239

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (1)multfaddf pending

And wait... and wait...

Page 41: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 258

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

load (1)multfaddf pending

And wait... and wait... and wait...

Page 42: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 457

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

multfaddf pending

Page 43: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 458

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

addf pending

The multif is ~4 cycles

Page 44: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 461

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

The addf is ~2 cycles

Page 45: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 462

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

The addf is ~2 cycles

Page 46: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 463

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Page 47: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 464

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

Page 48: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 465

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Main Memory; compute j*4 for pointer mathslli $j_tmp, $j, 2 ; j*4

; $t0 <= A_values[j]add $a0, $A_values, $j_tmpload $t0, 0($a0)

; $t1 <= A_indices[j]add $a1, $A_indices, $j_tmpload $t1, 0($a1)slli $t1, $t1, 2 ; $t1*4

; t2 <= x[$t1]add $a2, $x, $t1load $t2, 0($a2)

; t3 <= $t0 * $t2mulf $t3, $t0, $t2

; add to sumaddf $sum, $sum, $t3

DONE!

Page 49: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Observations• Completing 10 instructions took 465 cycles!

– Nearly 1/4 microseconds of real time at 2 GHz• The FP multiplication only took 2ns• Yes, I can un-role the loop and pipeline multiple iterations

– However, only so many loads can be outstanding at a time• I haven’t talked about multicore/cache coherency yet

Page 50: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

What about a cache coherent model?

Main Memory

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Page 51: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

What about a cache coherent model?

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

load $t2, 0($a2)

Core 1’s MemoryCore 0’s Memory

0($a2)’s home node 0($a2) being modified here

25 ns

Page 52: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Cycle 5

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

load $t2, 0($a2)

Page 53: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

load $t2, 0($a2)

Cycle 6

Page 54: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

0($a2) is owned by Core 0 but ... Core 1 is modifying it

load $t2, 0($a2)

Cycle 25

Page 55: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

0($a2) is owned by Core 0 but ... Core 1 is modifying it

load $t2, 0($a2)

Cycle 25 (continued)

Send Invalidate Request

Page 56: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

0($a2) is owned by Core 0 but ... Core 1 is modifying it

load $t2, 0($a2)

Cycle 75

20 clock cycle L2 access

Page 57: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

0($a2) is owned by Core 0 but ... Core 1 is modifying it

load $t2, 0($a2)

Cycle 95

Flush the modified line back to the owner

Page 58: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

0($a2) is owned by Core 0 but ... Core 1 is modifying it

load $t2, 0($a2)

Cycle 145

Flush the modified line back to the owner

Page 59: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

load $t2, 0($a2)

Cycle 146

Page 60: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

EX MEMIF REG WB

L1Inst.

Unified L2

L1Data

Core 1’s MemoryCore 0’s Memory

25 ns

load $t2, 0($a2)

Cycle 147

Page 61: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Observations• The “memory” latency in this example is 140 clock cycles

– “Closer” than DRAM– Optimistic because it was serviced by L2 cache

• Latency is “additive”– Each level of hierarchy serves “hits” faster but misses are slower– If I do a “read” owned by another node

• 10ns L2 cache miss• 25ns request over the on-chip network• 10ns L2 cache miss by the owner• 100ns memory read by the owner• 25ns response over the on-chip network• TOTAL: 170ns!

• On-chip network latency will increase as the number of cores increase (not linearly, but...)

Page 62: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

What do I do about it?

• Concurrency:– More cores may make the memory bus busier– Vendors are putting fewer channels/memory controllers per core in

each generation– Decreasing on a per-core basis!

• Latency– Relatively increasing or flat

• How do you fill those cycles?– Traditionally: Out-of-Order Execution– Today: More Threads

ConcurrencyLatency = ThroughputLittle’s Law:

Page 63: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

VLSI Designer’s View of the World

Page 64: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

VLSI Designer’s View of the World

Cores

Page 65: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

VLSI Designer’s View of the World

Cores

Cache

Page 66: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

VLSI Designer’s View of the World

Cores

Cache

Core-to-Core Communication

Page 67: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

VLSI Designer’s View of the World

Cores

Cache

Core-to-Core Communication

Off-chip I/O

Page 68: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Application Impact

.25

.5

1.0

2.0

4.0

.25

.5

1.0

2.0

4.0

0

0.5

1

1.5

Relative Bandwidth

Average Sandia FP Latency and Bandwidth vs. Performance

Relative Latency

IPC

Physics Applications

.25

.5

1.0

2.0

4.0

.25

.5

1.0

2.0

4.0

0

0.5

1

1.5

Relative Bandwidth

Average Sandia Int Latency and Bandwidth vs. Performance

Relative Latency

IPC

Informatics Applications

Informatics Apps ~3X more sensitive to latency than Physics

Page 69: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Application Observations• The commodity path has produced power-inefficient architectures– We observe IPCs of ~.3 at 2.5 GHz– With an IPC of 1, the same system could be clocked at 750 MHz

• Doing so would decrease power requirements superlinearly

• Address generation is a bigger problem than FLOPS• We can decrease latency or increase effective bandwidth

– Many apps throw away 7/8ths of a cache line– Increase effective bandwidth by: Scatter/Gather to improve spatial

locality– Decrease latency by: tighter integration, advanced packaging, etc.

Page 70: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

What I didn’t talk about• Memory capacity/core

– Currently declining, probably not a good thing– Fewer pins/core devoted to memory– Slow, bus interface (point-to-point serial solves the problem)– Inefficient protocols (e.g., FBDIMM)– No virtualization in the memory system– Horrible materials (F4) for boards

Page 71: Can We Continue to Build Supercomputers Out of ...Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of

Conclusions

What we talk about in a procurement What we should talk about

FLOPS Anything but FLOPS:• Memory Latency Dominates• Integer Operations More Frequent• Generally, “data movement”

Memory Bandwidth Effective BandwidthLatencyConcurrency in the memory system

Power Data Movement Efficiency

“Peaks” Sustained