
Page 1

Explicit HW and SW Hierarchies: High-Level Abstractions for giving the system what it wants

Mattan Erez

The University of Texas at Austin

Salishan 2011

Page 2

(c) Mattan Erez, UT Austin

Power and reliability bound performance
• More and more components
• Per-component improvement too slow

[Figure: system power, from 1 KW to 1 GW on a log scale, versus performance at the Tera, Peta, and Exa scales]

Page 3

Power and reliability bound performance
• More and more components
• Per-component improvement too slow

[Figure: Impact of per-socket FIT rate. MTTI [hours] (log scale, 0.1 to 1,000,000) versus performance [PFLOPs] (0.125 to 512), with curves for 500, 2,000, 8,000, and 32,000 FIT per socket]
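The trend in this plot follows from the definition of FIT (failures per 10^9 device-hours): if sockets fail independently at constant rates, their rates add, so system MTTI shrinks in proportion to both socket count and per-socket FIT. A minimal Python sketch of that arithmetic; the 1 TFLOPs-per-socket figure in the example is a hypothetical assumption, not a value from the slide.

```python
import math

HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def sockets_needed(pflops, tflops_per_socket):
    """Sockets required to reach a target performance (hypothetical per-socket rate)."""
    return math.ceil(pflops * 1000 / tflops_per_socket)


def mtti_hours(sockets, fit_per_socket):
    """System mean time to interrupt, assuming independent sockets with
    constant failure rates, so the system failure rate is the sum of the
    per-socket rates."""
    return HOURS_PER_FIT_UNIT / (sockets * fit_per_socket)


# Example: 32 PFLOPs at an assumed 1 TFLOPs per socket.
n = sockets_needed(32, 1.0)      # 32,000 sockets
print(mtti_hours(n, 500))        # 62.5 hours
print(mtti_hours(n, 32_000))     # under one hour
```

Every 4x increase in per-socket FIT, or in machine size, cuts MTTI by 4x, which is why the curves in the plot are evenly spaced on the log scale.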

Page 4

What can we do?

• Compute less and store less
  – Use better algorithms

• Specialize more
  – But still innovate on algorithms

• Waste less
  – Minimize movement
  – Dynamically rebalance hardware

• Efficient resiliency for reliability
  – Minimize redundancy
  – Trade off inherent reliability against resiliency

Page 5

Power is a zero-sum game

• Trade off control, compute, storage, and communication
  – Dense algebra
  – Large sparse data
  – Building data structures

[Figure: power breakdown across ALU/FPU, registers, caches, control, NoC, I/O, reliability, and other]

Page 6

Hierarchy enables HW/SW co-tuning and co-design
• Hierarchy as common abstraction for HW and SW
  – Basic engineering
  – Match abstractions
• Portability to ensure progress
  – Co-design cycle
• Portability to ensure efficiency
  – Co-tune for proportionality

Page 7

Hardware hierarchy – locality
• Communication and storage dominate energy
• Closer and smaller == better
  – Amortize cost of global operations

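[Figure: energy costs in a 28nm process on a 20mm die: 64-bit DP operation, 20 pJ; 256-bit access to an 8 kB SRAM, 50 pJ; moving 256 bits over on-chip buses, 26 pJ, 256 pJ, and 1 nJ as the distance grows; efficient off-chip link, 500 pJ; DRAM read/write, 16 nJ]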
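To see why "closer and smaller == better", the energy figures on this page (20 pJ per 64-bit DP operation, 50 pJ per 256-bit access to an 8 kB SRAM, 16 nJ per DRAM read/write) can be plugged into a toy model. The model is a hypothetical assumption: each flop reads one 64-bit operand from local SRAM, each DRAM access is taken to move 256 bits, and each word fetched from DRAM is reused a fixed number of times.

```python
# Energy values from the 28nm figure, in picojoules.
FP64_OP_PJ = 20.0        # one 64-bit double-precision operation
SRAM_256B_PJ = 50.0      # one 256-bit access to an 8 kB SRAM
DRAM_256B_PJ = 16_000.0  # one DRAM read/write (assumed here to move 256 bits)
WORDS_PER_ACCESS = 4     # 256 bits = four 64-bit words


def energy_per_flop_pj(dram_reuse):
    """Energy per flop under a simple, hypothetical model: each flop reads
    one 64-bit operand from the local SRAM, and each word brought in from
    DRAM is reused dram_reuse times before being evicted."""
    sram = SRAM_256B_PJ / WORDS_PER_ACCESS
    dram = DRAM_256B_PJ / WORDS_PER_ACCESS / dram_reuse
    return FP64_OP_PJ + sram + dram


print(energy_per_flop_pj(1))     # 4032.5: DRAM traffic dominates
print(energy_per_flop_pj(1000))  # 36.5: reuse amortizes the global access
```

Without reuse the arithmetic itself is under 1% of the energy; a locality hierarchy that keeps data close is what brings the cost per flop back toward the cost of the operation.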

Page 8

Locality hierarchy “minimizes” hardware
• Efficiency/performance tradeoffs
  – Efficiency goes up as BW goes down

Page 9

Hardware hierarchy – control

• Specialization is a form of hierarchy
  – Amortize SW control decisions in HW
• Sophisticated high-level control
  – Dynamic rebalancing
• Simple low-level control
  – Minimize hardware waste

• How far can we push this?
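The amortization argument is simple arithmetic: if one control decision (fetch, decode, scheduling) is shared by many operations, its cost per operation shrinks with the sharing factor. A minimal sketch; the costs in the example are hypothetical, in arbitrary units, chosen only to show the shape of the tradeoff.

```python
def energy_per_op(e_op, e_control, ops_per_decision):
    """Energy per useful operation when one control decision (instruction
    fetch, decode, scheduling) is shared by ops_per_decision operations."""
    return e_op + e_control / ops_per_decision


# Hypothetical costs in arbitrary units: one control decision costs 10x an operation.
scalar = energy_per_op(1.0, 10.0, 1)   # 11.0: control dominates
wide = energy_per_op(1.0, 10.0, 32)    # 1.3125: control amortized over 32 ops
```

The limit is that the remaining term, e_op, is fixed, so pushing the sharing factor further yields diminishing returns and less flexibility.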

Page 10

Hierarchical HW, hierarchical SW

• Hierarchy is least abstract common denominator

[Figure: example machine hierarchies (a dual-core PC, a 4-node cluster of PCs, a cluster of dual Cell blades, and a system with a GPU), built from ALUs, L1/L2 caches, Cell local stores (LS), GPU SMs, and node, GPU, and main memory, with aggregate cluster memory as a virtual level; alongside, a matmul task tree in which a large matrix multiplication is split into 256x256 matmul_L2 tasks, each split into 32x32 matmul_L1 tasks]

Page 11

Task hierarchies

task matmul::inner( in float A[M][T], in float B[T][N], inout float C[M][N] )
{
  tunable int P, Q, R;
  mappar( int i=0 to M/P, int j=0 to N/R ) {
    mapseq( int k=0 to T/Q ) {
      matmul( A[P*i:P*(i+1);P][Q*k:Q*(k+1);Q],
              B[Q*k:Q*(k+1);Q][R*j:R*(j+1);R],
              C[P*i:P*(i+1);P][R*j:R*(j+1);R] );
    }
  }
}

task matmul::leaf( in float A[M][T], in float B[T][N], inout float C[M][N] )
{
  for (int i=0; i<M; i++)
    for (int j=0; j<N; j++)
      for (int k=0; k<T; k++)
        C[i][j] += A[i][k] * B[k][j];
}

[Figure: variant call graph with the variants matmul::inner and matmul::leaf]
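The inner/leaf decomposition can be mimicked in plain Python to check that it computes the same product as a flat triple loop. This is a hedged sketch, not Sequoia: the mappar iterations (which the task language may run in parallel) execute sequentially here, one (P, Q, R) triple per hierarchy level stands in for the tunables, and the min() clamps are an addition so tile sizes need not divide the matrix sizes evenly.

```python
def matmul_leaf(A, B, C, rows, cols, inner):
    """Leaf variant: plain triple loop over the selected sub-blocks."""
    for i in range(*rows):
        for j in range(*cols):
            for k in range(*inner):
                C[i][j] += A[i][k] * B[k][j]


def matmul(A, B, C, rows, cols, inner, tiles):
    """Inner variant: split C[rows][cols] += A[rows][inner] * B[inner][cols]
    into tiles and recurse; an empty tiles list selects the leaf variant.

    rows, cols, inner are (start, stop) index ranges; tiles lists one
    (P, Q, R) triple per hierarchy level, outermost first.
    """
    if not tiles:
        matmul_leaf(A, B, C, rows, cols, inner)
        return
    P, Q, R = tiles[0]
    for i0 in range(rows[0], rows[1], P):            # mappar: independent C tiles
        for j0 in range(cols[0], cols[1], R):        # (run sequentially here)
            for k0 in range(inner[0], inner[1], Q):  # mapseq: accumulate over k
                matmul(A, B, C,
                       (i0, min(i0 + P, rows[1])),
                       (j0, min(j0 + R, cols[1])),
                       (k0, min(k0 + Q, inner[1])),
                       tiles[1:])


M, T, N = 6, 5, 4
A = [[i + k for k in range(T)] for i in range(M)]
B = [[k - j for j in range(N)] for k in range(T)]
C = [[0] * N for _ in range(M)]
# Two levels of tiling, loosely mirroring the L2 and L1 levels of the task tree.
matmul(A, B, C, (0, M), (0, N), (0, T), tiles=[(4, 3, 2), (2, 2, 1)])
expected = [[sum(A[i][k] * B[k][j] for k in range(T)) for j in range(N)]
            for i in range(M)]
```

Changing only the tiles list retargets the same code to a deeper or shallower hierarchy, which is the portability argument in miniature.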

Page 12

Task hierarchies
(The matmul::inner and matmul::leaf code is the same as on the previous page.)

Calling task: matmul::inner

Callee task: matmul::leaf

[Figure: the calling and callee tasks, each with its own A, B, and C, located at levels X and Y of the machine hierarchy]

Page 13

Hierarchical software enables efficiency
• Portability
  – Hierarchy is least abstract common denominator
  – It’s what systems want
• Proportionality
  – Co-tune hardware and software
  – Path to true efficiency
• Co-design cycles
  – Maintain efficiency with new technology
• How strict is the hierarchy?

Page 14

(c) Mattan Erez, NVIDIA

Hierarchical software enables co-tuning
• Locality profiles drive dynamic rebalancing

[Figure: locality profile, % miss versus storage size (log scale, 1.0E+0 to 1.0E+12)]
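A locality profile of this kind (miss percentage versus storage size) can be computed from an address trace in one pass with LRU stack-distance analysis (Mattson et al.). The talk does not say how its profiles were gathered; this quadratic-time Python sketch only illustrates the idea, assuming a fully associative LRU store measured in items.

```python
def stack_distances(trace):
    """LRU stack distance of each access: the number of distinct addresses
    touched since the previous access to the same address (None = first use)."""
    stack = []   # least recently used first, most recently used last
    dists = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            dists.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            dists.append(None)
        stack.append(addr)
    return dists


def miss_rate_curve(trace, sizes):
    """Percent misses for a fully associative LRU store holding `size` items:
    an access hits exactly when its stack distance is below the size."""
    dists = stack_distances(trace)
    return {
        size: 100.0 * sum(1 for d in dists if d is None or d >= size) / len(dists)
        for size in sizes
    }


# A cyclic sweep over three items: no reuse until the store holds all three.
curve = miss_rate_curve([1, 2, 3] * 3, sizes=[1, 2, 3, 4])
```

For the cyclic sweep, sizes 1 and 2 miss on every access while sizes 3 and 4 miss only on the three cold accesses; a runtime could use such a cliff to decide how much storage a task deserves.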

Page 15

Proportional and efficient resiliency

• Resiliency principles:
  – Detect fault
  – Correct erroneous data if possible
  – Contain fault
  – Repair/reconfigure
  – Restore state and re-execute

• Each step can be improved with co-tuning
  – Ignore certain faults (allow some errors)
  – Detect at coarse granularity
  – Contain where cheapest
  – Re-map application instead of repairing/reconfiguring hardware
  – Preserve and restore minimally and effectively

Page 16

Hierarchical resiliency – containment domains

• Containment domains enable proportionality

• Match locality hierarchy with resiliency hierarchy
  – Efficient state preservation and restoration
  – Predictable (minimal) overhead

• Hierarchy provides natural domains for managing faults (and rebalancing)
  – Co-tune resiliency scheme in HW and SW
  – Range of hardware error detection and correction mechanisms
  – Mechanisms introduce minimal overhead when not in use

Page 17

Containment Domains: a full-system approach to resiliency
• Hierarchy provides natural domains for containing faults
• Containment domains enable software-controlled resilience
  – Preserve data on domain start
  – Detect faults before domain commits
  – Recover: restore data and re-execute when necessary
• Arbitrary nesting
  – Tasks
  – Functions
  – Loop iterations
  – Instructions
• Amenable to compiler analysis
• Constructs for programmer tuning
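The preserve / detect / recover cycle and its nesting can be sketched in a few lines of Python. This is an illustrative model, not the containment-domains API: run_domain, FaultDetected, and Escalate are hypothetical names, state is a plain dict, and a domain that keeps failing escalates so that its parent treats the escalation as a fault of its own and re-executes.

```python
import copy


class FaultDetected(Exception):
    """A fault detected inside a domain (e.g., by a check routine)."""


class Escalate(FaultDetected):
    """A domain could not recover locally; its parent treats this as a fault."""


def run_domain(state, keys, body, check, max_retries=2):
    """Run body(state) as a containment domain over state[k] for k in keys.

    Preserve: snapshot the entries the domain may modify.
    Detect:   check(state) must pass before the domain commits.
    Recover:  restore the snapshot and re-execute, up to max_retries times,
              then escalate to the enclosing domain.
    Returns the number of re-executions that were needed.
    """
    saved = {k: copy.deepcopy(state[k]) for k in keys}   # preserve
    for attempt in range(max_retries + 1):
        try:
            body(state)
            if check(state):                             # detect before commit
                return attempt                           # commit
        except FaultDetected:                            # includes a child's Escalate
            pass
        for k in keys:                                   # restore
            state[k] = copy.deepcopy(saved[k])
    raise Escalate(f"domain over {keys} failed {max_retries + 1} times")


# Single domain with one injected transient corruption (hypothetical).
faults = iter([True, False])

def double_all(s):
    s["x"] = [v * 2 for v in s["x"]]
    if next(faults):
        s["x"][0] = -999                                 # simulated silent corruption

state = {"x": [1, 2, 3]}
retries = run_domain(state, ["x"], double_all, lambda s: all(v >= 0 for v in s["x"]))

# Nested domains: a child that always faults escalates to its parent,
# which re-executes (re-running the child) and finally escalates itself.
counts = {"child": 0}

def failing_child(s):
    counts["child"] += 1
    raise FaultDetected("hard fault in child")

def parent(s):
    run_domain(s, ["x"], failing_child, lambda s: True, max_retries=1)

try:
    run_domain({"x": [0]}, ["x"], parent, lambda s: True, max_retries=1)
    escalated = False
except Escalate:
    escalated = True
```

After the first example, state["x"] is [2, 4, 6] after one re-execution; in the nested example the child runs four times (two parent attempts of two child attempts each) before the failure leaves the outermost domain.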

Page 18

Tunable error protection

• High AMTTI requires strong error protection
  – Global redundancy overhead can be high
  – Hardware mechanisms can help
  – Can do even better with software control

• Containment domains enable specialized protection
  – Each domain can have a unique detection routine
    • May even be scenario specific
  – Redundancy can be added at any granularity

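[Figure: a chain of tasks A, B, C, shown unprotected and with duplicate-and-compare (=?) redundancy applied at different granularities]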
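The duplicate-and-compare (=?) idea at two granularities can be sketched in Python. duplicated and chain are hypothetical helpers, the tasks A, B, C are toy functions, and B carries an injected transient error; the sketch assumes a fault never corrupts both copies identically, since plain duplication cannot detect that case.

```python
def duplicated(fn, max_retries=3):
    """Run fn twice per call and compare the results (=?); on a mismatch,
    re-execute, and give up (escalate) after max_retries extra attempts."""
    def run(x):
        for _ in range(max_retries + 1):
            first, second = fn(x), fn(x)
            if first == second:
                return first
        raise RuntimeError("redundant copies keep disagreeing; escalate")
    return run


def chain(*stages):
    """Compose tasks so the output of one feeds the next (A, then B, then C)."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run


calls = {"B": 0}

def A(x):
    return x + 1

def B(x):
    calls["B"] += 1
    y = x * 3
    return y + 1 if calls["B"] == 2 else y   # hypothetical transient error on the 2nd call

def C(x):
    return x - 2

# Fine-grained: each task is duplicated, so a mismatch re-executes only B.
fine_result = chain(duplicated(A), duplicated(B), duplicated(C))(5)

calls["B"] = 0   # re-arm the injected fault
# Coarse-grained: the whole chain is duplicated and compared once at the end.
coarse_result = duplicated(chain(A, B, C))(5)
```

Both variants recover the correct value, 16. The fine-grained version pays for a comparison after every task but re-executes only the failing one; the coarse version compares once but repeats A, B, and C on a mismatch, which is the kind of tradeoff a domain's tunable protection exposes.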

Page 19

State preservation and restoration

• Match storage hierarchy
• Utilize NV memory
• Explicit software control
• Trade off overheads:
  – Storage, local and global bandwidth, recomputation, complexity and effort

Page 20

Faults and default behavior encompass current approaches
• Soft memory errors
  – Detect: hardware ECC
  – Recover: retry; if that fails, restore and re-execute

• Hard memory fault
  – Detect: runtime liveness
  – Recover:
    • Map out bad memory
    • If enough space remains: recover and re-execute
    • Else: escalate failure

• Soft arithmetic error
  – Detect: user-selectable
    • Duplicated execution (HW/SW)
    • Other HW techniques
    • Algorithm-specific assert
  – Recover: retry; if that fails, restore and re-execute

• Soft control errors
  – Detect:
    • User-selectable signatures
    • Implicit exceptions
  – Recover: restore, re-execute

• Hard compute fault
  – Detect: runtime liveness
  – Recover:
    • Map out bad PE
    • If OK without the resource, or a spare is available: recover and re-execute
    • Else: escalate failure

• High-level unhandled faults
  – Detect: runtime heartbeat
  – Recover: escalate failure
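The recovery half of this default policy is essentially a lookup from fault class to actions. A minimal Python sketch: the fault names and the Action enum are hypothetical labels for the categories listed above, "recover" for hard faults is read as restore-and-re-execute, and detection (ECC, liveness, heartbeat) is not modeled.

```python
from enum import Enum, auto


class Action(Enum):
    RETRY = auto()              # repeat the failed operation in place
    RESTORE_REEXECUTE = auto()  # restore preserved state, re-execute the domain
    MAP_OUT = auto()            # stop using the faulty memory or PE
    ESCALATE = auto()           # pass the failure to the enclosing domain


def default_recovery(fault, retry_failed=False, enough_resources=True):
    """Recovery actions for a detected fault, following the default policy
    listed above. Fault names are hypothetical labels for its categories."""
    if fault in ("soft_memory", "soft_arithmetic"):
        # Retry first; if the retry also fails, restore and re-execute.
        return [Action.RESTORE_REEXECUTE] if retry_failed else [Action.RETRY]
    if fault == "soft_control":
        return [Action.RESTORE_REEXECUTE]
    if fault in ("hard_memory", "hard_compute"):
        # Map out the bad resource, then continue only if the remaining
        # resources (or a spare) suffice; otherwise escalate.
        next_step = Action.RESTORE_REEXECUTE if enough_resources else Action.ESCALATE
        return [Action.MAP_OUT, next_step]
    # High-level unhandled faults (and anything unrecognized) escalate.
    return [Action.ESCALATE]
```

Keeping the policy as data-like code makes it easy to override per domain, which is how user-selectable detection and recovery would slot in.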

Page 21

Containment domains example

void task<inner> SpMV( in matrix, in veci, out resi )
{
  forall(…) reduce(…)
    SpMV(matrix[…], veci[…], resi[…]);
}
preserve { preserve_NV(matrix); } //inner
restore_for_child {…}

void task<leaf> SpMV(…)
{
  for r=0..N
    for c=rowS[r]..rowS[r+1] {
      contain { resi[r] += data[c]*veci[cIdx[c]]; }
      check { fault<fail>(c > prevC); }
      prevC = c;
    }
}
preserve { preserve_NV(matrix); } //leaf
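The leaf task can be re-expressed in Python to show how contain and check interact, under stated assumptions: the CSR arrays are renamed (rowS to row_start, cIdx to col_idx, veci to vec), prevC starts at -1, the nonzero loop is half-open, and a hypothetical inject hook lets the example corrupt the observed loop counter. The inner task's forall/reduce and the NV preservation of the matrix are omitted; the matrix is treated as read-only input.

```python
def spmv_leaf(row_start, col_idx, data, vec, inject=None, max_retries=2):
    """CSR sparse matrix-vector product with a containment domain around
    each nonzero update, following the structure of the leaf task."""
    n_rows = len(row_start) - 1
    res = [0] * n_rows
    prev_c = -1
    for r in range(n_rows):
        for c in range(row_start[r], row_start[r + 1]):
            saved = res[r]                               # preserve res[r]
            for _ in range(max_retries + 1):
                res[r] += data[c] * vec[col_idx[c]]      # contain { ... }
                observed = inject(c) if inject else c    # hypothetical fault hook
                if observed > prev_c:                    # check { c > prevC }
                    break
                res[r] = saved                           # restore, then re-execute
            else:
                raise RuntimeError(f"nonzero {c}: check kept failing; escalate")
            prev_c = c
    return res


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form
row_start = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
data = [1, 2, 3, 4, 5]
vec = [1, 2, 3]

clean = spmv_leaf(row_start, col_idx, data, vec)

injected = {"done": False}

def corrupt_once(c):
    """Pretend the loop counter reads back as garbage the first time c == 3."""
    if c == 3 and not injected["done"]:
        injected["done"] = True
        return -1
    return c

recovered = spmv_leaf(row_start, col_idx, data, vec, inject=corrupt_once)
```

Both calls return [7, 6, 19]: the corrupted counter fails the c > prevC signature, res[2] is restored to its preserved value, and the single update is re-executed rather than the whole task.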

Page 22

Summary

• Hierarchy is a basic engineering approach
  – Works for hardware and works for software
• Hierarchy is inevitable
  – Minimize movement
  – Amortize control
• Match explicit hierarchies in HW and SW
  – Lowest abstract common denominator
• Natural domains and boundaries enable:
  – Co-design
  – Co-tuning
  – Dynamic rebalancing
  – Resiliency