
Page 1

Reliability trade-offs against power and performance

Nenad Stanković

Mentor: Michael Imhof

Reliable NoC in the Many Core Era

5/19/2009

Page 2

Overview

Reliable NoC in the Many Core Era

Introduction to reliability in NoCs

Reliability issues

Solution to high-reliability design

Conclusion

Page 4

Introduction to Reliability

Technology scaling

System Design

Performance

Area

Power

[Source: www.seed.slb.com]

Page 5

Introduction to Reliability

Problems that lead to NoC

Systems-on-chip to networks-on-chip

Fault tolerance design

High reliability, but…

less power consumption

on a smaller area

with more performance

Page 6

Overview

Reliable NoC in the Many Core Era

Introduction to reliability in NoCs

Reliability issues

Solution to high-reliability design

Conclusion

Page 7

Reliability Issues

Number of problems increasing with technology scaling

New problems emerging

No concrete overall solutions to NoC design available

[Figure: Design in relation to Reliability, Power, Area, Performance, Architectures, Technology, Errors and Faults]

Page 8

Main Factors

Wire loads

Core temperature and aging

Power management

Reconfiguration circuits

Environmental effects and industrial application

Quality of service

Throughput, Latency

Best-effort design

Page 9

Scaling Problems

Process variability

Transient faults

Crosstalk, EMI

Other noises

Leakage, Interconnect

Supply Voltage errors

[Source: http://www.eecs.berkeley.edu/]

Page 10

Overview

Reliable NoC in the Many Core Era

Introduction to reliability in NoCs

Reliability issues

Solution to high-reliability design

Conclusion

Page 11

Solutions in NoC Designs

Countermeasures:

Avoidance

Detection

Containment

Isolation

Recovery

Domain:

Functional view

Design space

Hardware view

[Figure: countermeasures (avoidance, detection, containment, isolation, recovery) against domains (hardware view, design space, functional view)]

Page 12

Countermeasures

Avoidance

Detection

Containment

Isolation

Recovery

Page 17

Solutions in NoC Designs

Domain:

Functional view

Design space

Hardware view

Page 18

Domain: Functional View

Retransmission schemes

E2E (end-to-end), HBH (hop-by-hop), FEC (forward error correction), HE2E, HFEC

Buffers

Latency and Power

Coding

Hamming code

Single-bit errors

Small overhead in area and power

Message, Router IDs

Checker circuits
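The slide credits Hamming codes with handling single-bit errors at a small area and power overhead. A minimal sketch of the classic Hamming(7,4) code follows, assuming 4-bit data words for readability; a real NoC link would more likely protect a whole flit with a wider code, and the function names are hypothetical.

```python
# Hamming(7,4) sketch: codeword positions 1..7 hold p1, p2, d1, p4, d2, d3, d4.
from typing import List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, error position); position 0 means no error was detected."""
    c = list(c)  # work on a copy
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome


codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # emulate a transient fault on position 5
print(hamming74_decode(codeword))  # ([1, 0, 1, 1], 5)
```

Two flipped bits would be silently miscorrected by this code; adding one overall parity bit (SECDED) is the usual way to at least detect that case.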

Page 19

Solutions in NoC Designs

Domain:

Functional view

Design space

Hardware view

Page 20

Domain: Design Space

Estimation

Topology, traffic, communication methods

Analytical models vs. simulation

Power vs. performance

Software

Tools, Frameworks

Simulation and synthesis

Page 21

Estimation Problems

Topology, traffic, communication methods

Analytical models vs. simulation

Power vs. performance

[Source: Ge Fen, Wu Ning, Wang Qi “Simulation and Performance Evaluation for Network on Chip design Using OPNET” ]

Page 24

Estimation Problems

Topology, traffic, communication methods

Analytical models vs. simulation

Power vs. performance

[Figure: latency model relating injection rate, switch arbiter conflict probability, virtual channel conflict probability, average waiting time, average blocking length, blocking time, average distance in hops and message transfer time to total latency]
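Some of the quantities in this latency model can be computed either analytically or by brute-force enumeration, which is the analytical-versus-simulation trade-off in miniature. The sketch below assumes a k×k mesh, uniform random traffic (self-traffic included), minimal routing, one flit per cycle per link and hypothetical per-hop delays; it ignores the contention terms (blocking time, arbiter and virtual channel conflicts) that the full model adds.

```python
from itertools import product


def avg_hops_closed_form(k: int) -> float:
    """Mean minimal hop count in a k x k mesh under uniform random traffic."""
    # In one dimension E|x1 - x2| = (k^2 - 1) / (3k) for x1, x2 uniform on 0..k-1;
    # the two coordinates are independent, so the 2-D mean is twice that.
    return 2 * (k * k - 1) / (3 * k)


def avg_hops_enumerated(k: int) -> float:
    """Same quantity by enumerating every source/destination pair."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(s[0] - d[0]) + abs(s[1] - d[1]) for s in nodes for d in nodes)
    return total / len(nodes) ** 2


def zero_load_latency(k: int, router_cycles: int, link_cycles: int, packet_flits: int) -> float:
    """Head-flit latency over the average path plus serialization of the body flits."""
    return avg_hops_closed_form(k) * (router_cycles + link_cycles) + (packet_flits - 1)


print(avg_hops_closed_form(4), avg_hops_enumerated(4))  # 2.5 2.5
print(zero_load_latency(4, router_cycles=3, link_cycles=1, packet_flits=5))  # 14.0
```

The closed form is instant but only holds under its assumptions; the enumeration generalizes to other traffic patterns at the cost of run time, which is the same tension the slide names.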

Page 25

Estimation Problems

Topology, traffic, communication methods

Analytical models vs. simulation

Power vs. performance

[Figure: utilization and power profile; power estimation broken down into average buffer, routing computation, crossbar and link power consumption]
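The power breakdown above can be turned into a simple estimator: multiply a per-component energy per flit by how often flits traverse routers and links. This is a sketch under stated assumptions: all energy values are hypothetical placeholders, every hop charges one buffer, crossbar and link traversal, routing computation is charged only to head flits, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FlitEnergy:
    """Hypothetical energy per flit per hop, in picojoules."""
    buffer_pj: float    # buffer write plus read
    routing_pj: float   # route computation, charged only to head flits
    crossbar_pj: float  # switch traversal
    link_pj: float      # wire traversal to the next router


def average_power_mw(e: FlitEnergy, flits_per_cycle: float, packet_flits: int,
                     avg_hops: float, freq_ghz: float) -> Dict[str, float]:
    """Network-wide average power per component, in mW (1 pJ/ns = 1 mW)."""
    flit_hops_per_ns = flits_per_cycle * avg_hops * freq_ghz
    head_hops_per_ns = flit_hops_per_ns / packet_flits
    return {
        "buffer": e.buffer_pj * flit_hops_per_ns,
        "routing": e.routing_pj * head_hops_per_ns,
        "crossbar": e.crossbar_pj * flit_hops_per_ns,
        "link": e.link_pj * flit_hops_per_ns,
    }


energy = FlitEnergy(buffer_pj=2.0, routing_pj=0.5, crossbar_pj=1.0, link_pj=1.5)
breakdown = average_power_mw(energy, flits_per_cycle=1.6, packet_flits=5,
                             avg_hops=2.5, freq_ghz=1.0)
print({name: round(mw, 2) for name, mw in breakdown.items()})
# {'buffer': 8.0, 'routing': 0.4, 'crossbar': 4.0, 'link': 6.0}
```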

Page 26

Estimation Problems

Topology, traffic, communication methods

Analytical models vs. simulation

Power vs. performance

[Source: Jongman Kim, Dongkook Park, Chrysostomos Nicopoulos, N. Vijaykrishnan, C. R. Das, “Design and Analysis of an NoC Architecture from Performance, Reliability and Energy Perspective” ]

Page 27

Software

Tools, Frameworks

Configurability

Mapping

Simulation and Synthesis

PIRATE, SMAP, OPNET

Design space exploration

Power and performance assessment

[Source: Gianluca Palermo, Cristina Silvano, “PIRATE: A Framework for Power/Performance Exploration of Networks-On-Chip Architectures”]
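Design space exploration with frameworks of this kind amounts to evaluating many configurations and keeping the ones that offer the best power/performance compromises. The sketch below is a generic Pareto-front filter, not the interface of PIRATE, SMAP or OPNET; the configuration names and numbers are hypothetical.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    power_mw: float
    latency_cycles: float


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` in both metrics and strictly better in one."""
    no_worse = a.power_mw <= b.power_mw and a.latency_cycles <= b.latency_cycles
    better = a.power_mw < b.power_mw or a.latency_cycles < b.latency_cycles
    return no_worse and better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


candidates = [
    Config("mesh-2vc", 40.0, 30.0),
    Config("mesh-4vc", 55.0, 22.0),
    Config("torus-2vc", 60.0, 25.0),  # worse than mesh-4vc on both axes
    Config("ring-1vc", 25.0, 45.0),
]
print([c.name for c in pareto_front(candidates)])  # ['mesh-2vc', 'mesh-4vc', 'ring-1vc']
```

The quadratic scan is fine for a few hundred points; the expensive part of real exploration is producing each (power, latency) pair by simulation or synthesis.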

Page 28

Solutions in NoC Designs

Domain:

Functional view

Design space

Hardware view

Page 29

Domain: Hardware View

Multiple problems to consider

System level

Fault resistance

Variability tolerant

Page 30

System Level

Topology, Mapping

Pipelines and busses

Wireless communication

Power management

System, Component

Links

Estimator circuits

[Figure: link between processing elements PE1 to PE4, pipelined into four stages, each stage with an error checking circuit]
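One reading of this figure is that every pipeline stage of the link carries its own checker, so an upset is caught at the stage where it happens rather than only at the receiver. The slide does not say which code the checkers use; the sketch assumes a single even-parity bit per flit, and the `faulty_stages` fault-injection parameter is hypothetical.

```python
from typing import List, Optional, Tuple


def parity(bits: List[int]) -> int:
    """XOR of all bits: 0 for an even number of ones."""
    p = 0
    for b in bits:
        p ^= b
    return p


def traverse_link(payload: List[int], stages: int = 4,
                  faulty_stages: Tuple[int, ...] = ()) -> Optional[int]:
    """Return the 1-based stage whose checker flags an error, or None if the flit arrives clean.

    A stage listed in `faulty_stages` flips bit 0 of the flit to model a transient upset.
    """
    flit = list(payload) + [parity(payload)]  # sender appends an even-parity bit
    for stage in range(1, stages + 1):
        if stage in faulty_stages:
            flit[0] ^= 1
        if parity(flit):  # per-stage checker: even parity violated
            return stage
    return None


print(traverse_link([1, 0, 1, 1]))                      # None
print(traverse_link([1, 0, 1, 1], faulty_stages=(3,)))  # 3
```

A single parity bit only detects an odd number of flipped bits and cannot correct anything; its value here is that it localizes the fault to a stage, which a recovery mechanism could act on.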

Page 31

System Level

Topology, Mapping

Pipelines and busses

Wireless communication

Power management

System, Component

Links

Estimator circuits

[Figure: processing elements PE1 to PE4, a PE detailed as core and router, and a power manager composed of an estimator and a control policy]
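The power manager in the figure pairs an estimator with a control policy. A minimal sketch, assuming the estimator is an exponentially weighted moving average (EWMA) of router or link utilization and the policy uses two thresholds to choose among three frequency levels; the class, thresholds and levels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PowerManager:
    """Hypothetical power manager: EWMA estimator plus threshold control policy."""
    alpha: float = 0.25            # weight of the newest utilization sample
    up_threshold: float = 0.7      # raise the level above this estimate
    down_threshold: float = 0.3    # lower the level below this estimate
    levels: List[float] = field(default_factory=lambda: [0.25, 0.5, 1.0])  # relative frequency
    level: int = 2                 # index into `levels`; start at full speed
    estimate: float = 0.0

    def observe(self, utilization: float) -> float:
        """Estimator: fold one utilization sample in [0, 1] into the running estimate."""
        self.estimate = self.alpha * utilization + (1 - self.alpha) * self.estimate
        return self.estimate

    def decide(self) -> float:
        """Control policy: move at most one level per decision."""
        if self.estimate > self.up_threshold and self.level < len(self.levels) - 1:
            self.level += 1
        elif self.estimate < self.down_threshold and self.level > 0:
            self.level -= 1
        return self.levels[self.level]


pm = PowerManager()
for u in [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]:
    pm.observe(u)
    print(f"estimate={pm.estimate:.3f} frequency={pm.decide()}")
```

The gap between the two thresholds gives the policy hysteresis, so a noisy estimate does not make the level oscillate; the EWMA weight trades reaction speed against stability.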

Page 32

Domain: Hardware View

System level

Fault resistance

Variability tolerant

Page 33

Fault Resistance Design

Memory

Error detection and correction

Circuit level

[Figure: router components (memory, routing unit, switch arbiter, virtual channel, handshaking, routing algorithm) and protection techniques (Hamming code, retransmission buffers and IDs, voting system)]
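The figure lists a voting system among the protection techniques. Assuming this refers to triple modular redundancy (TMR), where a component is replicated three times and a majority voter masks a fault in any one copy, the voter itself is only a few gates per bit; the price is roughly tripling the protected logic. The helper names below are hypothetical.

```python
from typing import Optional


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three copies of the same word."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_copy(a: int, b: int, c: int) -> Optional[int]:
    """Index (0, 1 or 2) of the first copy that differs from the voted word, or None."""
    voted = majority_vote(a, b, c)
    for index, copy in enumerate((a, b, c)):
        if copy != voted:
            return index
    return None


# Copy 2 suffered an upset in bit 3; the vote masks it and flags the copy.
print(bin(majority_vote(0b1011, 0b1011, 0b0011)), disagreeing_copy(0b1011, 0b1011, 0b0011))
```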

Page 34

Fault Resistance Design

Memory

Error detection and correction

Circuit level

[Figure: main FF (clocked by CLK) and delayed FF (clocked by CLK_D) on the same INPUT, an XOR, an error control circuit, and a MUX selecting the OUTPUT via SEL]
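The labels in this circuit (main FF, delayed FF on CLK_D, XOR, MUX with SEL) are consistent with a double-sampling flip-flop for timing errors. The behavioural sketch below assumes that reading and a hypothetical CLK_D delay of 0.3 clock periods; a real design must also handle the fact that downstream logic has already consumed the main FF's value, which the sketch ignores.

```python
from typing import Tuple


def sample(old: int, new: int, arrival: float, edge: float) -> int:
    """Bit captured by a flip-flop at time `edge` if the input switches old -> new at `arrival`."""
    return new if arrival <= edge else old


def double_sampling_ff(old: int, new: int, arrival: float,
                       clk_edge: float = 1.0, clk_d_edge: float = 1.3) -> Tuple[int, bool]:
    """Return (OUTPUT, error flag). Times are in clock periods; CLK_D lags CLK by 0.3 here."""
    main = sample(old, new, arrival, clk_edge)       # main FF on CLK
    delayed = sample(old, new, arrival, clk_d_edge)  # delayed FF on CLK_D
    error = main ^ delayed                           # XOR compares the two samples
    sel = error                                      # error control drives SEL
    output = delayed if sel else main                # MUX picks the delayed sample on error
    return output, bool(error)


print(double_sampling_ff(0, 1, arrival=0.8))  # (1, False): data met the main clock
print(double_sampling_ff(0, 1, arrival=1.1))  # (1, True): late data caught and corrected
print(double_sampling_ff(0, 1, arrival=1.5))  # (0, False): too late for both, undetected
```

The CLK_D delay sets the trade-off: a longer delay catches later arrivals but requires longer minimum paths so that the next cycle's data does not reach the delayed FF too early.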

Page 35

Fault Resistance Design

Memory

Error detection and correction

Circuit level

Wire length

Supply Voltage

[Source: Atul Maheshwari, Wayne Burleson, Russell Tessier, “Trading Off Transient Fault Tolerance and Power Consumption in Deep Submicron (DSM) VLSI Circuits” ]
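The supply-voltage trade-off can be put in first-order numbers. Dynamic power grows with V_DD squared, while the critical charge needed to flip a node is often approximated as proportional to V_DD, so lowering the supply saves power but makes transient faults easier to trigger. Longer wires add capacitance and therefore power. The sketch holds frequency fixed and uses hypothetical activity, capacitance and length values; the function names are illustrative.

```python
def dynamic_power(activity: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """Switching power P = alpha * C * V_DD^2 * f, in watts."""
    return activity * cap_farads * vdd ** 2 * freq_hz


def critical_charge(node_cap_farads: float, vdd: float) -> float:
    """First-order critical charge Q_crit ~ C_node * V_DD, in coulombs."""
    return node_cap_farads * vdd


def wire_cap(length_mm: float, cap_per_mm: float = 0.2e-12) -> float:
    """Hypothetical wire capacitance growing linearly with length (0.2 pF/mm assumed)."""
    return length_mm * cap_per_mm


for vdd in (1.0, 0.9, 0.8):
    p = dynamic_power(0.15, wire_cap(2.0), vdd, 1e9)
    q = critical_charge(5e-15, vdd)
    print(f"V_DD={vdd:.1f} V  link power={p * 1e6:.1f} uW  Q_crit={q * 1e15:.1f} fC")
```

Going from 1.0 V to 0.8 V cuts switching power to 64% but also lowers the critical charge to 80%, which is the tension the cited work quantifies.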

Page 36

Fault Resistance Design

Memory

Error detection and correction

Circuit level

Transistor sizes

Threshold voltage

[Source: Atul Maheshwari, Wayne Burleson, Russell Tessier, “Trading Off Transient Fault Tolerance and Power Consumption in Deep Submicron (DSM) VLSI Circuits” ]
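Threshold voltage and transistor sizing trade speed against leakage. The sketch uses the textbook subthreshold model, leakage proportional to W·exp(-V_th / (n·v_T)), with a hypothetical slope factor n = 1.5; absolute currents are omitted and only ratios against a reference device are computed. The function names are illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """v_T = kT/q, about 25.9 mV at 300 K."""
    return K_B * temp_k / Q_E


def relative_leakage(vth: float, vth_ref: float, width_ratio: float = 1.0,
                     n: float = 1.5, temp_k: float = 300.0) -> float:
    """Subthreshold leakage relative to a reference device, assuming I ~ W * exp(-V_th / (n * v_T))."""
    return width_ratio * math.exp((vth_ref - vth) / (n * thermal_voltage(temp_k)))


# Lowering V_th by 100 mV speeds the gate up but multiplies leakage by roughly 13x here.
print(round(relative_leakage(0.2, 0.3), 1))
# Doubling the width (for drive strength or a larger critical charge) doubles it again.
print(round(relative_leakage(0.2, 0.3, width_ratio=2.0), 1))
```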

Page 37

Domain: Hardware View

System level

Fault resistance

Variability tolerant

Page 38

Variation Tolerant Design

Voltage swing and clock skewing

Self-calibrating and reconfigurable circuits

[Figure: PE1 to PE4 and a four-stage pipelined link (stages 1 to 4), annotated with voltage swing and clock skewing]

Page 39

Variation Tolerant Design

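Voltage swing and clock skewing

Self-calibrating and reconfigurable circuits

[Figure: adaptive voltage swing circuit with a main FF, flip-flops FF1, FF2 and FF3 (annotated 16%, 32% and 48%), a MUX driven by SEL, and an error checker and configurability circuit between INPUT and OUTPUT, clocked by CLK]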
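A self-calibrating link can search for its own operating point: lower the voltage swing step by step while the error checker stays quiet, and keep the last error-free setting. This absorbs die-to-die variation because each chip settles on its own swing. The sketch assumes a hypothetical `link_has_errors` test hook and hypothetical swing values; it does not model the FF1 to FF3 and 16% to 48% details of the figure, whose meaning the slide does not spell out.

```python
from typing import Callable, Sequence


def calibrate_swing(swings: Sequence[float], link_has_errors: Callable[[float], bool]) -> float:
    """Return the last error-free swing before the first failing one.

    `swings` must be ordered from highest (safest) to lowest (most power-efficient).
    """
    if link_has_errors(swings[0]):
        raise RuntimeError("link fails even at the highest voltage swing")
    chosen = swings[0]
    for s in swings[1:]:
        if link_has_errors(s):
            break  # the first failing setting ends the search
        chosen = s
    return chosen


swings = [1.0, 0.8, 0.6, 0.4, 0.2]  # hypothetical swing settings in volts
print(calibrate_swing(swings, lambda s: s < 0.35))  # a die failing below 0.35 V settles at 0.4
print(calibrate_swing(swings, lambda s: s < 0.5))   # a die failing below 0.5 V settles at 0.6
```

In practice such a loop would likely keep some margin above the first failing setting and rerun periodically, since temperature and aging shift the threshold over time.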

Page 40

Overview

Introduction to reliability in NoCs

Reliability issues

Solution to high-reliability design

Conclusion

Page 41

Conclusion

Reliable NoC in the Many Core Era

Reliability

Power

Performance

Area

Fault tolerance

Technology

Errors, faults

Solutions:

Various Possibilities

Levels

Design Space Exploration

Trade-offs

[Figure: four-stage pipelined link (stages 1 to 4), each stage with an error checking circuit]

Page 42

Thank you for your attention!

Reliable NoC in the Many Core Era

5/19/2009